# Filter by a variable; use @ to reference the local variable inside the query string
name = 'b'
df1.query('name == @name')
   id name
1   2    b
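A minimal runnable sketch, assuming df1 is a small two-column frame built here purely for illustration:

import pandas as pd

# hypothetical data standing in for df1
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
name = 'b'
df1.query('name == @name')   # 'name' is the column, '@name' is the local variable
#    id name
# 1   2    b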
# Multi-condition filtering
df2.query('score < 88 and id < 5')               # and: "and" or &; or: "or" or |
df2.loc[(df2['score'] < 88) & (df2['id'] < 5)]   # equivalent boolean-mask form
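A quick self-contained sketch of the same two equivalent forms, assuming a hypothetical df2 with id and score columns:

import pandas as pd

df2 = pd.DataFrame({'id': [1, 3, 5, 7], 'score': [90, 85, 70, 60]})
df2.query('score < 88 and id < 5')               # expression-string form
df2.loc[(df2['score'] < 88) & (df2['id'] < 5)]   # boolean-mask form
# Both return the single row with id 3 and score 85.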
# A column name containing a space cannot be written directly in the expression:
# df3.query('a b >= 5') fails with "SyntaxError: invalid syntax".
# Wrap such names in backticks instead:
df3.query('`a b` >= 5')
   a  a b
1  2    5
2  3    6
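A sketch reproducing the example above, assuming df3 is a hypothetical frame whose second column name contains a space:

import pandas as pd

df3 = pd.DataFrame({'a': [1, 2, 3], 'a b': [4, 5, 6]})
df3.query('`a b` >= 5')   # backticks let query() reference the column named 'a b'
#    a  a b
# 1  2    5
# 2  3    6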
filter
Filter by labels
The following objects provide a filter() method:
pandas.DataFrame.filter
pandas.Series.filter
pandas.core.groupby.DataFrameGroupBy.filter
Parameters (illustrated in the sketch after this list):
items: list-like, a list of exact label names on the target axis
like: str, keep labels whose name contains this substring (fuzzy match)
regex: str (regular expression), keep labels whose name matches the regular expression
axis: {0 or 'index', 1 or 'columns', None}, default None. The axis to filter on, given as an index (int) or axis name (str). By default it is the info axis: 'index' for a Series, 'columns' for a DataFrame.
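A minimal sketch of the three selection modes on a made-up frame (the data and labels below are illustrative assumptions, not from the original text):

import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]},
                  index=['mouse', 'rabbit'])

df.filter(items=['one', 'three'])   # exact column labels
df.filter(like='t', axis=1)         # column labels containing 't'  -> two, three
df.filter(regex='e$', axis=1)       # column labels ending in 'e'   -> one, three
df.filter(like='bbi', axis=0)       # row labels containing 'bbi'   -> rabbit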
# Filter rows where 账户状态 is not "结清" and each 客户号 appears only once,
# then sort ascending by 发放金额 and slice rows 2-4 of the result.
# SQL equivalent: select 客户号, 发放金额 from df where 账户状态 <> '结清'
#                 group by 客户号 having count(客户号) = 1 order by 发放金额
(df.query('账户状态 != "结清"')
   .filter(items=['客户号', '发放金额'])
   .groupby('客户号')
   .filter(lambda x: x['客户号'].size == 1)
   .sort_values('发放金额', ascending=True)[2:5])
              客户号     发放金额
18448    P030000197  20000.0
10729  P01000033252  20000.0
7104   P03000117634  20000.0
# SQL equivalent: select 客户号, sum(发放金额) as 发放金额 from df where 账户状态 != '结清'
#                 group by 客户号 having count(客户号) > 1
df1 = df.query('账户状态 != "结清"').groupby('客户号', as_index=False).filter(lambda x: x['客户号'].size > 1)
df1.groupby('客户号', as_index=False).agg({'发放金额': 'sum'})
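A hedged sketch of the same "drop settled accounts, keep repeated customers, then sum" pattern on made-up data (the values below are illustrative only):

import pandas as pd

df = pd.DataFrame({
    '客户号': ['A', 'A', 'B', 'C', 'C'],
    '账户状态': ['正常', '正常', '结清', '正常', '逾期'],
    '发放金额': [100.0, 200.0, 300.0, 400.0, 500.0],
})
kept = df.query('账户状态 != "结清"').groupby('客户号').filter(lambda x: len(x) > 1)
kept.groupby('客户号', as_index=False).agg({'发放金额': 'sum'})
#   客户号    发放金额
# 0    A   300.0
# 1    C   900.0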