%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(18, 13), dpi=80, facecolor='w', edgecolor='k')
ax = plt.gca()
# Tweets per (day, hour) for community 0
hist_dt[hist_dt["target_community"] == 0].groupby([hist_dt.created_at.dt.day, hist_dt.created_at.dt.hour]).size().plot(ax=ax)
# If there are multiple samples, plot each one on the same axes;
# with a single sample the subsetting part can be removed
hist_dt[hist_dt["target_community"] == 1].groupby([hist_dt.created_at.dt.day, hist_dt.created_at.dt.hour]).size().plot(ax=ax)
plt.xlabel('Day, hour of publication')
plt.ylabel('Number of tweets')
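A minimal sketch of the counting step behind the plot, without the plotting itself. The data below is a made-up stand-in for hist_dt (only the column names created_at and target_community come from the snippet above), and the level names "day" and "hour" are added here for readability.

```python
import pandas as pd

# Hypothetical stand-in for hist_dt: four tweets, two communities.
hist_dt = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-01 10:05", "2020-01-01 10:40",
        "2020-01-01 11:15", "2020-01-02 10:00",
    ]),
    "target_community": [0, 0, 1, 0],
})

# Keep one community, then count rows per (day, hour) pair.
subset = hist_dt[hist_dt["target_community"] == 0]
counts = subset.groupby([
    subset.created_at.dt.day.rename("day"),
    subset.created_at.dt.hour.rename("hour"),
]).size()

print(counts)
```

Calling counts.plot(ax=ax) on the result gives one line per sample, as in the snippet above.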
To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
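A small sketch of both selections on a toy frame. The column names city and n and their values are invented for illustration.

```python
import pandas as pd

# Toy data; names and values are assumptions for this example.
df = pd.DataFrame({
    "city": ["Lima", "Quito", "Lima", "Bogota"],
    "n": [1, 2, 3, 4],
})

# Rows where the column equals a scalar.
equal_rows = df.loc[df["city"] == "Lima"]

# Rows where the column value is one of several candidates.
in_rows = df.loc[df["city"].isin(["Quito", "Bogota"])]

print(equal_rows)
print(in_rows)
```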
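To select rows whose column value is contained in the list-like value of another column in the same row, build a row-wise boolean mask with apply and index with it:
df[df.apply(lambda x: x['Responsibility Type'] in x['Roles'], axis=1)]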
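A sketch of the row-wise membership test, assuming Roles holds a Python list per row. The sample values are invented.

```python
import pandas as pd

# Toy data; each row's "Roles" cell is a list (an assumption for this example).
df = pd.DataFrame({
    "Responsibility Type": ["admin", "editor", "viewer"],
    "Roles": [["admin", "viewer"], ["viewer"], ["viewer", "editor"]],
})

# apply with axis=1 passes each row to the lambda and returns a boolean Series.
mask = df.apply(lambda x: x["Responsibility Type"] in x["Roles"], axis=1)

# Index with the mask to keep only the matching rows.
matching = df[mask]

print(matching)
```

apply with axis=1 runs Python code per row, so it can be slow on large frames.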
Combine multiple conditions with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses,
df['column_name'] >= A & df['column_name'] <= B
is parsed as
df['column_name'] >= (A & df['column_name']) <= B
which is not the intended condition and typically raises an error.
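The precedence point can be checked on a toy frame. This sketch assumes integer bounds A and B; with those, the unparenthesized form fails with a ValueError because the chained comparison asks for the truth value of a Series. Series.between is shown as an alternative for inclusive ranges.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
A, B = 2, 4  # illustrative integer bounds

# Correct: each comparison is parenthesized before combining with &.
in_range = df.loc[(df["x"] >= A) & (df["x"] <= B)]

# Without parentheses, A & df["x"] is evaluated first and the chained
# comparison then needs bool(Series), which pandas refuses.
try:
    df["x"] >= A & df["x"] <= B
    raised = False
except ValueError:
    raised = True

# Alternative for inclusive ranges: Series.between.
same = df.loc[df["x"].between(A, B)]

print(in_range["x"].tolist(), raised)
```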
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
To select rows whose column value is not in some_values, negate the isin mask with ~:
df.loc[~df['column_name'].isin(some_values)]
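A sketch of both negated selections. The toy data is invented for illustration.

```python
import pandas as pd

# Toy data; names and values are assumptions for this example.
df = pd.DataFrame({
    "city": ["Lima", "Quito", "Lima", "Bogota"],
    "n": [1, 2, 3, 4],
})

# Rows where the column differs from a scalar.
not_equal = df.loc[df["city"] != "Lima"]

# ~ inverts the boolean mask returned by isin.
not_in = df.loc[~df["city"].isin(["Lima", "Quito"])]

print(not_equal)
print(not_in)
```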
Improve the efficiency of repeated lookups on a column by setting it as the index
df = df.set_index(['colname'])
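A sketch of a label lookup after setting the index. The value column and the data are invented. Whether this is actually faster depends on the data size and on how often the lookup is repeated.

```python
import pandas as pd

# Toy data; "value" is a hypothetical second column.
df = pd.DataFrame({"colname": ["a", "b", "c"], "value": [10, 20, 30]})

# set_index returns a new frame with "colname" as the row labels.
indexed = df.set_index(["colname"])

# Label-based lookup through the index instead of a boolean mask.
value_b = indexed.loc["b", "value"]

print(value_b)
```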
Order columns lexicographically
df = df.reindex(sorted(df.columns), axis=1)
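A sketch showing that sorted compares strings case-sensitively, so uppercase names come first. The case-insensitive variant with key=str.lower is an addition here, not part of the snippet above.

```python
import pandas as pd

# Toy frame with mixed-case column names.
df = pd.DataFrame({"b": [1], "a": [2], "C": [3]})

# Plain lexicographic order: uppercase letters sort before lowercase.
ordered = df.reindex(sorted(df.columns), axis=1)

# Case-insensitive order, assuming all column names are strings.
ordered_ci = df.reindex(sorted(df.columns, key=str.lower), axis=1)

print(list(ordered.columns), list(ordered_ci.columns))
```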
Remove duplicates (distinct)
df.drop_duplicates()  # pass subset=["col1", "col2"] to compare only those columns; returns a new DataFrame
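A sketch of the difference between full-row and subset-based deduplication. The col3 column and the data are invented.

```python
import pandas as pd

# Toy data: rows 0 and 1 agree on col1 and col2 but differ in col3.
df = pd.DataFrame({
    "col1": [1, 1, 2],
    "col2": ["x", "x", "y"],
    "col3": [10, 20, 30],
})

# Full-row comparison: no two rows are identical, so nothing is dropped.
full = df.drop_duplicates()

# Compare only col1 and col2: the first occurrence of each pair is kept.
by_subset = df.drop_duplicates(subset=["col1", "col2"])

print(len(full), by_subset.index.tolist())
```

drop_duplicates returns a new frame, so assign the result back to keep it.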