%matplotlib inline
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(18, 13), dpi=80, facecolor='w', edgecolor='k')
ax = plt.gca()
# Count tweets per (day, hour) for one community and plot the counts.
subset = hist_dt[hist_dt["target_community"] == 0]
subset.groupby([subset.created_at.dt.day, subset.created_at.dt.hour]).size().plot(ax=ax)
# With several communities, repeat the two lines above for each value (e.g. == 1) on the same ax.
# With a single community, the subsetting step can be dropped.
plt.xlabel('Day, hour of publication')
plt.ylabel('Number of tweets')
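The grouping step behind the plot can be checked on its own with made-up data. This is only a sketch: the frame `demo` and its values are hypothetical, and only the column names `created_at` and `target_community` come from the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for hist_dt: four tweets, three from community 0.
demo = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-01 10:05", "2020-01-01 10:40",
        "2020-01-01 11:15", "2020-01-02 10:30",
    ]),
    "target_community": [0, 0, 1, 0],
})

subset = demo[demo["target_community"] == 0]

# Group by (day of month, hour) and count the rows in each bucket.
counts = subset.groupby([
    subset["created_at"].dt.day.rename("day"),
    subset["created_at"].dt.hour.rename("hour"),
]).size()

print(counts)
```

Calling `counts.plot(ax=ax)` on this Series would draw one point per (day, hour) bucket, as in the original cell.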
To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
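A small sketch of both selections on made-up data; the frame and the column names `city` and `n` are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Lima", "Quito", "Lima", "Bogota"],
    "n": [1, 2, 3, 4],
})

# Rows whose value equals a scalar.
equal_rows = df.loc[df["city"] == "Lima"]

# Rows whose value is in an iterable.
in_rows = df.loc[df["city"].isin(["Lima", "Quito"])]

print(equal_rows)
print(in_rows)
```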
To select rows whose column value is contained in the list stored in another column of the same row, build a row-wise mask with apply:
df.loc[df.apply(lambda x: x['Responsibility Type'] in x['Roles'], axis=1)]
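A sketch of the row-wise membership test, assuming `Roles` holds a Python list in every row. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each row carries one responsibility and a list of roles.
df = pd.DataFrame({
    "Responsibility Type": ["admin", "editor", "viewer"],
    "Roles": [["admin", "editor"], ["viewer"], ["viewer", "admin"]],
})

# apply with axis=1 passes each row to the lambda and yields a boolean Series.
mask = df.apply(lambda row: row["Responsibility Type"] in row["Roles"], axis=1)

# Use the mask to keep only the matching rows.
matching = df.loc[mask]
print(matching)
```

Because `apply` with `axis=1` runs a Python function once per row, it can be slow on large frames.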
Combine multiple conditions with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. Thus, the parentheses in the last example are necessary. Without the parentheses
df['column_name'] >= A & df['column_name'] <= B
is parsed as
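df['column_name'] >= (A & df['column_name']) <= B
Python evaluates this chained comparison with an implicit and, which asks for the truth value of a whole Series and typically raises ValueError: The truth value of a Series is ambiguous.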
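The difference can be reproduced with a tiny frame. This is a sketch: the column `x` and the bounds `A` and `B` are made-up values.

```python
import pandas as pd

# Hypothetical data and bounds.
df = pd.DataFrame({"x": [1, 5, 10]})
A, B = 2, 8

# Parenthesized form: each comparison yields a boolean Series, then & combines them.
ok = df.loc[(df["x"] >= A) & (df["x"] <= B)]

# Unparenthesized form: & binds first, then the chained comparison
# needs the truth value of a Series, which is ambiguous.
try:
    df["x"] >= A & df["x"] <= B
    error = None
except ValueError as exc:
    error = exc

# Series.between is an alternative for an inclusive range check.
same = df.loc[df["x"].between(A, B)]

print(ok)
print(type(error).__name__)
```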
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
To select rows whose value is not in some_values, negate the boolean Series returned by isin with ~:
df.loc[~df['column_name'].isin(some_values)]
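A sketch of the two negated selections; the frame and the column names `city` and `n` are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Lima", "Quito", "Lima", "Bogota"],
    "n": [1, 2, 3, 4],
})

# Rows whose value differs from a scalar.
not_equal = df.loc[df["city"] != "Lima"]

# Rows whose value is not in the iterable: ~ flips the isin mask element-wise.
not_in = df.loc[~df["city"].isin(["Lima", "Quito"])]

print(not_equal)
print(not_in)
```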
Repeated lookups on a column can be faster if that column is set as the index:
df = df.set_index(['colname'])
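A sketch of a label lookup after setting the index. The data is invented, and whether this is actually faster than a boolean mask depends on the data size and on the index being unique or sorted; nothing is measured here.

```python
import pandas as pd

# Hypothetical data; 'colname' matches the placeholder used above.
df = pd.DataFrame({"colname": ["a", "b", "c"], "value": [10, 20, 30]})

# Move the column into the index.
indexed = df.set_index(["colname"])

# Select by label instead of building a boolean mask.
row = indexed.loc["b"]
print(row["value"])
```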
Order columns lexicographically:
df = df.reindex(sorted(df.columns), axis=1)
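A sketch with made-up column names. Note that Python's `sorted` compares strings by code point, so uppercase names come before lowercase ones.

```python
import pandas as pd

# Hypothetical frame with columns in arbitrary order.
df = pd.DataFrame({"b": [1], "a": [2], "C": [3]})

# Reorder the columns; the data in each column is unchanged.
ordered = df.reindex(sorted(df.columns), axis=1)
print(list(ordered.columns))
```

For a case-insensitive order, `sorted(df.columns, key=str.lower)` could be used instead, assuming all column names are strings.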
Remove duplicate rows (distinct). drop_duplicates returns a new DataFrame, so assign the result:
df = df.drop_duplicates()  # pass subset=["col1", "col2"] to compare only those columns
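A sketch contrasting full-row deduplication with the `subset` argument. The data is made up; `col1` and `col2` follow the comment above, and `col3` is a hypothetical extra column.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical; row 2 differs only in col3.
df = pd.DataFrame({
    "col1": [1, 1, 1, 2],
    "col2": ["x", "x", "x", "y"],
    "col3": [10, 10, 99, 30],
})

# Compare all columns: only the exact repeat (row 1) is dropped.
distinct_rows = df.drop_duplicates()

# Compare col1 and col2 only: the first row of each (col1, col2) pair is kept.
distinct_pairs = df.drop_duplicates(subset=["col1", "col2"])

print(distinct_rows)
print(distinct_pairs)
```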