Pandas

Frequently used code for pandas related code snippets

plot columns

df.plot(figsize=(20,10))

fill na

dt.fillna(0)

Reading data from Postgresql

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("postgresql://user@localhost:5432/mydb")
df = pd.read_sql_query("select * from table", con=engine)

Add rows

dt.append({"col1":[1,2,3], "coln":[4,5,6]})

pivoting dataframe

df.pivot(columns="col_with_columns",values="col_with_values")

getting index of max/min values

df['column'].idxmax()

Count instances of values in column

df['target'].value_counts().sort_values(ascending=False)

Filtering by column's value len

df[df['itemsets'].map(len) > 2]

Iterate by rows

Subsetting columns

or by regex pattern

Parse datetimes

Group by dates

Plot evolution over time

New column A based on column B

To select rows whose column value equals a scalar, some_value, use ==:

To select rows whose column value is in an iterable, some_values, use isin:

To select rows whose column value is in another column array

Combine multiple conditions with &:

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. Thus, the parentheses in the last example are necessary. Without the parentheses

is parsed as

To select rows whose column value does not equal some_value, use !=:

To select rows whose value is not in some_values

Improve query efficiency by setting an index

Order columns lexicographically

Remove duplicates (distinct)

Check if a field exists

Replacing column values

One-hot encoding

Rename column

Rename index names

Remove column from list

Join dataframes

Iterate by groups

Group data by year

Count unique by group

Collect list in Group by

Get normalized values from groupby count

Generate csv from dataframe

Sum to date column

Replace character in column

References

Last updated

Was this helpful?