Pandas

 

Creating DataFrames

Specify header while reading a tab-delimited file:

cols = ["col1", "col2", "col3" ]
df = pd.read_csv("data.tab", sep="\t", names=cols)

Rename several DataFrame columns:

df = df.rename(columns = {
    'col1 old name':'col1 new name',
    'col2 old name':'col2 new name',
    'col3 old name':'col3 new name',
})

Cleaning Data

Get a report of all duplicate records in a DataFrame, based on specific columns:

dupes = df[df.duplicated(['col1', 'col2', 'col3'], keep=False)]

Clean up missing values in multiple DataFrame columns:

df = df.fillna({
    'col1': 'missing',
    'col2': 999,
    'col3': 0,
    'col4': 'missing',
})

Exploring Data

Sort dataframe on a column:

df.sort_values(by='col1', ascending=False)

Grouping and Transforming Data

Group by two columns and count total in groups:

df.groupby(['col1', 'col2']).count()

Spreadsheet-style pivot with aggregation

table = pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum)

Accessing an index from a multi-index DataFrame:

pandas.MultiIndex.get_level_values