![]() pool_duplicate_subsets ( df ) # pools subset of cols based on duplicates with min. mv_col_handling ( df ) # drops features with high ratio of missing vals based on informational content - klib. ![]() drop_missing ( df ) # drops missing values, also called in data_cleaning() - klib. convert_datatypes ( df ) # converts existing to more efficient dtypes, also called inside data_cleaning() - klib. clean_column_names ( df ) # cleans and standardizes column names, also called inside data_cleaning() - klib. ![]() data_cleaning ( df ) # performs datacleaning (drop duplicates & empty rows/cols, adjust dtypes.) - klib. missingval_plot ( df ) # returns a figure containing information about missing values # klib.clean - functions for cleaning datasets - klib. dist_plot ( df ) # returns a distribution plot for every numeric feature - klib. corr_plot ( df ) # returns a color-encoded heatmap, ideal for correlations - klib. corr_mat ( df ) # returns a color-encoded correlation matrix - klib. cat_plot ( df ) # returns a visualization of the number and frequency of categorical features - klib. DataFrame ( data ) # scribe - functions for visualizing datasets - klib. ![]() Usage import klib import pandas as pd df = pd. Use the package manager pip to install klib.Īlternatively, to install this package with conda run: Additionally, there are great introductions and overviews of the functionality on PythonBytes or on YouTube (Data Professor). Explanations on key functionalities can be found on Medium / TowardsDataScience and in the examples section. Klib is a Python library for importing, cleaning, analyzing and preprocessing data. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |