klib is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations on key functionalities can be found on Medium / TowardsDataScience and in the examples section. Additionally, there are great introductions and overviews of the functionality on PythonBytes or on YouTube (Data Professor).

Use the package manager pip to install klib. Alternatively, the package can be installed with conda.

Usage

    import klib
    import pandas as pd

    df = pd.DataFrame(data)

    # klib.describe - functions for visualizing datasets
    - klib.cat_plot(df) # returns a visualization of the number and frequency of categorical features
    - klib.corr_mat(df) # returns a color-encoded correlation matrix
    - klib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations
    - klib.dist_plot(df) # returns a distribution plot for every numeric feature
    - klib.missingval_plot(df) # returns a figure containing information about missing values

    # klib.clean - functions for cleaning datasets
    - klib.data_cleaning(df) # performs data cleaning (drop duplicates & empty rows/cols, adjust dtypes, etc.)
    - klib.clean_column_names(df) # cleans and standardizes column names, also called inside data_cleaning()
    - klib.convert_datatypes(df) # converts existing to more efficient dtypes, also called inside data_cleaning()
    - klib.drop_missing(df) # drops missing values, also called in data_cleaning()
    - klib.mv_col_handling(df) # drops features with high ratio of missing vals based on informational content
    - klib.pool_duplicate_subsets(df) # pools subset of cols based on duplicates with min. loss of information

Examples

Find all available examples as well as applications of the functions in klib.clean() with detailed descriptions here.

    klib.missingval_plot(df) # default representation of missing values in a DataFrame, plenty of settings are available
    klib.corr_plot(df, split='pos') # displaying only positive correlations, other settings include threshold, cmap
    klib.corr_plot(df, split='neg') # displaying only negative correlations
    klib.corr_plot(df, target='wine') # default representation of correlations with the feature column
    klib.dist_plot(df) # default representation of a distribution plot, other settings include fill_range, histogram
    klib.cat_plot(data, top=4, bottom=4) # representation of the 4 most & least common values in each categorical column

Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change.
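The comment on data_cleaning() above names three steps: dropping duplicates, dropping empty rows and columns, and adjusting dtypes. As a rough illustration of what those steps mean, here is a minimal pandas-only sketch. It is not klib's implementation; the function name basic_cleaning_sketch and the sample data are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd


def basic_cleaning_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Pandas-only approximation of the steps named in the data_cleaning() comment.

    Hypothetical helper for illustration; klib's own logic may differ.
    """
    cleaned = df.dropna(axis=0, how="all")       # drop rows that are entirely empty
    cleaned = cleaned.dropna(axis=1, how="all")  # drop columns that are entirely empty
    cleaned = cleaned.drop_duplicates()          # drop duplicate rows
    cleaned = cleaned.reset_index(drop=True)     # tidy the index after dropping rows
    return cleaned.convert_dtypes()              # let pandas infer nullable dtypes


# Made-up sample data: one duplicate row, one empty row, one empty column.
raw = pd.DataFrame(
    {
        "wine": [1.0, 1.0, np.nan, 2.0],
        "acidity": [0.5, 0.5, np.nan, 0.7],
        "empty": [np.nan, np.nan, np.nan, np.nan],
    }
)

cleaned = basic_cleaning_sketch(raw)
print(cleaned.shape)
print(cleaned.dtypes)
```

With klib installed, the equivalent one-liner would presumably be klib.data_cleaning(raw), which, per the comments above, also calls clean_column_names(), convert_datatypes() and drop_missing() internally.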