A comprehensive guide to understanding and utilizing the core functionalities of the PivotPal Python package.
1. Value Distribution Table: Dive deep into the distribution of values for a specified column.
2. Unique Values in Dataset: Uncover the unique values present in each column.
3. Missing Values Analysis: Identify and understand the gaps in your dataset.
4. Duplicate Row Analysis: Detect and quantify duplicate rows in your dataset.
Overview: Analyzes the distribution of values for a specified column in a dataset.
Explanation: This function provides a breakdown of the distribution of values for a specified column. It's useful for understanding the spread and frequency of data points within a column.
pp.distribution(your_data, 'column_name')
Column Name | Count | % |
---|---|---|
Value1 | Count1 | % |
Value2 | Count2 | % |
Value3 | Count3 | % |
The table above showcases the distribution of values for the specified column. It provides a count and percentage distribution for each unique value.
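To make the count and percentage columns concrete, the same kind of breakdown can be sketched with plain pandas. This is not PivotPal's implementation: it assumes `your_data` is a pandas DataFrame, and the helper name `value_distribution` and the sample data are hypothetical.

```python
import pandas as pd


def value_distribution(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: count and percentage for each value in one column."""
    # value_counts sorts by frequency; dropna=False keeps missing values as their own row.
    counts = df[column].value_counts(dropna=False)
    percent = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame(
        {column: counts.index, "Count": counts.to_numpy(), "%": percent.to_numpy()}
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame({"color": ["red", "blue", "red", "green", "red"]})
dist = value_distribution(sample, "color")
print(dist)
```

Here the percentages are computed against the total row count, so they sum to 100 (up to rounding).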
Overview: Provides a count of unique values for each column in the dataset.
Explanation: This function counts the distinct values present in each column of the dataset, which helps you gauge how varied each column is.
pp.unique(your_data)
Column Name | Unique Count |
---|---|
Column1 | Count1 |
Column2 | Count2 |
Column3 | Count3 |
The table above lists the unique value counts for each column in the dataset. This helps in understanding the diversity and spread of data within columns.
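For comparison, a per-column unique count like the one in the table can be sketched directly in pandas. This is an illustration under assumptions, not PivotPal's own code: the helper name `unique_counts` and the sample data are hypothetical, and the input is assumed to be a pandas DataFrame.

```python
import pandas as pd


def unique_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number of distinct non-missing values per column."""
    # nunique ignores NaN by default (dropna=True).
    counts = df.nunique()
    return pd.DataFrame(
        {"Column Name": counts.index, "Unique Count": counts.to_numpy()}
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "year": [2020, 2021, 2021, 2021],
    }
)
uniq = unique_counts(sample)
print(uniq)
```

A column whose unique count equals the number of rows (such as `id` here) is often an identifier, while a very low count usually points to a categorical column.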
Overview: Provides a summary of missing values for each column in the dataset.
Explanation: This function identifies columns with missing values, providing a count and percentage of missing data. It's crucial for data cleaning and preprocessing.
pp.missing(your_data)
Column Name | Missing Count | Missing % |
---|---|---|
Column1 | Count1 | % |
Column2 | Count2 | % |
Column3 | Count3 | % |
The table above highlights columns with missing values. It provides a count and percentage of missing data for each column, aiding in data quality assessment.
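The missing-value count and percentage can be reproduced with a short pandas sketch. It is only an approximation of what such a summary involves, not PivotPal's implementation; the helper name `missing_summary` and the sample data are hypothetical, and the input is assumed to be a pandas DataFrame.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing count and missing percentage per column."""
    is_missing = df.isna()  # True where a cell is NaN/None
    missing_count = is_missing.sum()
    # The mean of a boolean column is the fraction of True values.
    missing_pct = (is_missing.mean() * 100).round(2)
    return pd.DataFrame(
        {
            "Column Name": missing_count.index,
            "Missing Count": missing_count.to_numpy(),
            "Missing %": missing_pct.to_numpy(),
        }
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "a": [1.0, None, 3.0, None],
        "b": ["x", "y", None, "z"],
        "c": [1, 2, 3, 4],
    }
)
miss = missing_summary(sample)
print(miss)
```

This sketch lists every column, including those with no gaps; filtering on `miss["Missing Count"] > 0` would keep only the columns that actually have missing values.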
Overview: Identifies and counts duplicate rows in the dataset.
Explanation: Duplicate rows can skew analysis and lead to incorrect conclusions. This function helps you identify them so that they can be reviewed and, where appropriate, removed.
pp.duplicates(your_data)
Row Index | Duplicate Count |
---|---|
Index1 | Count1 |
Index2 | Count2 |
Index3 | Count3 |
The table above lists rows that are duplicated in the dataset along with their count.
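A duplicate count of this kind can be sketched in pandas as follows. This is a hedged illustration rather than PivotPal's actual logic: the helper name `duplicate_summary` and the sample data are hypothetical, and unlike the table above it reports the duplicated row values rather than a row index.

```python
import pandas as pd


def duplicate_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: each duplicated row with its number of occurrences."""
    # keep=False marks every member of a duplicate group, including the first one.
    in_dup_group = df.duplicated(keep=False)
    return (
        df[in_dup_group]
        # dropna=False keeps rows containing NaN (needs pandas 1.1 or later).
        .groupby(list(df.columns), sort=False, dropna=False)
        .size()
        .reset_index(name="Duplicate Count")
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"],
        "age": [30, 25, 30, 41, 25, 30],
    }
)
dups = duplicate_summary(sample)
# Rows that would be dropped if only the first occurrence were kept.
extra_rows = int(sample.duplicated(keep="first").sum())
print(dups)
print("Redundant rows:", extra_rows)
```

If removal is the goal, `sample.drop_duplicates()` keeps the first occurrence of each row and discards the rest.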