This walkthrough explores the Airbnb dataset in six steps, covering a dataset overview, data types, missing values, zero values, and value distributions.
1. Dataset Overview: A snapshot of the Airbnb dataset.
2. Data Types Distribution: Analyzing the types of data in the dataset.
3. Columns with Missing Values: Identifying columns with missing data.
4. Columns with Zero Values: Identifying columns that contain zeros.
5. Distribution of `room_type`: Analyzing the types of rooms available.
6. Distribution of `burough`: Analyzing the distribution of boroughs in the dataset.

Overview: Provides a general overview of the dataset.
Explanation: This function gives a snapshot of the dataset, including row and column counts, duplicate rows, columns with missing or zero values, and the data types present.
pp.overview(airbnb)
Description | Count |
---|---|
Total Rows | 10,019 |
Total Columns | 22 |
Columns with Missing Values | 4 |
Total Duplicate Rows | 13 |
Most Frequent Data Type | object |
Columns with Binary Values | 0 |
Columns with Zero Values | 10 |
Unique Data Types | 3 |
Numeric Columns | 13 |
Non-Numeric Columns | 9 |
The table above summarizes the Airbnb dataset: 10,019 rows and 22 columns, with 13 duplicate rows, 4 columns containing missing values, and 10 columns containing zeros.
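To illustrate what such an overview involves, here is a minimal sketch in plain pandas. It assumes `airbnb` is a pandas DataFrame; `overview_sketch` and the sample frame are hypothetical names made up for this example, and it is not a description of how `pp.overview` is implemented. The "Columns with Binary Values" row is left out because its exact definition is not given here.

```python
import pandas as pd


def overview_sketch(df: pd.DataFrame) -> pd.Series:
    """Approximate a dataset overview using plain pandas calls (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    dtype_names = df.dtypes.astype(str)
    return pd.Series(
        {
            "Total Rows": len(df),
            "Total Columns": df.shape[1],
            # A column counts as "missing" if it has at least one NaN/None.
            "Columns with Missing Values": int(df.isna().any().sum()),
            # Rows that exactly repeat an earlier row.
            "Total Duplicate Rows": int(df.duplicated().sum()),
            "Most Frequent Data Type": dtype_names.value_counts().idxmax(),
            # Assumption: zeros are only counted in numeric columns.
            "Columns with Zero Values": int((numeric == 0).any().sum()),
            "Unique Data Types": int(dtype_names.nunique()),
            "Numeric Columns": numeric.shape[1],
            "Non-Numeric Columns": df.shape[1] - numeric.shape[1],
        }
    )


# Tiny made-up frame for demonstration; this is not the Airbnb data.
sample = pd.DataFrame(
    {
        "name": ["a", "b", "b", None],
        "price": [100.0, 0.0, 0.0, 250.0],
        "reviews": [3, 0, 0, 7],
    }
)
summary = overview_sketch(sample)
print(summary)
```

Calling `overview_sketch(airbnb)` on the real DataFrame should yield comparable counts, though the figures may differ from the table wherever the library defines a metric differently.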
Overview: Analyzes the distribution of data types in the dataset.
Explanation: This function provides a breakdown of the different data types present in the dataset and their distribution.
pp.datatypes(airbnb)
Data Type | Column Count | % Distribution |
---|---|---|
object | 8 | 50.0 |
int64 | 4 | 25.0 |
float64 | 4 | 25.0 |
The table above shows the distribution of data types in the Airbnb dataset: half of the listed columns are `object`, and the rest are split evenly between `int64` and `float64`.
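A breakdown like this can be approximated by counting columns per dtype and dividing by the number of columns. The sketch below assumes a pandas DataFrame; `datatypes_sketch` and the sample frame are hypothetical, and the library's own implementation may differ.

```python
import pandas as pd


def datatypes_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Count columns per dtype and express each count as a share of all columns."""
    counts = df.dtypes.astype(str).value_counts()
    return pd.DataFrame(
        {
            "Column Count": counts,
            "% Distribution": (counts / df.shape[1] * 100).round(2),
        }
    ).rename_axis("Data Type")


# Made-up frame with two float columns and two integer columns.
sample = pd.DataFrame(
    {
        "price": [100.0, 80.0],
        "rating": [4.5, 4.9],
        "reviews": [3, 7],
        "nights": [2, 1],
    }
)
dtype_table = datatypes_sketch(sample)
print(dtype_table)
```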
Overview: Identifies columns with missing values in the dataset.
Explanation: This function lists columns that have missing values, along with the count and percentage of missing data.
pp.missing(airbnb)
Column Name | Missing Count | Missing % |
---|---|---|
price | 238 | 2.0 |
estimated_revenue | 238 | 2.0 |
name | 5 | 0.0 |
host_name | 2 | 0.0 |
The table above lists the four columns with missing values: `price` and `estimated_revenue` are each missing 238 values, while `name` and `host_name` are missing 5 and 2.
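The same kind of report can be sketched with `DataFrame.isna()`. This example assumes a pandas DataFrame and treats only NaN/None as missing; `missing_sketch` and the sample data are hypothetical, and the rounding precision is a choice made for this sketch rather than the library's documented behavior.

```python
import pandas as pd


def missing_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """List columns that contain NaN/None, with counts and percentages of rows."""
    counts = df.isna().sum()
    # Keep only columns with at least one missing value, largest first.
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame(
        {
            "Missing Count": counts,
            "Missing %": (counts / len(df) * 100).round(2),
        }
    ).rename_axis("Column Name")


# Made-up frame: two missing prices, one missing name, complete reviews.
sample = pd.DataFrame(
    {
        "price": [100.0, None, 80.0, None],
        "name": ["a", None, "c", "d"],
        "reviews": [1, 2, 3, 4],
    }
)
missing_table = missing_sketch(sample)
print(missing_table)
```

Columns without any missing values, such as `reviews` in the sample, are dropped from the result.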
Overview: Identifies columns with zero values in the dataset.
Explanation: This function lists columns that have zero values, along with the count and percentage of zeros.
pp.zeros(airbnb)
Column Name | Zero Count | Zero % |
---|---|---|
availability_365 | 3614 | 36.07 |
number_of_reviews | 2075 | 20.71 |
last_review | 2075 | 20.71 |
reviews_per_month | 2075 | 20.71 |
rating | 2075 | 20.71 |
number_of_stays | 2075 | 20.71 |
5_stars | 2075 | 20.71 |
occupancy | 290 | 2.89 |
estimated_revenue | 288 | 2.87 |
price | 2 | 0.02 |
The table above lists the columns that contain zeros. `availability_365` has the most (36.07% of rows), and six columns share the same zero count of 2,075 (20.71%).
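Each percentage in the table is the zero count divided by the 10,019 total rows. The sketch below checks one row of the table and shows a comparable count in plain pandas. It assumes a pandas DataFrame and, as a simplification, only inspects numeric columns; `zeros_sketch` and the sample frame are hypothetical.

```python
import pandas as pd

# Worked check against the table: 3,614 zeros out of 10,019 rows.
availability_zero_pct = round(3614 / 10_019 * 100, 2)
print(availability_zero_pct)  # 36.07


def zeros_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """List numeric columns that contain zeros, with counts and percentages of rows."""
    numeric = df.select_dtypes(include="number")  # assumption: numeric columns only
    counts = (numeric == 0).sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame(
        {
            "Zero Count": counts,
            "Zero %": (counts / len(df) * 100).round(2),
        }
    ).rename_axis("Column Name")


# Made-up frame for demonstration; this is not the Airbnb data.
sample = pd.DataFrame(
    {
        "availability": [0, 0, 365, 10],
        "reviews": [0, 5, 2, 1],
        "price": [100.0, 80.0, 120.0, 95.0],
    }
)
zero_table = zeros_sketch(sample)
print(zero_table)
```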
Overview: Analyzes the distribution of values for the `room_type` column.
Explanation: This function provides a breakdown of the distribution of room types in the dataset.
pp.distribution(airbnb, 'room_type')
room_type | Count | % |
---|---|---|
Entire Home/Apt | 5186 | 51.76 |
Private Room | 4607 | 45.98 |
Shared Room | 226 | 2.26 |
The table above shows the distribution of room types: entire homes/apartments and private rooms together account for nearly 98% of listings, while shared rooms make up 2.26%.
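A value distribution of this kind maps naturally onto `Series.value_counts()`. The sketch below rebuilds a `room_type` column from the counts in the table (not from the real dataset) and recomputes the percentages. `distribution_sketch` is a hypothetical helper, and it is only assumed, not confirmed, that `pp.distribution` works along these lines.

```python
import pandas as pd


def distribution_sketch(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count each distinct value in a column and its share of non-missing rows."""
    counts = df[column].value_counts()  # sorted by count, NaN excluded by default
    return pd.DataFrame(
        {
            "Count": counts,
            "%": (counts / counts.sum() * 100).round(2),
        }
    ).rename_axis(column)


# Column reconstructed from the published counts, for illustration only.
rooms = pd.DataFrame(
    {
        "room_type": ["Entire Home/Apt"] * 5186
        + ["Private Room"] * 4607
        + ["Shared Room"] * 226
    }
)
room_table = distribution_sketch(rooms, "room_type")
print(room_table)
```

The same helper applies to any categorical column, for example `distribution_sketch(airbnb, 'burough')`.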
Overview: Analyzes the distribution of values for the `burough` column.
Explanation: This function provides a breakdown of the distribution of boroughs in the dataset.
pp.distribution(airbnb, 'burough')
burough | Count | % |
---|---|---|
Manhattan | 4449 | 44.41 |
Brooklyn | 4086 | 40.78 |
Queens | 1182 | 11.80 |
Bronx | 229 | 2.29 |
Staten Island | 73 | 0.73 |
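The table above shows the distribution of boroughs: Manhattan (44.41%) and Brooklyn (40.78%) together account for about 85% of listings, while Staten Island makes up less than 1%.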