Datasets
Primary Datasets
Drug Consumption
Real data on 1,876 participants’ demographics, personality traits, and substance use. Used in Walkthrough #1 to predict cocaine use (Coke, classification).
California Housing
Real data from 1990 on 2,000 California blocks, covering houses, population, and location. Used in Walkthrough #2 to predict median house value (house_mdn_value, regression).
Additional Datasets
Airline Satisfaction
Real data on the satisfaction and experience of 10,000 airline customers. Can be used to predict satisfaction status (satifaction, classification).
Titanic Disaster
Real data on 1,309 passengers on the Titanic. Can be used to predict survival (survived, classification) or ticket price (fare, regression).
Water Potability
Real data on the potability and chemical properties of 2,011 water bodies. Can be used to predict whether the water is safe to drink (Potability, classification).
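As a quick reference, the target columns and task types listed above can be gathered into a small lookup. This is only a sketch: the dictionary layout and the helper name targets_for are illustrative assumptions, not part of the course materials, and the column names are copied exactly as they appear in this list.

```python
# Target columns and task types exactly as listed in this section.
# The structure and the helper below are hypothetical conveniences.
DATASETS = {
    "Drug Consumption": [("Coke", "classification")],
    "California Housing": [("house_mdn_value", "regression")],
    # Column name spelled as it appears in the list above.
    "Airline Satisfaction": [("satifaction", "classification")],
    "Titanic Disaster": [("survived", "classification"), ("fare", "regression")],
    "Water Potability": [("Potability", "classification")],
}


def targets_for(task):
    """Return (dataset, column) pairs whose listed task matches `task`."""
    return [
        (name, column)
        for name, targets in DATASETS.items()
        for column, listed_task in targets
        if listed_task == task
    ]


print(targets_for("regression"))
# → [('California Housing', 'house_mdn_value'), ('Titanic Disaster', 'fare')]
```

Calling targets_for("classification") returns the four datasets with a categorical target, which may help when choosing a dataset to practice a particular kind of model.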