Datasets

Primary Datasets

Drug Consumption

Real data on 1,876 participants’ demographics, personality traits, and substance use. Used in Walkthrough #1 to predict cocaine use (Coke, classification).

California Housing

Real data from 1990 on 2,000 blocks in California, covering houses, population, and location. Used in Walkthrough #2 to predict median house value (house_mdn_value, regression).

Additional Datasets

Airline Satisfaction

Real data on the satisfaction and experience of 10,000 customers of an airline. Can be used to predict satisfaction status (satisfaction, classification).

Titanic Disaster

Real data on 1,309 passengers on the Titanic. Can be used to predict survival (survived, classification) or ticket price (fare, regression).

Water Potability

Real data on the potability and chemical properties of 2,011 water bodies. Can be used to predict whether the water is safe to drink (Potability, classification).