Reviewer for data science fundamentals.
The Data Science Process
Data cleaning often takes the most time: "garbage in, garbage out."
Statistics Basics
Descriptive stats (mean, median, mode, standard deviation) summarize data; inferential stats draw conclusions from samples.
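As a quick illustration of the descriptive measures named above, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is hypothetical data invented for this example, not taken from the reviewer.

```python
import statistics

# Hypothetical sample (assumption): five exam scores invented for illustration.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)          # sum / count = 430 / 5 = 86
median = statistics.median(scores)      # middle value of the sorted data = 85
mode = statistics.mode(scores)          # most frequent value = 85
variance = statistics.variance(scores)  # sample variance, divides by n - 1 = 117.5
stdev = statistics.stdev(scores)        # sample standard deviation = sqrt(variance)

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.2f}")
```

Note that `statistics.stdev` divides by n - 1 (sample standard deviation), which is the usual choice when the data is a sample used to draw conclusions about a larger population; `statistics.pstdev` divides by n instead.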
Machine Learning
| Type | Goal |
|---|---|
| Supervised | Predict from labeled data |
| Unsupervised | Find patterns/clusters |
| Reinforcement | Learn by reward |
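To make the first two rows of the table concrete, the sketch below contrasts a supervised method (1-nearest-neighbor, which needs labels) with an unsupervised one (a two-cluster k-means, which does not). The data, the label names, and the helper functions `predict_1nn` and `two_means` are all hypothetical, chosen only for illustration.

```python
# Hypothetical 1-D data (assumption): heights in cm, invented for illustration.
heights = [150.0, 152.0, 155.0, 180.0, 183.0, 185.0]
labels = ["short", "short", "short", "tall", "tall", "tall"]

def predict_1nn(x, xs, ys):
    """Supervised: return the label of the closest labeled training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

def two_means(xs, iterations=10):
    """Unsupervised: find two cluster centers without looking at any labels."""
    c0, c1 = min(xs), max(xs)  # start the two centroids at the extremes
    for _ in range(iterations):
        # Assign each point to its nearer centroid, then recompute the centroids.
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

print(predict_1nn(178.0, heights, labels))  # uses the labels to predict
print(two_means(heights))                   # discovers two groups on its own
```

The supervised function cannot run without `labels`; the unsupervised one never sees them, yet still separates the data into a low group and a high group.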
Correlation is not causation: a model finding a relationship does not prove one variable causes another.
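A small sketch of this warning, using made-up numbers: ice cream sales and drownings can be strongly correlated because both rise with temperature, not because one causes the other. The data and the `pearson` helper are hypothetical and written only for this illustration.

```python
import math

# Hypothetical data (assumption): both series rise on hotter days.
ice_cream_sales = [22, 29, 41, 52, 58]
drownings = [1, 2, 2, 3, 4]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

r = pearson(ice_cream_sales, drownings)
print(f"r = {r:.2f}")  # strongly positive, yet neither variable causes the other
```

A high `r` only says the two series move together. Here the shared driver (temperature) is a confounder, which the correlation alone cannot reveal.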
Overfitting
An overfitted model memorizes its training data but fails on new data. Use a held-out test set and cross-validation to detect it: a model that scores much better on training data than on unseen data is overfitting.
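The detection idea can be sketched in plain Python. The example below assumes a deliberately overfit-prone model (a 1-nearest-neighbor "memorizer") and hypothetical noisy data; `fit_memorizer`, `mse`, and `cross_val_mse` are illustrative helpers, not functions from any library.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical data (assumption): y = 2x plus Gaussian noise.
data = [(float(x), 2.0 * x + rng.gauss(0.0, 3.0)) for x in range(20)]

def fit_memorizer(train):
    """Return a model that predicts the y of the nearest training x (1-NN)."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def mse(model, points):
    """Mean squared error of the model on the given (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

def cross_val_mse(points, k=5):
    """Average held-out MSE over k folds; fold i holds out every k-th point."""
    scores = []
    for fold in range(k):
        held_out = [p for i, p in enumerate(points) if i % k == fold]
        kept = [p for i, p in enumerate(points) if i % k != fold]
        scores.append(mse(fit_memorizer(kept), held_out))
    return sum(scores) / k

# Simple hold-out split: every 4th point becomes the test set.
test_set = data[::4]
train_set = [p for i, p in enumerate(data) if i % 4 != 0]

model = fit_memorizer(train_set)
train_mse = mse(model, train_set)  # exactly 0.0: the model recalls its own labels
test_mse = mse(model, test_set)    # larger: the memorized noise does not transfer
cv_mse = cross_val_mse(data)

print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}, 5-fold CV MSE={cv_mse:.2f}")
```

The gap between a perfect training score and a worse test or cross-validation score is the signal to look for. Cross-validation repeats the hold-out check over several splits, so the estimate depends less on one lucky or unlucky split.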
Before your exam, make sure you can confidently explain and apply each of the following:
- The Data Science Process
- Statistics Basics
- Machine Learning
- Overfitting
Re-read any section above where you hesitate, then explain it aloud in your own words; if you can teach it simply, you understand it. Focus your final review on the tables, formulas, and the common-mistake warnings, since those are where most points are won or lost.