ISYE 6414
User accounts will be set up and emailed by June 1.
Your Analysis Plan, Final Report, and Peer Reviews will be submitted via this account.
Your code will be submitted via your group's GitHub repository, which we'll invite you to after your account is created.

Code

Your group's code lives in your group GitHub repository. There is no separate code submission and no individual code grade — we download the repository, run it, and confirm it reproduces the work described in your Final Report.

How code is graded

We check that your code matches the work described in your Final Report and that all of the expected steps were performed. No individual grade is associated with it. If the code is severely lacking, however, we will apply a deduction to your group's Final Report grade (see the Required Models and Predictors-per-Model penalties on the Final Report Guide).

What to upload

Your repository should clearly show:

  1. Data sources being joined. The code that merges/joins your datasets from their different sources, so we can see how your combined dataset was assembled.
  2. Any cleaning. All preprocessing — missing-value handling, type conversions, feature engineering, etc. — so we can reproduce your exact analysis-ready data.
  3. A notebook for each model. One well-documented Jupyter notebook (.ipynb) per model (3 notebooks in total), neatly formatted with markdown narration. We will run these.
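
As an illustration of items 1 and 2, here is a minimal sketch of a join followed by basic cleaning in pandas. The two toy tables, their column names (store_id, revenue, region), and the choice to drop incomplete rows are all hypothetical; your own sources and cleaning decisions will differ.

```python
import pandas as pd

# Hypothetical toy data standing in for two real sources.
sales = pd.DataFrame({
    "store_id": [1, 2, 3, 4],
    "revenue": ["100.5", "200.0", None, "150.25"],  # stored as text, one missing
})
stores = pd.DataFrame({
    "store_id": [1, 2, 3, 5],
    "region": ["north", "south", "south", "east"],
})

# Step 1: join the sources. validate= raises if the key is duplicated on
# either side; indicator= adds a _merge column showing which rows matched.
merged = sales.merge(
    stores, on="store_id", how="left", validate="one_to_one", indicator=True
)
print(merged["_merge"].value_counts())

# Step 2: cleaning. Convert types, then handle missing values explicitly.
clean = merged.drop(columns="_merge")
clean["revenue"] = pd.to_numeric(clean["revenue"], errors="coerce")
clean = clean.dropna(subset=["revenue", "region"]).reset_index(drop=True)
clean["region"] = clean["region"].astype("category")

print(clean)
```

Printing the _merge counts makes unmatched rows visible in the notebook output, so a reader can see how many records were lost in the join.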

What each model's notebook must include

Each model's notebook should walk through the full analysis, with all of these steps:

  • Loading, cleaning, and merging the data it uses
  • Exploratory Data Analysis (EDA)
  • Outlier screening and handling
  • Splitting the data into training, validation, and test sets
  • Variable selection (using the training and validation data)
  • Goodness-of-fit testing and model assumption checks
  • Model training, then final evaluation on the held-out test set
  • Statistical analysis and hypothesis testing
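
For the outlier-screening step, one common screen for linear models is Cook's distance. The sketch below computes it with NumPy on simulated data with one planted outlier. The data, the seed, and the 4/n cutoff (a rule of thumb, not a requirement of this guide) are assumptions; the guide does not prescribe a particular screening method.

```python
import numpy as np

rng = np.random.default_rng(6414)  # arbitrary fixed seed

# Hypothetical data: a simple linear relationship plus one planted outlier.
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=n)
y[0] += 40.0  # planted outlier at row 0

# Design matrix with an intercept column; p counts all coefficients.
X = np.column_stack([np.ones(n), x])
p = X.shape[1]

# Ordinary least squares fit and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverage: diagonal of the hat matrix X (X'X)^-1 X'.
XtX_inv = np.linalg.inv(X.T @ X)
leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

# Cook's distance: D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2.
s2 = resid @ resid / (n - p)
cooks_d = resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Flag rows above the 4/n rule of thumb for closer inspection.
flagged = np.flatnonzero(cooks_d > 4.0 / n)
print("Rows to inspect:", flagged)
```

A flag is a prompt to inspect the row, not an automatic deletion; whatever you decide to do with flagged rows should be stated in the notebook.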

Models & predictors

Your group must analyze the data using 3 models — each a specific instance of an analysis (e.g., MLR + Poisson + Logistic, or three flavors of MLR with different predictors/outcomes), not just a model family.

We expect 10+ predictors per model (a categorical variable counts as 1 predictor regardless of its number of levels). The point is to see meaningful variable selection applied: start broad, then let the data guide you toward a parsimonious final model.
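
To make the counting rule concrete, the sketch below counts predictors before dummy encoding, so a categorical variable counts once regardless of how many columns it expands into. The data frame and its column names are hypothetical, and it has only three predictors to keep the example short.

```python
import pandas as pd

# Hypothetical analysis-ready data: one outcome, two numeric predictors,
# and one categorical predictor with four levels.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 14.0, 11.0, 13.5],
    "sqft": [800, 950, 700, 1200, 860, 1100],
    "age": [30, 12, 45, 5, 22, 8],
    "region": pd.Categorical(
        ["north", "south", "east", "west", "north", "south"]
    ),
})
outcome = "price"

# Count predictors on the original columns: region counts once.
predictors = [col for col in df.columns if col != outcome]
n_predictors = len(predictors)

# After dummy encoding, the design matrix has more columns, but the
# predictor count above is the one that matters for the 10+ expectation.
design = pd.get_dummies(df[predictors], drop_first=True)
n_design_columns = design.shape[1]

print(f"{n_predictors} predictors, {n_design_columns} design columns")
```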

Train / validation / test split

Every model must use a training, validation, and test split. Fit on the training data, use the validation set for model selection (variable selection, comparing specifications, and threshold choice for logistic models), and report final metrics on the held-out test set. This holds for MLR, Poisson, and Logistic alike — only the metrics you report differ by family.
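
The workflow above can be sketched as follows, using NumPy and a plain least-squares fit on simulated data. The 60/20/20 proportions, the seed, the candidate specifications, and RMSE as the metric are all assumptions for illustration; the guide does not fix any of them.

```python
import numpy as np

rng = np.random.default_rng(6414)  # arbitrary fixed seed

# Hypothetical data: y depends on x1 and x2; x3 is pure noise.
n = 500
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)

# 60/20/20 split on shuffled row indices (proportions are an assumption).
idx = rng.permutation(n)
train, val, test = np.split(idx, [int(0.6 * n), int(0.8 * n)])


def fit_ols(X_part, y_part):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X_part)), X_part])
    beta, *_ = np.linalg.lstsq(A, y_part, rcond=None)
    return beta


def rmse(beta, X_part, y_part):
    """Root mean squared prediction error for the given coefficients."""
    A = np.column_stack([np.ones(len(X_part)), X_part])
    return float(np.sqrt(np.mean((y_part - A @ beta) ** 2)))


# Candidate specifications: which predictor columns each one uses.
candidates = {"x1": [0], "x1 + x2": [0, 1], "x1 + x2 + x3": [0, 1, 2]}

# Fit every candidate on the training set; score it on the validation set.
val_scores = {}
for name, cols in candidates.items():
    beta = fit_ols(X[train][:, cols], y[train])
    val_scores[name] = rmse(beta, X[val][:, cols], y[val])

best = min(val_scores, key=val_scores.get)

# Touch the test set once, for the chosen specification only.
cols = candidates[best]
beta = fit_ols(X[train][:, cols], y[train])
test_rmse = rmse(beta, X[test][:, cols], y[test])

print("Validation RMSE:", val_scores)
print(f"Selected: {best}; test RMSE: {test_rmse:.3f}")
```

The same pattern applies to Poisson and Logistic models: only the fitting routine and the metric change, while the test set is still used once at the end.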

Make it runnable

Use seeds where appropriate so we can reproduce your results, and note any package dependencies. We should be able to open each notebook, run it top to bottom, and arrive at the results you report.
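
One way to do this is a short setup cell at the top of each notebook, sketched below with the standard library only. The seed value, the helper name describe_environment, and the package list are hypothetical; list whatever your notebook actually imports.

```python
import importlib.metadata as md
import random
import sys

SEED = 6414  # arbitrary value; what matters is that it is fixed and shown

# Seed Python's built-in generator. If you use NumPy, also create a seeded
# generator, e.g. rng = np.random.default_rng(SEED), and pass the seed to any
# library function that accepts one.
random.seed(SEED)


def describe_environment(packages):
    """Return 'name==version' lines for Python and the given packages."""
    lines = [f"python=={sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name}=={md.version(name)}")
        except md.PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    return lines


# Hypothetical dependency list; replace with the packages you import.
for line in describe_environment(["numpy", "pandas", "statsmodels"]):
    print(line)
```

The printed lines can be pasted into a requirements file or a markdown cell, so anyone running the notebook knows which versions produced your results.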