Your Final Report is a thorough but concise writeup of your entire investigation, noting its successes, its shortcomings, and what you would suggest be pursued next.
The Final Report is meant to be a standalone document. You may have changed your investigative methods as you progressed through your analysis; that is expected and normal. The key thing to remember: everything a reader needs to know to understand the Report should be found in the Report itself.
The Final Report accounts for 50% of your Final Project grade and should be a professional academic paper. You must use the required Overleaf template (see Requirements tab) — the template handles all formatting for you.
A concise standalone summary of the entire report that can be understood without reading the full paper. Your abstract should include:
Description of the reason you chose your analysis topic. In this section, you should provide some background research on the topic (this is the literature review), and a statement of your prior expectations for how your study would turn out.
This is a summary of the statistical analysis (the methods and the data used). Structure this section based on the questions of interest.
This is a statement of the subject-matter implications of your study and a discussion of further questions raised by your study.
This is an enumerated list, in APA style, of the works you referenced throughout your Report.
Your data sources should be included in the References list as well.
This section contains paragraph-length summaries of each member's contributions to the project.
You must use the provided LaTeX template. Failure to use this template will result in a significant penalty.
The template features a professional two-column journal format with single-column abstract, clickable APA citations via biblatex-apa, and guidance for each section.
A "model" is a specific instance of an analysis — not just a model family. Your 3 models could be MLR + Poisson + Logistic, or three flavors of MLR with different predictors and/or outcomes. Either is fine. We expect 10+ predictors per model (a categorical variable counts as 1, regardless of its number of levels). The point is to see meaningful variable selection applied: start broad, then let the data guide you toward a parsimonious final model.
Each model must appear in a single, well-documented Jupyter notebook (neatly formatted, with markdown narration) that we will run, uploaded to your group repository. Each notebook should include all steps:
Every model must use a training, validation, and test split. Fit the model on the training data and use the validation set to guide model selection — variable selection, comparing candidate specifications, and (for logistic models) choosing a decision threshold. Hold the test set out until the end and report your final performance metrics on it.
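The split and threshold steps described above can be sketched as follows. This is a minimal illustration, not a required implementation: the function names `train_val_test_split` and `best_threshold`, the 60/20/20 proportions, the candidate thresholds, and the use of accuracy as the selection criterion are all assumptions made for the example. Your notebooks may use a library splitter and a different criterion.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once with a fixed seed, then cut into train/val/test lists."""
    if not 0 < val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def best_threshold(probs, labels, candidates=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return the candidate threshold with the highest accuracy.

    Call this with VALIDATION predictions only, never with test predictions.
    Ties are broken in favor of the earliest candidate.
    """
    def accuracy(t):
        hits = sum((p >= t) == bool(y) for p, y in zip(probs, labels))
        return hits / len(labels)
    return max(candidates, key=accuracy)


# Hypothetical data: 100 row indices standing in for a real dataset.
rows = list(range(100))
train, val, test = train_val_test_split(rows, seed=42)
print(len(train), len(val), len(test))  # 60 20 20
```

Fixing the seed matters because we will rerun your notebook: the same rows should land in the same partitions each time, and the test partition should only be touched in the final evaluation cell.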
This applies to all model families: report metrics appropriate to each (e.g., RMSE/MAE and R² for linear regression; AUC, log-loss, and threshold-based accuracy/precision/recall for logistic; mean Poisson deviance or RMSE on counts for Poisson).
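To make the metric definitions concrete, here is a minimal pure-Python sketch of several of them. The function names are chosen for this example, and the Poisson deviance assumes every predicted mean is strictly positive. In practice you would likely use a library instead; scikit-learn's `sklearn.metrics` module, for instance, provides `mean_squared_error`, `mean_absolute_error`, `r2_score`, `roc_auc_score`, `log_loss`, and `mean_poisson_deviance`. AUC is omitted here to keep the sketch short.

```python
import math


def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def r_squared(y, yhat):
    """1 - SS_res / SS_tot; can be negative on held-out data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def log_loss(labels, probs, eps=1e-15):
    """Mean negative Bernoulli log-likelihood; probs are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def mean_poisson_deviance(y, mu):
    """Mean of 2 * (y*log(y/mu) - (y - mu)), taking y*log(y/mu) = 0 when y == 0.

    Assumes every predicted mean in mu is strictly positive.
    """
    total = 0.0
    for a, m in zip(y, mu):
        term = a * math.log(a / m) if a > 0 else 0.0
        total += 2 * (term - (a - m))
    return total / len(y)
```

Whichever implementation you use, compute the reported values on the test set only, using the model and threshold already fixed on the training and validation data.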
This rubric shows how your Final Report will be graded. Each criterion has multiple scoring levels with descriptions of what constitutes each level of performance.