Students will be given a login for this site after Final Project groups are finalized on January 27.
Checkpoints, the Final Report and Peer Reviews will be submitted via this account.
Your individual code will be submitted via your group's GitHub repository, which we'll invite you to after your account is created.
Code Deliverable
Overview
You're expected to turn in all of your work in a reproducible form. We need to be able to run your code and get your exact results.
If you used downloaded files and they're reasonably sized, upload them with your submission. If they're too large to upload, include a link to them along with clear instructions for downloading.
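When a file is too large to include, the download step itself can be scripted so graders get the same file you used. The following is a minimal sketch, assuming a hypothetical URL and destination path. It downloads a file only when it is missing, then checks it against a SHA-256 checksum that you would record when you first obtain the data:

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_if_missing(url: str, dest: Path, expected_sha256: str) -> Path:
    """Download `url` to `dest` unless it already exists, then verify it.

    Raises ValueError if the file on disk does not match the recorded
    checksum, i.e. it is not the exact file the analysis was run on.
    """
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        raise ValueError(f"Checksum mismatch for {dest}: got {actual}")
    return dest


# Hypothetical usage: replace the URL, path, and hash with your own.
# raw_path = fetch_if_missing(
#     "https://example.org/big_dataset.csv",
#     Path("data/raw/big_dataset.csv"),
#     "<sha256 recorded when you first downloaded the file>",
# )
```

The checksum step matters because a URL can start serving a newer version of a dataset. The check turns that silent change into an explicit error.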
Important: Code accounts for 40% of your Final Project grade
Your code submission is a critical component of your project and is graded individually for each student. Each team member must submit their own analysis code. We need to be able to understand and reproduce your analysis.
Deliverables
1. Data Loading and Cleaning Code
Code to load, clean, and process your datasets. We should be able to run these steps and get your exact cleaned data. Include all data preprocessing, missing value handling, and feature engineering.
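Here is a minimal sketch of a deterministic loading and cleaning step in pandas. The column names (`Price`, `Sqft`), the median imputation, and the engineered `price_per_sqft` feature are hypothetical stand-ins for your own data and decisions:

```python
import io

import pandas as pd


def load_raw(source) -> pd.DataFrame:
    """Read the raw CSV; `source` may be a file path or a file-like object."""
    return pd.read_csv(source)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Deterministic cleaning: the same raw input always gives the same output."""
    out = df.copy()
    # Normalize column names so later code does not depend on case or spacing.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Drop exact duplicate rows (assumption: duplicates are data-entry errors).
    out = out.drop_duplicates().reset_index(drop=True)
    # Flag missing prices *before* imputing, then fill them with the median.
    out["price_was_missing"] = out["price"].isna()
    out["price"] = out["price"].fillna(out["price"].median())
    # Hypothetical engineered feature: price per square foot.
    out["price_per_sqft"] = out["price"] / out["sqft"]
    return out


# Usage on a tiny inline dataset (one duplicate row, one missing price).
raw_csv = io.StringIO("Price,Sqft\n100,50\n,40\n300,100\n100,50\n")
cleaned = clean(load_raw(raw_csv))
print(cleaned)
```

Keeping each step in a function makes it easy to rerun the whole pipeline from the raw files. The `price_was_missing` flag also preserves information that imputation would otherwise hide.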
Team vs. Individual Work
Initial data cleaning and processing can be shared team work (placed in repository root). If you work with additional datasets for your individual analysis, place that cleaning code in your personal folder.
2. Data Integration Code
Since you're combining multiple data sources, provide the code used to join/merge your datasets. This is crucial for understanding your data integration approach.
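The next example sketches one way to make a join auditable, using pandas `DataFrame.merge` on two small hypothetical tables. The `validate` argument raises an error if the key relationship is not what you expect. Setting `indicator=True` records which rows matched, so you can report how much data the join kept:

```python
import pandas as pd

# Hypothetical sources sharing a "zip" key; ZIP codes are kept as strings
# so that leading zeros are not lost.
listings = pd.DataFrame({"zip": ["10001", "10002", "10003"],
                         "price": [500, 420, 610]})
census = pd.DataFrame({"zip": ["10001", "10002", "10004"],
                       "median_income": [85, 60, 72]})

merged = listings.merge(
    census,
    on="zip",
    how="left",              # keep every listing, matched or not
    validate="many_to_one",  # raise MergeError if census repeats a zip
    indicator=True,          # add a _merge column: "both" or "left_only"
)

# Share of listings that found census data; worth stating in your report.
match_rate = (merged["_merge"] == "both").mean()
print(f"{match_rate:.0%} of listings matched census data")
```

A left join is only one choice here. An inner join would silently drop unmatched rows, which is exactly the kind of decision that should be stated and justified.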
Team vs. Individual Work
Basic data integration can be shared team work (placed in repository root). Individual analyses may require additional data merging specific to each approach.
3. Analysis Code with Seeds
All analytical code, including model fitting, statistical tests, and validation. Set random seeds so we can reproduce your exact results. Include:
Exploratory Data Analysis (EDA) code
Outlier screening and handling procedures
Variable selection procedures
Goodness of Fit testing and model assumption checks
Model training and evaluation code
Statistical analysis and hypothesis testing
Cross-validation and model selection procedures
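Below is a minimal sketch of seeding, assuming your analysis uses Python's `random` module and NumPy. The helper names are hypothetical. The idea is to set one seed at the top of the notebook, then pass a seeded generator into every step that shuffles or samples, such as train/test splits and cross-validation folds:

```python
import random

import numpy as np

SEED = 42  # define one seed at the top of the notebook and reuse it everywhere


def set_seeds(seed: int = SEED) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a fresh Generator."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global state that some libraries still read
    return np.random.default_rng(seed)


def train_test_indices(n: int, test_frac: float, rng: np.random.Generator):
    """Shuffle row positions 0..n-1 and split off a test fraction."""
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]


def kfold_indices(n: int, k: int, rng: np.random.Generator):
    """Yield (train, validation) index arrays for k shuffled folds."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]


# If you use scikit-learn, also pass random_state=SEED to splitters and estimators.
rng = set_seeds()
train_idx, test_idx = train_test_indices(100, 0.2, rng)
```

Passing the generator explicitly, rather than relying only on global state, keeps each step reproducible even if you later reorder cells or add another random step in between.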
Individual Work Required
Each student must perform their own complete end-to-end data analysis with their chosen modeling approach. This code must be in your personal folder.
4. Graphics Generation Code
Code used to generate all graphics, tables, and visualizations in your report. Your Final Report must meet all graphics requirements: exactly 5 graphics (a combination of tables and data visualizations such as charts and graphs), with at least one or two being impressive "show-stoppers" with excellent visual design, coloring, and labeling. Use seeds where appropriate to ensure reproducible results.
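This sketch shows a reproducible figure function in matplotlib. The only randomness is a small jitter added to reduce overplotting, and it comes from a seeded generator. The variable names, axis labels, and output path are hypothetical:

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

SEED = 42
FIG_DIR = Path("figures")  # hypothetical output folder


def make_scatter(x, y, out_path: Path) -> Path:
    """Save a labeled scatter plot; the jitter is seeded so reruns match."""
    rng = np.random.default_rng(SEED)
    jitter = rng.normal(0, 0.05, size=len(x))
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(np.asarray(x) + jitter, y, alpha=0.7, color="tab:blue")
    ax.set_xlabel("Bedrooms")  # hypothetical variable names
    ax.set_ylabel("Price (USD, thousands)")
    ax.set_title("Price by number of bedrooms")
    fig.tight_layout()
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=200)
    plt.close(fig)  # free memory when generating many figures
    return out_path


# Hypothetical usage:
# make_scatter([1, 2, 3], [100, 150, 210], FIG_DIR / "fig1_price_by_bedrooms.png")
```

Writing each figure to a file from a function means every graphic in the report can be regenerated by rerunning the notebook, with no manual editing step in between.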
Team vs. Individual Work
Graphics specific to individual analyses go in personal folders. Overarching graphics that don't stem from a single analysis (e.g., dataset overview charts) can be shared team work in the repository root.
Format
Code must be runnable
Submit your code in a format that allows us to execute it. Include clear instructions for running your analysis, and list any package dependencies.
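One way to document package dependencies from inside the notebook is to print the installed version of each package you import. The standard library's `importlib.metadata` can do this, and you can paste the output into your run instructions or a `requirements.txt` file. The package list below is an assumption; replace it with your own:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: replace with the packages your notebook actually imports.
PACKAGES = ["numpy", "pandas", "matplotlib"]


def pinned_requirements(packages):
    """Return a 'name==version' line for each installed package."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements(PACKAGES)))
```

Running `pip freeze` in the same environment is an alternative. However, it lists every installed package rather than only the ones your analysis needs.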
Code readability is essential
Your code must be readable and well-organized. Use inline comments liberally to explain what each section does, why certain decisions were made, and any assumptions in your approach.
Submission Format
Upload your individual code files to your personal folder in the GitHub repository that will be created for your team. Submit your code as an executable Jupyter Notebook (.ipynb file), plus a backup HTML version of the same notebook in case your TA cannot run it. Ensure all code cells are visible and properly commented in both versions.