Turn in all of your work in a reproducible form. We need to be able to run your code and get your exact results.
If you used downloaded files and they're reasonably sized, upload them with your submission. If they're huge, upload a link to them along with clear instructions for downloading.
Your code submission is a critical component of your project and will be graded individually for each student. We need to be able to understand and reproduce your analysis.
Code to load, clean, and process your datasets. We should be able to run these steps and get your exact cleaned data. Include all data preprocessing, missing value handling, and feature engineering.
Team vs. Individual Work:
Initial data cleaning and processing can be shared team work (placed in repository root). If you work with additional datasets for your individual analysis, place that cleaning code in your personal folder.
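As an illustration of what a rerunnable cleaning step could look like, here is a minimal sketch. It assumes pandas is available. The inline CSV, the column names, the median imputation, and the `load_and_clean` function are all hypothetical stand-ins for your own data and choices, not a required approach.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a downloaded CSV file.
RAW_CSV = """id,age,income,city
1,34,52000,Boston
2,,61000,boston
3,29,,Chicago
3,29,,Chicago
"""


def load_and_clean(source) -> pd.DataFrame:
    """Load raw data and apply every cleaning step in one documented place."""
    df = pd.read_csv(source)

    # Remove exact duplicate rows (pandas treats matching NaNs as equal here).
    df = df.drop_duplicates().reset_index(drop=True)

    # Normalize inconsistent text values, e.g. "boston" vs. "Boston".
    df["city"] = df["city"].str.strip().str.title()

    # Missing value handling: median imputation (an assumed choice; state yours).
    df["age"] = df["age"].fillna(df["age"].median())
    df["income"] = df["income"].fillna(df["income"].median())

    # Feature engineering: a simple derived column (hypothetical).
    df["income_k"] = df["income"] / 1000
    return df


clean = load_and_clean(io.StringIO(RAW_CSV))
print(clean)
```

Keeping every step inside one function means a grader can call it on the raw file and arrive at the same cleaned table you used. In a real submission, `source` would be the path to the uploaded or linked data file.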
Since you're combining multiple data sources, provide the code used to join/merge your datasets. This is crucial for understanding your data integration approach.
Team vs. Individual Work:
Basic data integration can be shared team work (placed in repository root). Individual analyses may require additional data merging specific to each approach.
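A merge step is easier to review when it checks its own assumptions. The sketch below assumes pandas; the `sales` and `stores` tables and the `store_id` key are hypothetical examples. It uses the `validate` and `indicator` options of `DataFrame.merge` to confirm the expected key relationship and to surface rows that found no match.

```python
import pandas as pd

# Hypothetical sources: one row per sale record, one row per store.
sales = pd.DataFrame({"store_id": [1, 2, 3], "revenue": [100.0, 250.0, 80.0]})
stores = pd.DataFrame({"store_id": [1, 2, 4], "region": ["East", "West", "South"]})

merged = sales.merge(
    stores,
    on="store_id",
    how="left",               # keep every sales row, matched or not
    validate="many_to_one",   # raise an error if store_id repeats in `stores`
    indicator=True,           # add a `_merge` column recording the match status
)

# Rows from `sales` with no matching store; report these instead of hiding them.
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(merged)} sales rows had no matching store")

# Drop the bookkeeping column once the check is done.
merged = merged.drop(columns="_merge")
```

Whether a left, inner, or outer join is appropriate depends on your data; the point is to state the choice in the code and to show how many rows it affects.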
All analytical code, including model fitting, statistical tests, and validation. Set random seeds so that we can reproduce your results exactly.
Individual Work Required:
Each student must perform their own complete end-to-end data analysis with their chosen modeling approach. This code must be in your personal folder.
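To make the seeding requirement concrete, here is a minimal sketch of a seeded train/test split, assuming NumPy. The `SEED` value and the `split_indices` helper are hypothetical; any fixed seed works as long as it is set in the code you submit.

```python
import random

import numpy as np

SEED = 42  # hypothetical value; any fixed integer works


def split_indices(n_rows: int, test_fraction: float, seed: int):
    """Return (train_idx, test_idx) from a shuffle controlled by `seed`."""
    rng = np.random.default_rng(seed)   # generator local to this function
    shuffled = rng.permutation(n_rows)  # shuffled row positions 0..n_rows-1
    n_test = int(round(n_rows * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Seed the standard library generator too, in case any code relies on it.
random.seed(SEED)

train_a, test_a = split_indices(100, 0.2, SEED)
train_b, test_b = split_indices(100, 0.2, SEED)

# Same seed, same split: this is what lets a grader reproduce your numbers.
print(np.array_equal(test_a, test_b))
```

Many modeling libraries accept a seed directly, for example the `random_state` parameter in scikit-learn; passing the same constant everywhere keeps the whole pipeline repeatable.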
Code used to generate all graphics, tables, and visualizations in your report. Your Final Report must meet all graphics requirements: exactly 5 graphics (any combination of tables and data visualizations, such as charts and graphs), at least 1-2 of which are impressive "show-stoppers" with excellent visual design, coloring, and labeling. Use seeds where appropriate so that the graphics are reproducible.
Team vs. Individual Work:
Graphics specific to individual analyses go in personal folders. Overarching graphics that don't stem from a single analysis (e.g., dataset overview charts) can be shared team work in the repository root.
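One way to keep a graphic reproducible is to generate it entirely from code and write it to a file. The sketch below assumes matplotlib and NumPy; the simulated data, the labels, and the output file name are hypothetical, and the temporary directory stands in for whichever folder in the repository the figure belongs in.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated data with a fixed seed so the figure is identical on every run.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, s=12, alpha=0.7, color="tab:blue")
ax.set_xlabel("Predictor (standardized)")  # hypothetical axis labels
ax.set_ylabel("Response")
ax.set_title("Example figure generated from code")
fig.tight_layout()

# Hypothetical output location; use your personal folder or the repository root.
out_path = os.path.join(tempfile.gettempdir(), "figure_1_example.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(out_path)
```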
Include documentation noting any team member who does not contribute at least 80% of what is expected for the code deliverable. This can be a simple text file or comment block in your main notebook.
Upload your individual code files to your personal folder within the GitHub repository that will be created for your team. Submit your code as an executable Jupyter Notebook (.ipynb file), plus a backup HTML version in case your TA cannot run the notebook. Make sure all code cells are visible and properly commented in both versions.