This is a work in progress! Template: https://github.com/LDSSA/wiki/issues/330

Description

The goal of this issue is to assess interest and pre-allocate Batch 7's teaching and QA work.

The units, the overall work needed, and the release and delivery dates are listed below.

Units

Admissions

Unit
- Adjustment to the new Python and Pandas versions.
- Learning notebooks - minimum change: review
- Example notebooks - minimum change: review
- Exercise notebook - minimum change: review, new datasets
To be released on 23 October 2023
To be ready on: soon

SLU	Name	Last year instructor	Batch 7 instructor	Last year QA
SLU01	Pandas 101	@majkah0	@majkah0	@Jujulian3
SLU02	Subsetting Data in Pandas	@jgomes959	@jgomes959	@jgerebelo
SLU03	Visualization with Pandas & Matplotlib	@Gustavo-SF	@kagglekim	@SaraOGomes
Test		@fabiocruz		@danizao @majkah0 @minhhoang1023 @Gustavo-SF
Test on 23 October 2023

Batch 7 QA Lead SLU01/SLU02/SLU03: @FilipaPereira78
Batch 7 backup QA SLU01/SLU02/SLU03: @CaitlinHulse

* Responsible for checking the SLUs.
** Responsible for checking the SLUs in case there are major changes (exercise or material updates).
Both QA people do the test verification.

Specialization 1 + Bootcamp

Project manager: José Rebelo @jgerebelo
Unit: minimum changes based on issues from last year and this year's QA. Adjustment to the new Python and Pandas versions. This year, we are doing a reverse process - first QA, then unit improvement.
- Learning notebook
- Example notebook
- Exercise notebook
To be released on:
- SLU04 - SLU10 learning notebooks: 19 November 2023
- SLU04 - SLU10 exercise notebooks: 26 November 2023
- SLU11 - SLU19 learning and exercise notebooks: 26 November 2023
To be ready in November

SLU	Name	Last year instructor	Batch 7 instructor	Last year QA	Batch 7 QA
SLU04	Basic Stats with Pandas	@SaraOGomes	@cmm79	@jgomes959	@BG2602
SLU05	Covariance & Correlation	@kagglekim	@cmm79	@anaritarc	@BG2602
SLU06	Dealing with Data Problems	@majkah0	@TeignmouthElectron	@SaraOGomes	@BG2602
SLU07	Regression with Linear Regression	@jgerebelo	@joaogilsa	@carlacotas	@Mohamedgaber9
SLU08	Metrics for Regression	@marianahenriques1	@joaogilsa	@cd702	@Mohamedgaber9
SLU09	Classification with Logistic Regression	@majkah0	@majkah0	@carlacotas	@caitlinhulse
SLU10	Metrics for Classification	@phgui	@majkah0	@majkah0	@caitlinhulse
SLU11	Tree-Based Models	@anaritarc	@margaridantunes	@carlacotas	@Mohamedgaber9
SLU12	Feature Engineering (aka Real World Data)	@danizao	João Nobre	@anaritarc	@Mohamedgaber9
SLU13	Bias-Variance tradeoff & Model Selection	@jgerebelo	@rodrigomverissimo	@anaritarc	@BG2602
SLU14	Model complexity & Overfitting	@Gustavo-SF	@Gustavo-SF	@Jujulian3	@BG2602
SLU15	Hyperparameter Tuning	@jgomes959	@jgomes959	@SaraOGomes	@BG2602
SLU16	Workflow	@cimendes		@fabiocruz	@TeignmouthElectron
SLU17	Ethics & Fairness	@hershaw	@majkah0	@Gustavo-SF	@TeignmouthElectron
SLU18	Support Vector Machines (SVM) (optional unit)	@cimendes	@majkah0	@Jujulian3
SLU19	k-Nearest Neighbors (kNN) (optional unit)	@cimendes	@majkah0	@Jujulian3

Group	SLUs	Batch 7 QA lead*	Batch 7 backup QA**
QA1	SLU04, SLU05, SLU06	@BG2602	Caitlin Hulse
QA2	SLU07, SLU08	@Mohamedgaber9	Cora
QA3	SLU09, SLU10	Caitlin Hulse
QA4	SLU11, SLU12	@Mohamedgaber9
QA5	SLU13, SLU14, SLU15	@BG2602
QA6	SLU16, SLU17	@TeignmouthElectron

* Responsible for checking the SLUs.
** Responsible for checking the SLUs in case there are major changes (exercise or material updates).

Bootcamp presentations

Bootcamp presentations will be split into two parts and given by senior instructors. This is what is expected from each instructor:

- The presentation should be <= 60 min, including student questions. It should cover concepts and insights for the given topic, not the technical implementation in Python.
- If possible, read the relevant learning notebooks and give high-level feedback on them (e.g. topic X is not relevant, you should add topic Y).
- If possible, answer questions from the junior instructors who will update the corresponding SLUs, e.g. in a ~1 hour meeting where they will come with prepared questions.

Time requirements: 1 hour for the talk + time to prepare the talk (we can provide slides from last year), <1 hour to read the learning notebooks, 1 hour for the meeting with junior instructors.

Bootcamp part 1, Sunday 26 November 2023

Instructor 1: LTPlabs
- ~ 30 min: Intro to data science, SLU04 - Basic Stats with Pandas, SLU05 - Covariance and Correlation
- ~ 30 min: SLU06 - Dealing with Data Problems
Instructor 2: José Rebelo, EDP
- 30 - 60 min: SLU07 - Regression with Linear Regression, SLU08 - Metrics for Regression
Instructor 3: LTPlabs
- 30 - 60 min: SLU09 - Classification with Logistic Regression, SLU10 - Metrics for Classification

Bootcamp part 2, Sunday 3 December 2023

Instructor 4: João Ascensão, Stratio - TBC
- 45-60 min: SLU11 - Tree-Based Models, SLU12 - Feature Engineering
Instructor 5: Maria Cristina Dominguez
- ~60 min: SLU13 - Bias-Variance tradeoff & Model Selection, SLU14 - Model complexity and Overfitting, SLU15 - Hyperparameter Tuning
Instructor 6: Sam Hopkins, DareData
- 30-60 min: SLU16 - Workflow, SLU17 - Ethics and Fairness

Hackathon 1

- Come up with a new problem for the hackathon
- Create the description, find a dataset and create data splits (this should require some iteration, and validating that the most common approaches work well, i.e., they improve over a dummy baseline)
- Create baseline instructor solution
- Evaluation guidelines doc
- Overall guidelines for instructors to help out in hackathon
To be released on 17 December 2023
To be ready in November

Work unit	Name	Last year instructor	Batch 7 instructor	Last year QA	Batch 7 QA
Hackathon 1	Binary Classification	@wilsonramos1	@jgomes959	?	.
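
For the data-split step in the Hackathon 1 checklist, the "improves over a dummy baseline" check could look roughly like the sketch below. It is only an illustration: the data is synthetic, and every name in it (`stratified_split`, `majority_baseline`, `fit_threshold`) is hypothetical rather than taken from the hackathon materials. It uses only the standard library so it runs anywhere; in practice the instructor would likely use pandas and scikit-learn on the real dataset.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in both."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Dummy baseline: always predict the most frequent training class."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda value: most_common


def fit_threshold(train_x, train_y):
    """Stand-in for a 'common approach': the best single cut on one feature."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(train_x)):
        acc = accuracy(train_y, [int(v >= cut) for v in train_x])
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda value: int(value >= best_cut)


# Synthetic binary-classification data (assumption: not the hackathon dataset).
data_rng = random.Random(42)
y = [int(data_rng.random() < 0.3) for _ in range(400)]
x = [data_rng.gauss(3.0 if label else 0.0, 1.0) for label in y]

train_idx, test_idx = stratified_split(y)
train_x, train_y = [x[i] for i in train_idx], [y[i] for i in train_idx]
test_x, test_y = [x[i] for i in test_idx], [y[i] for i in test_idx]

baseline = majority_baseline(train_y)
model = fit_threshold(train_x, train_y)

baseline_acc = accuracy(test_y, [baseline(v) for v in test_x])
model_acc = accuracy(test_y, [model(v) for v in test_x])
print(f"baseline accuracy: {baseline_acc:.2f}, model accuracy: {model_acc:.2f}")
```

If the simple model does not beat the dummy baseline on the held-out split, that is a hint the dataset or the split needs another iteration before release.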