[BACKLOG] Automate GenEd course classification
Problem
Currently, courses are tagged as GenEd using override objects in MongoDB, which map (courseNo, studyProgram, semester, academicYear) to (genEdType, sections). However, these override objects are created manually by our maintainers and uploaded to MongoDB.
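The override mapping described above behaves like a keyed lookup from a course offering to its GenEd type and the sections it applies to. The sketch below models it in memory in Python, assuming that a section is only GenEd when it appears in the override's `sections` list; the class and function names are hypothetical, and the real document schema in cugetreg-api may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical in-memory mirror of one MongoDB override object.
# Fields correspond to courseNo, studyProgram, semester, academicYear.
@dataclass(frozen=True)
class OverrideKey:
    course_no: str
    study_program: str
    semester: str
    academic_year: str


# Fields correspond to genEdType and sections.
@dataclass
class GenEdOverride:
    gened_type: str       # e.g. "SO", "HU", "SC", "IN"
    sections: List[str]   # sections the GenEd tag applies to (assumed to be strings)


def gened_type_for(
    overrides: Dict[OverrideKey, GenEdOverride],
    key: OverrideKey,
    section: str,
) -> Optional[str]:
    """Return the GenEd type of one section, or None if no override tags it."""
    override = overrides.get(key)
    if override is None or section not in override.sections:
        return None
    return override.gened_type
```

Because `OverrideKey` is frozen, it is hashable and can be used directly as a dictionary key, which mirrors a unique index on the four key fields.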
The current GenEd classification process is:
- Go to https://cas.reg.chula.ac.th/cu/cs/QueryCourseScheduleNew/index.html
- Inspect the page and remove the `type="HIDDEN"` attribute from the `genedcode` field to enable it
- Query a list of GenEd courses using `genedcode` values (1 = SO, 2 = HU, 3 = SC, 4 = IN)
- Open the courses one by one and manually evaluate whether each course is REALLY GenEd (i.e., one that most students must enroll in).
- Create a CSV of all manually verified GenEd courses and upload them to MongoDB (via cugetreg-api) as `override` objects
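The query-and-export steps above boil down to translating `genedcode` values into GenEd types and writing the verified courses to a CSV. A minimal Python sketch, assuming a hypothetical column layout (the exact format cugetreg-api accepts for uploads is not described in this issue) and joining multiple sections into one cell with `;`:

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# genedcode values accepted by the CAS query form, as listed in the steps above.
GENED_CODES: Dict[int, str] = {1: "SO", 2: "HU", 3: "SC", 4: "IN"}

# Hypothetical column order; the real upload format may differ.
COLUMNS = ["courseNo", "studyProgram", "semester", "academicYear", "genEdType", "sections"]

# One manually verified course:
# (courseNo, studyProgram, semester, academicYear, genedcode, sections)
Row = Tuple[str, str, str, str, int, List[str]]


def to_override_csv(rows: Iterable[Row]) -> str:
    """Serialize manually verified GenEd courses into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for course_no, program, semester, year, code, sections in rows:
        if code not in GENED_CODES:
            # Reject typos early instead of uploading a bad genEdType.
            raise ValueError(f"unknown genedcode: {code}")
        writer.writerow(
            [course_no, program, semester, year, GENED_CODES[code], ";".join(sections)]
        )
    return buf.getvalue()
```

Validating `genedcode` while exporting catches one class of human error before anything reaches MongoDB; it does not replace the manual check of whether a course is really GenEd.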
Obviously, this process is painstaking and prone to human error.
Task Description
Develop a way to automatically tag courses as GenEd (or at least to minimize the manual work involved), so that it is easier for future generations to maintain. One current idea is to infer from each section's notes whether the course is:
- definitely GenEd
- definitely NOT GenEd
- uncertain, needing human verification
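The three-way split above maps naturally onto a conservative rule-based classifier: give a definite answer only when the notes point one way, and send everything else to a human. The Python sketch below assumes that matching patterns against section notes is a useful signal; which phrases actually indicate GenEd status is not yet known, so the patterns are passed in as parameters and the ones in any example are made up.

```python
import re
from enum import Enum
from typing import Iterable, Sequence


class GenEdVerdict(Enum):
    GENED = "definitely GenEd"
    NOT_GENED = "definitely NOT GenEd"
    NEEDS_REVIEW = "not sure, needs human verification"


def classify_notes(
    notes: Iterable[str],
    gened_patterns: Sequence[re.Pattern],
    not_gened_patterns: Sequence[re.Pattern],
) -> GenEdVerdict:
    """Classify a course from its sections' notes.

    The patterns are hypothetical inputs; real ones would come from
    exploring the notes in the course data.
    """
    text = "\n".join(notes)
    says_gened = any(p.search(text) for p in gened_patterns)
    says_not = any(p.search(text) for p in not_gened_patterns)
    if says_gened and not says_not:
        return GenEdVerdict.GENED
    if says_not and not says_gened:
        return GenEdVerdict.NOT_GENED
    # No signal, or conflicting signals: leave it to a maintainer.
    return GenEdVerdict.NEEDS_REVIEW
```

Treating conflicting matches as "needs review" keeps false positives low at the cost of more manual checks, which fits the goal of reducing, rather than fully eliminating, human verification.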
The solution remains to be discussed further.
Additional Context
Currently, we have course data from https://cas.reg.chula.ac.th. We could try to obtain data from other sources. Consult @bombnp if you want to request access to data we don't currently have.
Related Teams
- [ ] Frontend
- [x] Backend
- [x] Data
- [ ] Design
- [ ] Infra
- [ ] QA
Task Advisors
@bombnp
After doing basic exploratory data analysis with @panus2001, we decided it is better to consult professors on how to accurately determine GenEd courses, since there must be an official way to validate GenEd courses when students graduate. We will start by asking Proadpran.