
Course-specific profiles for non-nbgitpuller courses!

Open balajialg opened this issue 2 years ago • 5 comments

Summary

We know that a few courses, such as Stat 131A, use Datahub without nbgitpuller links: students log in directly to the hub to access their coursework. @ryanlovett had an interesting observation that we can explore creating login profiles for non-nbgitpuller courses. Just as nbgitpuller provides course-specific context by retrieving content directly from the relevant GitHub repo, login profiles can provide course-specific context for non-nbgitpuller courses. We could also track how many users access a specific course. Here is a snapshot created by @rylo of what that would look like:

image

A good first step is to store users' course enrollment data (from their bcourses data) in our logs, so that we can find all courses using Datahub without nbgitpuller links.

Extending the course-specific use case further, instructors could (if interested) upload all the required coursework to a shared read directory for that course. Students could then log in to their per-course profile and, for example, read datasets and other relevant materials stored in that shared read directory.

Important information

Adding @ryanlovett's context for further reference,

When people log in, we save the enrollments that bcourses knows about into the user's auth state. The course data looks like this: https://canvas.instructure.com/doc/api/courses.html. This auth state is then consulted when the hub spawns their server. Right now we use this to automatically give some people more resources when they're enrolled in a specific course.
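As a rough illustration of how auth state could drive resources at spawn time, here is a minimal sketch. It is not the hub's actual code: the `courses` key, the `COURSE_MEM` table, and the course ID are hypothetical, and it assumes the hook is wired up through JupyterHub's `Spawner.auth_state_hook`.

```python
# Hypothetical table of Canvas course IDs that get extra memory.
COURSE_MEM = {"12345": "4G"}
DEFAULT_MEM = "1G"


def mem_limit_for(auth_state):
    """Return the memory limit for a user based on their enrollments.

    Assumes auth_state looks like {"courses": [{"id": ...}, ...]};
    the real structure stored by the authenticator may differ.
    """
    for course in (auth_state or {}).get("courses", []):
        limit = COURSE_MEM.get(str(course.get("id")))
        if limit is not None:
            return limit
    return DEFAULT_MEM


def auth_state_hook(spawner, auth_state):
    """Called by the hub before spawning, with the user's auth state."""
    spawner.mem_limit = mem_limit_for(auth_state)


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.Spawner.auth_state_hook = auth_state_hook
```

The lookup is kept in a plain function so it can be checked without a running hub.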

One way of leveraging this, without putting a whole lot of work into it and without changing the login experience, is to just have the hub make an anonymous log entry for every course the student is enrolled in, e.g.

course enrollment: 12345, ANTHRO 128
course enrollment: 10101, SCANDIN 120
course enrollment: 23456, STAT 20
course enrollment: 32123, STAT 131A
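A minimal sketch of producing such entries, assuming each course is a dict carrying the `id` and `course_code` fields of the Canvas course object. The function name and the logger name are made up for illustration.

```python
import logging

log = logging.getLogger("hub-enrollments")  # hypothetical logger name


def enrollment_log_lines(courses):
    """Format one anonymous log line per course; no user identifier is included."""
    return [
        f"course enrollment: {course['id']}, {course['course_code']}"
        for course in courses
    ]


def log_enrollments(courses):
    """Emit the formatted lines at INFO level."""
    for line in enrollment_log_lines(courses):
        log.info(line)


sample = [
    {"id": 12345, "course_code": "ANTHRO 128"},
    {"id": 32123, "course_code": "STAT 131A"},
]
log_enrollments(sample)
```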

Then you could extract this information from the logs. Of course, this would report on all the user's courses, and not just those the student is using the hub for, but it might yield interesting results anyways. The results would have to be expressed as enrollments per login, i.e. Stat 88 might appear in 1 of every 10 r.datahub logins while Rhetoric 1A might appear in 3 of every 100 r.datahub logins. It says more about the students using the hub than it does about the courses on the hub.
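The "enrollments per login" figure could be computed along these lines. This is only a sketch: it assumes the log entries have already been grouped into one list of course codes per login, which the anonymous entries alone may not allow without some per-login marker.

```python
from collections import Counter


def enrollment_share(logins):
    """Return, for each course, the fraction of logins in which it appears.

    `logins` is a list with one entry per login; each entry is the list of
    course codes reported for that login.
    """
    total = len(logins)
    if total == 0:
        return {}
    counts = Counter()
    for courses in logins:
        # Count a course at most once per login.
        counts.update(set(courses))
    return {course: n / total for course, n in counts.items()}


shares = enrollment_share([
    ["STAT 88", "STAT 20"],
    ["STAT 20"],
    ["STAT 20", "STAT 20"],
    [],
])
```

With this sample input, STAT 20 appears in three of four logins and STAT 88 in one of four.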

Otherwise, we'd need some sort of form to get students to tell us what they're using the hub for. It would only work if the student cared to make an accurate selection, and it would miss cases where the student planned on using r.datahub for several different things, e.g. if they logged in to do some PH123 coursework and then pivoted to doing STAT88 stuff. A login form would capture the first but not the second, so this form of data collection wouldn't be all that accurate.

Tasks to complete

  • [ ] Store course enrollment data for students authenticating via bcourses
  • [ ] Create login profiles for courses that don't use nbgitpuller links

balajialg avatar Jun 10 '22 22:06 balajialg

@ryanlovett Please feel free to add more context to this GitHub issue (wherever necessary)!

balajialg avatar Jun 10 '22 22:06 balajialg

@balajialg I've prepared a code fragment that will cause the hub to emit the Canvas SIS course IDs:

[I 2022-06-13 22:58:47.360 JupyterHub <string>:128] SIS course IDs: ['CRS:STAT-W21-2018-C', 'None', 'PROJ:1234591bbef3b654', 'CRS:STAT-94-2015-D', '', 'CRS:STAT-20-2016-D-F48CB165', 'CRS:STAT-243-2021-D', 'PROJ:abcde3cc933791a2', 'CRS:STAT-21-2016-D', 'CRS:STAT-131A-2019-D', 'PROJ:a1b2c30b1e89e62f', 'CRS:STAT-W21-2014-C', 'PROJ:308e2c36ce4vwxyz']

This line would be visible in the logs explorer using something like:

labels."k8s-pod/release"="prob140-staging"
textPayload=~"SIS course IDs"
labels."k8s-pod/component"="hub"

The CRS entries are courses, while the PROJ entries are non-instructional bCourses projects. Most students would probably only have CRS entries, and only for the courses they're currently enrolled in. The courses listed for me are those I've been added to as an instructor. Active instructors might have a lot of courses once prior terms are factored in.
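For mining these lines after exporting them from the logs, something like the following could work. It is a sketch that assumes the payload keeps the format shown above, with the IDs printed as a Python list literal; the function name is made up.

```python
import ast
import re

# Matches the list literal that follows the "SIS course IDs:" label.
SIS_LINE = re.compile(r"SIS course IDs: (\[.*\])")


def parse_sis_ids(text_payload):
    """Split one log line into (course IDs, project IDs).

    Placeholder entries such as 'None' and '' are dropped.
    """
    match = SIS_LINE.search(text_payload)
    if match is None:
        return [], []
    ids = [i for i in ast.literal_eval(match.group(1)) if isinstance(i, str)]
    courses = [i for i in ids if i.startswith("CRS:")]
    projects = [i for i in ids if i.startswith("PROJ:")]
    return courses, projects


line = (
    "[I 2022-06-13 22:58:47.360 JupyterHub <string>:128] SIS course IDs: "
    "['CRS:STAT-W21-2018-C', 'None', 'PROJ:1234591bbef3b654', '', "
    "'CRS:STAT-131A-2019-D']"
)
courses, projects = parse_sis_ids(line)
```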

Is this sufficient for you to mine course affiliations? As mentioned above, "this would report on all the user's courses, and not just those the student is using the hub for, but it might yield interesting results anyways." If you'd like, I can merge this and you can try it out for summer. It will work for all hubs using the Canvas authenticator which are currently datahub, data100, data102, dlab, eecs, ischool, julia, prob140, and r.

ryanlovett avatar Jun 13 '22 23:06 ryanlovett

@ryanlovett This looks like a great idea! I will definitely play around with the log data during the summer term. Based on this exploration, we can decide whether to continue or revert this for the fall term. Can you please merge this PR when you have time?

balajialg avatar Jun 14 '22 00:06 balajialg

Would these have different environments?

yuvipanda avatar Jun 14 '22 20:06 yuvipanda

@balajialg Apparently KubeSpawner has supported profiles natively for a while now. We can experiment with this on a separate hub.
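For reference, a per-course profile list for KubeSpawner might look roughly like this. The course names, slugs, and memory values are placeholders, and `build_profile_list` is a hypothetical helper; only the `profile_list` option and its `display_name`, `slug`, `description`, `default`, and `kubespawner_override` keys come from KubeSpawner.

```python
# Placeholder course definitions for illustration only.
COURSES = [
    {"name": "STAT 131A", "slug": "stat-131a", "mem_limit": "2G"},
    {"name": "STAT 20", "slug": "stat-20", "mem_limit": "1G"},
]


def build_profile_list(courses):
    """Build one KubeSpawner profile per course, plus a default profile."""
    profiles = [
        {
            "display_name": "General use",
            "slug": "general",
            "default": True,
            "kubespawner_override": {},
        }
    ]
    for course in courses:
        profiles.append(
            {
                "display_name": course["name"],
                "slug": course["slug"],
                "description": f"Environment for {course['name']}",
                # Any KubeSpawner setting can be overridden per profile.
                "kubespawner_override": {"mem_limit": course["mem_limit"]},
            }
        )
    return profiles


profile_list = build_profile_list(COURSES)

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.KubeSpawner.profile_list = profile_list
```

Since `kubespawner_override` can also set options such as `image`, profiles could in principle carry different environments per course.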

ryanlovett avatar Dec 21 '22 06:12 ryanlovett