
Semester End Clean Up Tasks!

Open balajialg opened this issue 2 years ago • 4 comments

Summary

At the end of every semester, we need to perform the following housekeeping tasks. I am collating them here so that we can prioritize them once each semester ends.

  • [ ] Remove packages that were not used (for Python packages, the Python popularity dashboard would serve as a valuable data point)
  • [ ] Remove auto-scaler calendar events that were added during the semester
  • [ ] Remove all the compute increase requests received during the semester
  • [ ] [Optional] Remove course admins for that specific semester
  • [ ] Run the archival process for all hub home directories
  • [ ] Reduce the number of nodes allocated for each node pool
  • [ ] Resolve dependabot alerts during the maintenance window
  • [ ] Send a blast email via the datahub-announce email list with announcements and a CTA to make requests such as package additions, RAM increases, calendar updates, admin access requests, etc.
  • [ ] Migrating Ubuntu #4395
  • [ ] Version unversioned packages #4167
  • [ ] Culling users from our config files related to memory allocation
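
For the first item, a minimal sketch of how unused Python packages could be shortlisted for removal. It assumes the popularity dashboard data can be exported as a mapping from package name to import count; the function name, the `min_imports` threshold, and the sample data are all hypothetical and not taken from the actual dashboard.

```python
def find_unused_packages(installed, import_counts, min_imports=1):
    """Return installed packages whose recorded import count is below min_imports.

    `installed` is a list of package names from the image's requirements.
    `import_counts` maps package name -> number of recorded imports
    (assumed export format; the real dashboard may differ).
    Packages missing from `import_counts` are treated as never imported.
    """
    return sorted(
        name for name in installed
        if import_counts.get(name, 0) < min_imports
    )


# Hypothetical example data; real counts would come from the popularity dashboard.
installed = ["numpy", "pandas", "geopandas", "nltk"]
import_counts = {"numpy": 1520, "pandas": 1310, "nltk": 0}

candidates = find_unused_packages(installed, import_counts)
print(candidates)
```

The result is only a list of candidates for review. A package's install name can differ from its import name (for example, `scikit-learn` is imported as `sklearn`), so names would need to be reconciled before anything is removed.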

Important information

Spring 23 semester ends May 12th!

Any other activity I am missing?

balajialg, Jan 12 '23 20:01

R libraries may be tagged with comments mentioning the course and term for which they are requested. Should we remove them during maintenance windows after every term and require instructors to request them the next time they’re needed? This would help reduce the size of the image, but could lead to more CI builds at the beginning of the term if people don’t prep in advance. I’m in favor of removal but it should be discussed and perhaps raised with users (instructors).

We do need an R popularity dashboard.

ryanlovett, Jan 13 '23 08:01

@ryanlovett I am thinking of adding a question to the package request template, something like "Does the requested package have an end date for removal from the image?" What do you think?

I am all in favor of building an R popularity dashboard, as highlighted in issue #2942. We should plan some dev cycles for it in the next few months if possible.

balajialg, Jan 17 '23 21:01

@balajialg That is logical, but my guess is that instructors would want to specify no end date more often than not. Other infra devs may feel differently, but I think that, at least for the non-core courses, libraries should be opt-in every term. Smaller images mean faster node startup, which means faster scaling.

And yes, an R popularity dashboard is crucial. If we have that, we can feel better about removing libraries.

ryanlovett, Jan 18 '23 03:01

@ryanlovett Sounds good. It would be a good idea to analyze which hubs' images we want to prune (I am assuming the generic hubs). I plan to send an email at the end of the semester highlighting the image pruning activity and asking instructors to raise GitHub issues for packages (added this to the to-do list).

balajialg, Jan 23 '23 22:01