
Batchspawner maintenance?


Hi,

I'm wondering if we could discuss how to make sure that batchspawner is maintained in the future. Currently, there are lots of PRs open that need review, and it seems that development waits until I make a personal integration branch and that gets merged at once. Without an updated master, it's hard to have a dev target.

We've got different people making pull requests. If we could have someone from the JupyterHub organization watching and reviewing, we could probably make better progress. They don't have to be experts in batch systems themselves (I and others who are commenting here can do that, plus no one knows all the batch systems), but someone with good knowledge of JH would be able to give us the most important feedback, merge, and keep things going.

For my part, I've spent enough time that I can give a good first review, but things that involve JupyterHub internals definitely need more eyes on them. My next development push will probably be within a few weeks, so having some way to make progress then would be good.

Does anyone (especially from jupyterhub) have any comments?

rkdarst avatar Oct 08 '18 12:10 rkdarst

First off I want to thank @mbmilligan, @rkdarst and others for recent efforts here. They are much appreciated by many. I'd like to also make a few suggestions.

  • A high-level comment is that keeping this really useful project going will take a community with needs that sometimes overlap and sometimes don't. More of us need to be reading, commenting, approving, etc.

  • Let's encourage batchspawner users/deployers to list themselves as such in the README. For example, a section of the README should be "Places where batchspawner is in use" and then HPC centers and clusters could list themselves, optionally with contact information. This might make it easier for us to reach out to one another with questions and establish best practices. It also gives us a way to direct questions/issues/PRs at people in the best position to respond.

  • It would also be great if people could list whether they use one of the BatchSpawner implementations, and which one. If they don't, have they written a custom implementation? If so, a link to it on GitHub could serve as a useful reference for other users. The same logic as above applies about where to direct questions/issues/PRs.

  • I'm not of the opinion that every center deploying batchspawner should contribute their own customizations of the various BatchSpawners back to batchspawner here. Some things only make sense where you are! For instance, we use our own SlurmSpawner because we don't use -p at all and have custom Slurm options (--image for the Shifter image, support for SDN, etc.). We should think about whether the implementations currently provided with the package are converged, useful reference implementations, but maybe encourage sites to use the lower-level interfaces when that's more appropriate.

The last point there might help distribute the work a bit better. For a while now I've been wondering whether the base BatchSpawner class needs a refactor: it has accumulated so many options that it's become a repository of all the different ways a batch spawner could work, and that doesn't seem like an ideal use of inheritance. There are all kinds of doodads hanging off every spawner because of it. Sure, they don't get used, but the model has become "attach what you need in the base class." It's often the easiest way to go!
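
To make "use the lower-level interfaces" concrete, here is a minimal sketch of the kind of site-local subclass meant above. It leans on batchspawner's batch_script trait and its convention of substituting req_* traits into the template with the req_ prefix stripped; ShifterSlurmSpawner and req_image are illustrative names only, and the format-string placeholders reflect older releases (exact template syntax varies by version, with jinja2 coming later):

```python
# Hypothetical site spawner -- a sketch, not anyone's production code.
from traitlets import Unicode
from batchspawner import SlurmSpawner

class ShifterSlurmSpawner(SlurmSpawner):
    # Site-specific trait; batchspawner substitutes req_* traits into the
    # template with the "req_" prefix stripped, so this appears as {image}.
    req_image = Unicode('',
        help="Shifter image to run the single-user server in"
    ).tag(config=True)

    # Replace the whole submission template rather than hanging one more
    # option off the base class (note: no -p/--partition here at all).
    batch_script = Unicode("""#!/bin/bash
#SBATCH --image={image}
#SBATCH --time={runtime}
#SBATCH {options}
{prologue}
{cmd}
""").tag(config=True)
```

The base class still handles submit/poll/cancel; the site only owns its template and its extra traits.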

rcthomas avatar Oct 08 '18 15:10 rcthomas

+1 to everything. I had tried part of this in SPAWNERS.md and will continue trying.

  • I'm not of the opinion that every center deploying batchspawner should contribute their own customizations of the various BatchSpawners back to batchspawner here. Some things only make sense where you are! For instance, we use our own SlurmSpawner because we don't use -p at all and have custom Slurm options (--image for the Shifter image, support for SDN, etc.). We should think about whether the implementations currently provided with the package are converged, useful reference implementations, but maybe encourage sites to use the lower-level interfaces when that's more appropriate.

The last point there might help distribute the work a bit better. For a while now I've been wondering whether the base BatchSpawner class needs a refactor: it has accumulated so many options that it's become a repository of all the different ways a batch spawner could work, and that doesn't seem like an ideal use of inheritance. There are all kinds of doodads hanging off every spawner because of it. Sure, they don't get used, but the model has become "attach what you need in the base class." It's often the easiest way to go!

I thought this comment by @minrk was quite insightful and matches our case: https://github.com/jupyterhub/kubespawner/issues/248#issue-351849695 . We don't need to re-implement all batch system options, because req_options can set almost everything. Most other options should be optional in some form (as --partition already is, in fact). With jinja2 templating, we can reduce the need for divergence and also provide somewhat reasonable defaults, I hope.
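
A rough sketch of what that looks like in a site config (made-up values; req_partition, req_runtime, and req_options are SlurmSpawner traits, though exact names and behavior vary by release):

```python
# jupyterhub_config.py -- a hedged sketch, not a tested recipe.
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Optional traits: with templating, an unset value can simply drop the
# corresponding #SBATCH line (as --partition already does).
c.SlurmSpawner.req_partition = 'interactive'
c.SlurmSpawner.req_runtime = '8:00:00'

# Free-form catch-all, so the package never needs a dedicated trait
# for every batch-system option.
c.SlurmSpawner.req_options = '--qos=normal --constraint=haswell'
```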

Thanks for the feedback!

rkdarst avatar Oct 08 '18 18:10 rkdarst

Agreed, I do think batchspawner is suffering from the people with necessary expertise (myself included!) not having the time to properly maintain it. I think it would be worthwhile to have a discussion here about how to address that. I think it mainly comes down to having more than one maintainer suitably empowered to merge branches and ultimately cut a release -- and getting there requires both people, and putting some workflow in place so we don't step on each others' toes.

In parallel with that discussion, I would like to propose taking the next week or two to really push on these remaining PRs and see if we can get the next release done.

mbmilligan avatar Oct 10 '18 21:10 mbmilligan

I can also give a bit of help

zonca avatar Nov 08 '18 20:11 zonca

We need people who can debug and test, people who can make changes, and people who can approve and provide advice. The last category can also come from JupyterHub; of the first, we have enough on the issue tracker; and the middle I think is OK too.

So, what should we do... here is my proposal:

  • Get more people who can accept pull requests, even if they aren't fully able to determine whether they are perfect. They can at least serve as a second set of eyes and make sure tests are reasonable. (In other words, if it takes a long time for fixes to go in, it's better to take possibly imperfect fixes than nothing, because at least they can be iterated on.)
  • Drop Python 3.4 support and maybe JH 0.7 support. It would be good to support them, but with limited resources we probably need to look to the future more, especially if people familiar with older versions aren't around so much.
  • Adopt a slightly more aggressive development cycle. Ideally we would be conservative, but a faster cycle lets new contributors get battle-tested sooner, which will help us stabilize in the medium term.

What do you think? Most importantly, who can we get to review pull requests more quickly? That is needed, even in the current situation.

rkdarst avatar Feb 04 '19 14:02 rkdarst

The number of issues and of old and duplicate PRs continues to grow. The volume of issues shows that people are continuing to use batchspawner at a regular pace, and not keeping it updated isn't doing the community any favors.

At this point I think faster development is better, even at the cost of more short-term bugs. We are a bit unusual in that everyone's cluster is slightly different (which makes bugs hard to actually test), but our users also tend to be skilled and can give good help back. Monthly video meetings would be nice, but not sufficient unless there is also active discussion.

I can volunteer to be an alternate PR acceptor (but in that case we would need other reviewers too). I can't promise to do everything perfectly, but I can keep things moving and fix bugs when they slip through... which is the important thing for now.

Does anyone have a fork which they would like to keep more up to date, as a parallel master merging changes sooner? I can make such a fork in NordicHPC or AaltoScienceIT which we can use if needed.

rkdarst avatar Jul 08 '19 01:07 rkdarst

@mbmilligan @rcthomas can you please give @rkdarst powers to help maintain this repo (I can't)? I don't have any bandwidth right now to help triage issues.

zonca avatar Jul 08 '19 12:07 zonca

I'm here in Oslo at the JupyterHub sprint, where batchspawner has been a minor part. @willingc has merged several PRs and given me merge access here. But that doesn't solve the bigger issue: who's still interested in setting the long-term direction, reviewing things, and providing advice? I'm not sure whether the plan from June worked, but I guess there is still time. @mbmilligan, will you still review and accept things? Do you have time to do this?

I guess it's especially hard here, since people who set up batchspawner usually do it once, and the cluster then lasts many years without many changes. Once you've worked around the problems yourself, there's not much motivation to do more.

rkdarst avatar Sep 06 '19 16:09 rkdarst

Great to see all the activity happening! At the June workshop some of us @mbmilligan @cmd-ntrf and I thought a monthly batchspawner-and-friends videoconference would be a good way to coordinate activity and keep momentum, but we were going to let it ride till at least August. Maybe now's the time to pick that up?

rcthomas avatar Sep 06 '19 16:09 rcthomas

If that can work, great. I don't think it serves the same purpose as reviewing issues and pull requests, or that it will make things go as fast as they need to, but at least it's something. Let me know when it happens. Are there any other ideas?

rkdarst avatar Sep 06 '19 16:09 rkdarst

For my part, yes, I am definitely interested in continuing to review and accept things, and provide high-level direction. And now that it's September, I would like to see a regular meet-up of some kind get off the ground. I do think that would help move things forward with Batchspawner too -- at least on my end, having many projects demanding attention, it would be useful to have a venue to periodically re-sync on what the current state and needs of the project are.

mbmilligan avatar Sep 06 '19 22:09 mbmilligan

It's been a long time, and we still haven't made much progress. My own effort has stalled too, in no small part because it's not worth my time to start managing new things while still juggling old stuff. To avoid stagnation, I propose...

We accept that:

  • So far, if you want to use batchspawner/wrapspawner, you basically have to deploy from master and be willing to do some digging and debugging. This may still be the case in the future, because our (the developers') deployments are so conservative that we don't deploy that often, and everyone's systems are so different anyway.
  • Master will therefore be less stable than before. Then again, right now releases are "stable" but often don't work because they are stagnant, so less stability isn't much worse.
  • Much better tests won't come before more development does.

So I propose:

  • Discuss at the next JupyterHub team meeting (see links here). I usually go to these and will bring this up.
  • Release 1.0 right away.
  • Give those of us who have made significant patches in the last year write access.
  • Keep a low threshold for accepting PRs. If one seems reasonable, doesn't introduce too much maintenance burden, and you are willing to debug and fix if stuff goes wrong, then merge it. There's no perfection, only iteration.
  • Hope that new excitement prompts new tests to come out when they become needed.
  • Quickly accept anything that takes us closer to best practices: readthedocs, github-actions to test, github-actions to automatically release, etc.
  • Think about versioning and how to do it with a fast-moving master. Do we want to release only when it's "probably perfect", or accept rapid releases that might not be perfect?

Summer is my time to focus on Jupyter stuff; if we do this, I can make some progress. But I probably won't do much if we stay at the current speed.

rkdarst avatar Jul 18 '20 20:07 rkdarst

I looked through the insights, and my conclusion is that this repo is more used than the visible activity shows. In particular, there are far more forks with custom commits than people submitting PRs (custom commits are sort of expected here, but it shows we have a large untapped developer pool).

The people I know who have been active at our meetings and made commits either here or to JupyterHub are @rkdarst, @rcthomas, and @cmd-ntrf (did I miss anyone?). I propose these people be given maintain-level repository access to batchspawner and wrapspawner (and why not jupyterhub-deploy-hpc too). I think we need admin access to set up the readthedocs integration, so I would like that as well unless someone can help me with it (whatever people normally do). Does anyone know whether admin permissions are needed to manage GitHub Actions?

rkdarst avatar Jul 19 '20 08:07 rkdarst

Hi @rkdarst, there are a few things to respond to here.

As we agreed at the last batchspawner call, I've just issued the 1.0.0 release.

Given that there do seem to be a number of people "silently" relying on this repo, I am reluctant to not give those people a stable target. However, I agree with the spirit of not letting that get in the way of freely accepting changes as we iterate toward the next release, including potentially breaking things as we try out new approaches. My suggestion would be to adopt the widely-used practice of having separate development and release branches. That way we can make minor and careful bugfixes to the released version, and work at a more reasonable pace in a branch that doesn't make any promises about stability. Once we're happy with the result we can converge them for a new major release.

I think the reason we didn't do that from the start was that we were worried about the extra work of potentially having to fix bugs in two branches in parallel. However, I think where we're at now, that is a lesser problem than continuing to drag our feet around a code base that is effectively "in production" at all times.

If we do that, I'd be happy to see us broaden the set of people with write access, provided we continue to handle the "release" branch with a high level of discipline.

Thoughts?

mbmilligan avatar Jul 21 '20 21:07 mbmilligan

My suggestion would be to adopt the widely-used practice of having separate development and release branches. That way we can make minor and careful bugfixes to the released version, and work at a more reasonable pace in a branch that doesn't make any promises about stability. Once we're happy with the result we can converge them for a new major release.

I think the reason we didn't do that from the start was that we were worried about the extra work of potentially having to fix bugs in two branches in parallel. However, I think where we're at now, that is a lesser problem than continuing to drag our feet around a code base that is effectively "in production" at all times.

If we do that, I'd be happy to see us broaden the set of people with write access, provided we continue to handle the "release" branch with a high level of discipline.

I don't want to make master unstable - it would still aim to be as tested and valid as it is now. It's more that we can never be 100% sure it's stable, and that shouldn't block us. My thinking is that most bugs are submarine bugs: they aren't detectable through our testing (because they are so particular to someone's environment or setup).

I don't mind the idea of a master and a stable branch, but we might end up in a similar situation where stable is never updated since something might still break, and the longer that goes on, the worse it gets. I would prefer the default branch be the development one, since the default branch is where people usually send PRs and patches.

We could get a similar effect by "simply" releasing more often, so that it's reasonable to rely on released versions if you need stability. But does semver account for this case?

One idea from your question above: one of the first things I'll do is document how to pin an installed version through GitHub. I guess almost no one should deploy from github master or stable; they should find a recent hash that works and pin it in their deployment. (Or pin a released version.) At least, this should be the expectation for the next 4 months.
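
For example (a sketch of the idea, with <commit-sha> left as a placeholder for whatever hash you have tested):

```
# requirements.txt -- pin to a tested commit instead of tracking a branch
batchspawner @ git+https://github.com/jupyterhub/batchspawner@<commit-sha>
```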

Anyway, please make a decision, or let's make a team to decide. If there isn't a reasonable place to send PRs, I will start working in AaltoSciComp as the de facto "development" branch, and individual pull requests won't really be possible anymore. We can always change later.

rkdarst avatar Jul 21 '20 23:07 rkdarst

Point taken - we do want the "default" target when people create PRs to be the branch where we do active development. I'll give others a little more time to comment before taking any action, but I think I'm leaning towards the following changes:

  • rename master to main (the term "master" is being deprecated) but otherwise leave it as the default branch
  • in the event we need to hotfix a release, we will start a release-hotfix branch from that tag to prepare e.g. v1.0.1, etc.
  • for future major releases, we will create a release branch (like release-1.1) when we are ready to stop accepting new features, so we don't have to stop development on main while we work through release candidate testing
  • we will aim for a cadence closer to 2-4 major releases per year, instead of a year+ per release
  • we will give the most active contributors the ability to accept PRs to the main branch, possibly with some requirements for multiple sign off or code review(?)

Many of these points need no immediate action, so we can discuss this at the next batchspawner call. But if there are no prompt objections, I will implement the master -> main rename and start accepting ready PRs to that branch.

mbmilligan avatar Jul 22 '20 19:07 mbmilligan

rename master to main (the term "master" is being deprecated) but otherwise leave it as the default branch

I don't object, but it would seem wise to follow the lead of the jupyterhub organization? https://github.com/jupyterhub/team-compass/issues/299#issuecomment-645927944 . They also considered the effect on PRs, and they have a script to handle the other effects, which I presume someone more knowledgeable has figured out.

rkdarst avatar Jul 23 '20 06:07 rkdarst

  • Discuss at the next JupyterHub team meeting (see links here).

The next team meeting is described in jupyterhub/team-compass#319: 20 August, 08:00 UTC (unfortunately not a very good time for the Americas), so I guess it may not be that productive. Anyway, I added a point there about HPC docs and batchspawner questions, in case anyone can take part.

rkdarst avatar Jul 23 '20 09:07 rkdarst

Alternatively would it help to get the Batchspawner/HPC meetings onto the team compass?

rcthomas avatar Jul 23 '20 15:07 rcthomas

  • we will give the most active contributors the ability to accept PRs to the main branch, possibly with some requirements for multiple sign off or code review(?)

Many of these points need no immediate action, so we can discuss this at the next batchspawner call. But if there are no prompt objections, I will implement the master -> main rename and start accepting ready PRs to that branch.

Just do something and I'm happy. Right now people contribute and their stuff vanishes. I want to do X, but Y and Z have to happen first. Or I have to make my own integration branch, go off on my own path, and hope that someday it will be useful. By the time mid-August comes around, I will be too busy again and everything will be forgotten. The last time a lot got done, I made an integration branch and tried to handle as many PRs as I could at once. That isn't good, and I don't want to do it again, but we need rapid handling of PRs.

Do you think we can get to a more responsive development? I think we can, but we need more maintainers. @mbmilligan has done a lot for us that we should appreciate, but I think we need a better strategy in the future.

Please let me know whether we'd like to get PRs flowing within a week, whether I should start another integration branch (which hopefully would be used, but it's not the ideal way to develop; worst case, I maintain a fork), or whether I should do nothing.

Alternatively would it help to get the Batchspawner/HPC meetings onto the team compass?

If you mean advertise it there, I'm sure an issue would be noticed, though I don't know how many people would come. If you mean report back: I attend most of them now, but can't do much about batchspawner/HPC because not much happens and I can't do much. They alternate between a good time for Asia and a good time for the Americas, so you all should be able to make half of them. I can relay info as needed.

How do you think we can make more use of the jupyterhub team?

rkdarst avatar Jul 23 '20 19:07 rkdarst

I've been sitting in on those meetings when my schedule allows as well. They're a good way to keep track of what's going on in a broader sense and who is who, and on at least two occasions now it's resulted in (for me) quick responses to issues I'm working on or curious about.

What I was thinking was that we surface the "BatchSpawner & Friends" monthly call we do as providers of Jupyter deployments in HPC "as a JupyterHub thing." I know that on "our" side as providers we have critical mass, and more interaction with the project would provide more visibility and exposure to stakeholders, and attract developers. I don't think cramming it into the existing meeting (which is already JupyterHub+BinderHub) makes sense. The thought is to create accountability on the HPC+Jupyter/BatchSpawner side as well.

We could ask on the team compass whether they'd be open to that; we may need to suggest the idea at a future hubs team call.

rcthomas avatar Jul 23 '20 20:07 rcthomas

Not seeing any objections to the bit about moving to a rapid development mode in the main branch, I will start flowing ready PRs in today. Thanks for making #182 to help that along.

The next Batchspawner call is 8/5, so that's sooner than the next Jupyterhub call, and we can have live discussion on some of these other questions then as appropriate. I would not generally be able to make an 08:00 UTC call anyway.

mbmilligan avatar Jul 23 '20 21:07 mbmilligan

I think we should organize and make ourselves more visible in the JupyterHub team meetings. Right now they do focus more on Kubernetes/Binder, but my hypothesis is that there simply isn't much activity in the other subprojects. We should keep our own meeting, but report back there more and discuss big ideas. (At least everyone should be able to make every other one.)

Also, I've wanted to promote batchspawner/HPC as one of the main ways to run Jupyter (making a Big Three with TLJH and Kubernetes), but without much development speed or a team behind it, that seemed like a weak argument. I hope we can get things moving and better publicize what we do.

rkdarst avatar Jul 28 '20 21:07 rkdarst

Also, I've wanted to promote batchspawner/HPC as one of the main ways to run Jupyter (making a Big Three with TLJH and Kubernetes), but without much development speed or a team behind it, that seemed like a weak argument. I hope we can get things moving and better publicize what we do.

I definitely like this idea, and I tend to think that increased visibility and better development pace/resources form a bit of a virtuous circle. This is, incidentally, also an argument for us to maintain a good presence at conferences where JupyterHub is relevant - SciPy, JupyterCon, PEARC, etc.

Regarding the branch renaming thing, in coincidentally good timing Github put out updated guidance a couple of days ago: https://github.com/github/renaming

It sounds like the main takeaway is that they will be rolling out built-in functionality to do renames with all the associated link repointing later this year, so I think we can hold off on that action item until then.

mbmilligan avatar Jul 29 '20 18:07 mbmilligan

I think it's fine to add more maintainers with commit rights to this repository - I think that makes more sense and is more sustainable than relying on core JupyterHub members to merge a PR when it's approved by someone here. There are enough other repos to look after 😀.

Do you want to add a list of proposed new maintainers under the agenda section you already added for https://github.com/jupyterhub/team-compass/issues/319 ?

In terms of getting more contributors, have you thought about publicising this more on the forum? It could be just asking for help, giving regular updates on what's new, or promoting the "BatchSpawner & Friends" call?

manics avatar Aug 03 '20 09:08 manics

@rcthomas I think having HPC folks attend the JupyterHub meetings would be valuable. These meetings would definitely be appropriate for raising awareness of HPC needs/issues.

@mbmilligan I definitely support moving to main as the default branch. I would follow the approach of the group in the Team Compass repo. I believe GitHub recently announced better tooling, coming by year's end, to support a more seamless transition to main. I've moved a bunch of repos over and it's been painless.

willingc avatar Aug 03 '20 14:08 willingc