Simplify and relax dependencies
The requirements files specified in ./requirements have historically been strict so as to prevent CI Docker images from changing without our prior knowledge. However, this places a burden on users who would like to run GPT-NeoX on a host without containerization, and many non-critical packages are strictly pinned unnecessarily. This motivates the following changes in this PR:
- Loosen the dependency version requirements for non-critical packages
- Remove the unnecessary `einops` and `mpi4py` packages
- Make the installation of `wandb` optional, with installation accessible through a new requirements file ./requirements/requirements-wandb.txt
- Fix an oversight that made the Megatron GPT-2 tokenizer inaccessible
- Clean up the deepy.py DeepSpeed launcher script
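Making `wandb` optional is commonly done with a guarded import. A minimal sketch of that pattern (illustrative only, not the PR's actual code; `init_wandb_if_available` and the `project` name are hypothetical):

```python
# Sketch of an optional wandb dependency (illustrative, not the PR's code).
try:
    import wandb  # optional; installed via requirements/requirements-wandb.txt
    WANDB_AVAILABLE = True
except ImportError:
    wandb = None
    WANDB_AVAILABLE = False


def init_wandb_if_available(project="neox"):
    """Start a wandb run only when the package is importable."""
    if not WANDB_AVAILABLE:
        print("wandb is not installed; skipping Weights & Biases logging")
        return None
    return wandb.init(project=project)
```

With this shape, the rest of the training stack can call the helper unconditionally and logging simply becomes a no-op when the package is missing.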
This looks good to me. @EricHallahan’s explanation of how he decided which imports go inside main() seems like a reasonable rule of thumb, and I like the changes to the requirements system in general.
I haven’t run this myself, as I’m under the impression that you’ve been testing it extensively.
Aside from the above ^ lgtm :rocket:
@EricHallahan is there something you’re waiting on to merge this?
I am working on verifying that the behavior with regards to missing Weights & Biases dependencies is what I intended/makes sense, and I also need to add corresponding documentation. I expect to have this ready to merge sometime early tomorrow.
I'm not sure of the motivation for the changes to fused_kernels.py. Also, I really don't like the requirements being this granular; I don't see the need for separate requirements files for wandb / tensorboard.
If someone doesn't have WandB available and doesn't wish to use it, how would you prefer they proceed?
> I'm not sure of the motivation for the changes to fused_kernels.py. Also, I really don't like the requirements being this granular; I don't see the need for separate requirements files for wandb / tensorboard.

> If someone doesn't have WandB available and doesn't wish to use it, how would you prefer they proceed?
The reason sparse attention and onebitadam were separated out is because they're optional dependencies which are also a bit of a pain to install (cupy-cuda requires you specify the cuda version, and triton used to break often), so removing them from requirements.txt reduced complexity for most users.
As far as I'm aware - there are no such problems with the installation of wandb. You can just pip install it. Including it in requirements.txt does nothing more than take up a few kb more space on the user's device.
Including it in a separate file may mean someone has to take a few minutes to figure out why their logging isn't working, and go back and realise that it's actually not in requirements.txt and you need to install it separately. I can't see a counter scenario where it would actually save time / decrease complexity.
I don't know why / how requirements/requirements-tensorboard.txt ever became a separate file. @sweinbach any ideas?
The only reason why I separated it out is because it mirrored how TensorBoard was handled. If we don't think that makes sense I'm happy to change it.
A counterargument to instructing users to "just install wandb/tensorboard" is versioning, which is the problem that requirements files are designed to solve. If you tell the user to install the requirements file, at least you can narrow down the environment to what is in the file rather than having to guess at what version the package manager chose.
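As an illustration of that versioning point, even a one-package requirements file can bound the acceptable versions (the bound below is hypothetical, not the PR's actual pin):

```text
# requirements/requirements-wandb.txt (illustrative version bound)
wandb>=0.10.0,<1.0
```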
Maybe too much for this PR but related. Any reason not to use pip-compile to create the requirements file and actually fix dependencies?
> Maybe too much for this PR but related. Any reason not to use pip-compile to create the requirements file and actually fix dependencies?
I had originally planned this PR with a larger scope which included more granular dependency management (such as only installing the dependencies for evaluation if the user specifies they are interested in evaluation) and a setuptools script to manage that system (which would have also registered deepy.py as a console script for convenience). However, it became a point of contention whether this was worth the potential confusion such a system could create, not to mention the increased complexity. I ultimately ended up stashing that line of work so that we could integrate the important changes, but if a more advanced dependency management workflow is desired it would not be hard to continue that line of work.
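For reference, pip-compile (from the pip-tools project) resolves a loose input file into a fully pinned requirements file, including transitive dependencies. An illustrative sketch of the input/output relationship (file contents and version numbers are hypothetical):

```text
# requirements.in — loose, top-level dependencies (contents illustrative)
wandb

# requirements.txt — generated by `pip-compile requirements.in`:
# every package, including transitive dependencies, gets an exact pin
wandb==0.12.21
    # via -r requirements.in
```

This keeps the human-maintained file small and loose while still producing a reproducible, pinned environment for CI.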
Closing as it’s unfixably far behind and better done from scratch