Hosted GitHub Runners for CI

Open Mistobaan opened this issue 3 years ago • 4 comments

Overview

To effectively test changes to the codebase against the repository's full CUDA / MPI / Apex stack, it would be nice to dedicate some of the cluster's resources to hosted runners, similar to how DeepSpeed tests its own codebase.

  • [ ] Check feasibility in terms of hardware resources. Even a spot instance should be enough.
  • [ ] Create the GitHub workflow.

Mistobaan avatar Feb 09 '22 22:02 Mistobaan

This is something we should be able to set up in the next couple of weeks. Are you familiar with setting up such a hosted runner?

StellaAthena avatar Feb 10 '22 14:02 StellaAthena

I can figure out the details. It really depends on what hardware we have available: cloud, bare metal, or k8s.

Mistobaan avatar Feb 10 '22 20:02 Mistobaan

> I can figure out the details. It really depends on what hardware we have available: cloud, bare metal, or k8s.

k8s, building from a Dockerfile. There's info on our Dockerfile here.

StellaAthena avatar Feb 11 '22 11:02 StellaAthena

@Mistobaan Based on our recent conversations, I'm currently under the impression that the code works and now we just need to allocate a dedicated GPU cluster and set up the CI. Is that correct? If so, I can set up a dedicated GPU cluster and we can start testing the CI.

StellaAthena avatar Feb 19 '22 22:02 StellaAthena