Charles Srisuwananukorn
@DanFu09 can you take a look?
Actually, @mauriceweber, can you take a look?
We train this model on 8x A100 80GB GPUs. I'll update the README. > I... submit a request for a mini model to do sanity checks on local systems and...
About an hour per 100 steps. Usually, we fine-tune for a couple days.
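For a rough sense of scale, here is a back-of-the-envelope sketch of that arithmetic. It assumes "a couple days" means about 48 hours and that throughput stays near 100 steps per hour on the same hardware; the function name is made up for illustration.

```python
def estimated_steps(hours: float, steps_per_hour: float = 100.0) -> int:
    """Rough number of training steps completed in `hours` at a steady rate."""
    return int(hours * steps_per_hour)


# Assumption: "a couple days" is taken to be about 48 hours of wall-clock time.
print(estimated_steps(48))
```

Actual step counts will vary with batch size, sequence length, and any restarts.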
Thank you for the PR, @shirayu! This looks great. I'd like to review a couple things tomorrow before merging. Please stay tuned.
After some research, many projects seem to be recommending `mamba` for faster installation (see [this article](https://pythonspeed.com/articles/faster-conda-install/) for more details). I just tested it, and it does seem much faster. Installing...
Training ran fine. I'll update the README to suggest using mamba.
I believe it also does not work on macOS. These packages require NVIDIA GPUs, which most Macs do not have.
@LorrinWWW, this is a version of your script for sharding the base model. Could you please take a look?
I've seen this issue when running out of GPU RAM. Unfortunately, the model requires an A100 80GB right now. Are you using an A100 40GB?
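If it helps to check which card you have, here is a minimal sketch that lists the memory PyTorch reports for each visible GPU. The helper names and the 70 GiB cutoff are my own assumptions, not part of this repo; the cutoff is just a loose way to tell an 80GB-class card from a 40GB one, since the reported total is usually a bit below the nominal size.

```python
GIB = 1024 ** 3


def looks_like_80gb_card(total_bytes: int, min_gib: float = 70.0) -> bool:
    """Return True if the reported memory is at least `min_gib` GiB.

    The 70 GiB default is an arbitrary, hypothetical threshold chosen to
    separate 80GB-class cards from 40GB-class ones.
    """
    return total_bytes / GIB >= min_gib


def report_gpus() -> None:
    """Print the total memory of every CUDA device PyTorch can see."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        verdict = "OK" if looks_like_80gb_card(props.total_memory) else "likely too small"
        print(f"GPU {i}: {props.name}, {props.total_memory / GIB:.1f} GiB ({verdict})")


if __name__ == "__main__":
    report_gpus()
```

Running `nvidia-smi` gives the same information without Python.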