simple-llm-finetuner
Issue when training in Colab
When I run training in Colab, this error is shown:
Something went wrong
Connection errored out.
How can I solve this?
Getting this error too.
I'm guessing it's running out of RAM? Are you using a high-RAM env?
No, I did not. I just tried using Colab Pro. I used the base model cerebras/Cerebras-GPT-2.7B. When I press train, the following error shows in Colab:
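For what it's worth, you can confirm which RAM profile the runtime actually got by running a quick check in a Colab cell (sketch using psutil, which is preinstalled in Colab):

```python
import psutil

# Total and currently available system RAM of the Colab runtime, in GiB.
vm = psutil.virtual_memory()
print(f"total: {vm.total / 2**30:.1f} GiB, available: {vm.available / 2**30:.1f} GiB")
```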
No, it's broken. It works on Hugging Face now, but you can't download the LoRAs xD.
I have the same issue. I even tried running it without Gradio's tunnel, using a third-party tunnel instead, but I get the same error (see the launch sketch below).
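For anyone else trying to rule the tunnel out: the Gradio public tunnel is just the `share` flag on `launch`. A minimal sketch (the Blocks app here is a placeholder, not the finetuner's actual UI):

```python
import gradio as gr

# Hypothetical minimal app; the real finetuner builds its own UI.
with gr.Blocks() as demo:
    gr.Markdown("smoke test")

# share=False keeps everything local (no Gradio tunnel); point your own
# tunnel (ngrok, cloudflared, ...) at server_port instead if you need one.
demo.launch(share=False, server_name="0.0.0.0", server_port=7860)
```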
I should note that for me Colab does in fact work, but only on an A100 Colab instance with more than 64 GB of RAM. Memory usage seemed to spike to ~36+ GB, which is more than the maximum for the free tier/standard RAM profile. This leads me to think it's just the RAM limitation of the lower Colab tiers.
Trying it on the generic RAM profile with a V100 (which gives me ~20-24 GB of RAM), I had the issue listed in the original post. Trying it locally on a machine with 32 GB of RAM and a P100, I have the same problem: the RAM spikes, the machine starts the OOM killer, and the process is ended.
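A rough way to confirm the spike is to log system RAM from a background thread in a Colab cell while training runs (sketch using psutil, which is preinstalled in Colab; the interval is arbitrary):

```python
import threading
import time

import psutil

def log_ram(interval_s: float = 5.0) -> None:
    # Print used/total system RAM every few seconds so the spike is visible
    # in the cell output even if the process is later OOM-killed.
    while True:
        vm = psutil.virtual_memory()
        print(f"RAM used: {vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB")
        time.sleep(interval_s)

threading.Thread(target=log_ram, daemon=True).start()
```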
What model and dataset are you using to generate and train? In my case this is happening even with a half-precision 7B LLaMA model and the default "unhelpful" example. I can even generate with it on my PC, which has only 8 GB of VRAM; I just can't train. But I don't believe fine-tuning a half-precision 7B LLaMA should need more than the 15 GB of VRAM that Colab provides for free? As you can see, the crash/"Connection errored out" error occurs way before RAM and/or VRAM is saturated.
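For context, the low-memory path these finetuners usually take is 8-bit loading plus a LoRA adapter, which is why the trainable state should stay small enough for a free Colab GPU. A rough sketch of that setup (the model id and LoRA hyperparameters below are placeholders, not necessarily what simple-llm-finetuner uses):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "decapoda-research/llama-7b-hf"  # placeholder LLaMA-7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # quantized weights via bitsandbytes, roughly 7 GB of VRAM
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a few million parameters are trainable
```

If VRAM stays within budget with a setup like this, the crash is more consistent with system RAM running out (as described above) than with GPU memory.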