Sean Owen
Matt notes that this was probably set to false because gradient checkpointing requires it to be off during training. But we can just edit the resulting model config for now...
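For anyone who wants to try that config edit, here is a minimal sketch. It assumes the setting in question is a boolean key in a Hugging Face-style `config.json`; the key name `use_cache` and the helper `enable_cache` are my assumptions for illustration, not something confirmed above.

```python
import json
from pathlib import Path


def enable_cache(config_path, key="use_cache"):
    """Set `key` to True in a JSON model config and return the updated dict.

    The default key name is an assumption; pass whichever key your
    model config actually uses.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config[key] = True  # flip the flag that was left off after training
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `enable_cache("path/to/config.json")` rewrites the file in place and leaves every other key untouched.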
Do you have cublas installed, and at a matching version for your CUDA drivers?
Not sure, it's running for me on CUDA 11.3 and 11.7, according to the code in the repo. I suspect it's something in the environment, but not sure what it...
What site are you referring to here? You should use the 'v2' models. With no particular tuning, on an A10 for example, you might expect 3-5 secs for the 3B...
How many GPUs for what?
1 GPU like an A10 on the 3B model, maybe; I haven't measured it closely. A more optimized deployment of the 12B model can hit more like 10ms / token...
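To turn a per-token figure like that into a rough response time, a back-of-the-envelope sketch follows. The function name and the 256-token example are hypothetical, and it ignores prompt processing and any fixed overhead.

```python
def generation_seconds(num_tokens, ms_per_token=10.0):
    """Rough generation time, assuming a constant cost per generated token."""
    return num_tokens * ms_per_token / 1000.0


# Example: time for a hypothetical 256-token response at 10 ms per token.
estimate = generation_seconds(256)
```

Real latency depends on hardware, batch size, and sequence length, so treat this as an order-of-magnitude check only.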
@matthayes I think y'all observed that during training too, and it was mysterious. I think the answer was to turn down the learning rate a bit? But the final value in...
You haven't said anything about the problem. You don't need Databricks, but you would have to change the references to dbutils and the %pip install command.
Are you OOM? Maybe restarting the kernel wasn't enough somehow, or something else is still attached. From this, I'm not sure what else could be the issue.
I was thinking it might be swapping or something. Check all your VM stats to see what might be going on, for example whether the CPU is even busy.
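If it helps, here is one way to take a quick look at those stats from Python. It is a sketch that assumes a Linux VM (it reads `/proc/meminfo` and uses `os.getloadavg()`); the helper names are made up for illustration.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB; a large value suggests the machine is swapping."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def snapshot(meminfo_path="/proc/meminfo"):
    """Return 1-minute load per CPU plus swap and available memory (Linux only)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    load_1min, _, _ = os.getloadavg()
    return {
        "load_per_cpu": load_1min / (os.cpu_count() or 1),
        "swap_used_kb": swap_used_kb(info),
        "mem_available_kb": info.get("MemAvailable"),
    }
```

Running `print(snapshot())` while the job appears stuck gives a quick read: a load per CPU near zero means the CPU is idle, and a growing `swap_used_kb` points at memory pressure.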