RuntimeError: handle_0 INTERNAL ASSERT FAILED at "../c10/cuda/driver_api.cpp":15, please report a bug to PyTorch.
I've been trying for some time now and always run into this error. Everything before this step worked. What am I doing wrong? I have an RTX 3090 (24 GB) on Windows 10, running Ubuntu through WSL. Maybe that's the problem, but I don't want to install Ubuntu on a new partition.
python3 finetune/adapter_v2.py --data_dir data/alpaca --checkpoint_dir checkpoints/tiiuae/falcon-7b --out_dir out/adapter/alpaca
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
warnings.warn(
Global seed set to 1337
Loading model 'checkpoints/tiiuae/falcon-7b/lit_model.pth' with {'block_size': 2048, 'vocab_size': 50254, 'padding_multiple': 512, 'padded_vocab_size': 65024, 'n_layer': 32, 'n_head': 71, 'n_embd': 4544, 'rotary_percentage': 1.0, 'parallel_residual': True, 'bias': False, 'n_query_groups': 1, 'shared_attention_norm': True, 'adapter_prompt_length': 10, 'adapter_start_layer': 2}
Number of trainable parameters: 3839186
/usr/local/lib/python3.10/dist-packages/lightning/fabric/fabric.py:828: PossibleUserWarning: The model passed to Fabric.setup() has parameters on different devices. Since move_to_device=True, all parameters will be moved to the new device. If this is not desired, set Fabric.setup(..., move_to_device=False).
rank_zero_warn(
iter 0: loss 2.7154, time: 2929.28ms
Traceback (most recent call last):
File "/root/lit-parrot/finetune/adapter_v2.py", line 254, in <module>
Do you get errors with other checkpoints?
If you have enough system RAM, you could try running one step on CPU. Even though it's very slow, it usually gives a better error message than when run on CUDA.
That only applies if there's an actual bug, though. It might just be a problem with your driver installation. Have you followed the installation steps described in the README?
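To help tell a driver or WSL setup problem apart from a bug in the script, a small diagnostic along these lines might be useful to run and paste into the thread. This is only a rough sketch: the helper names `is_wsl` and `collect_diagnostics` are made up for illustration, and the path `/usr/lib/wsl/lib/libcuda.so.1` is an assumption about where WSL2 usually exposes the Windows NVIDIA driver library, so a missing file there is a hint rather than proof of a broken install.

```python
import os
import platform
import shutil


def is_wsl() -> bool:
    """Guess whether we are inside WSL by looking at the kernel release string."""
    # WSL kernels normally report a release containing "microsoft".
    return "microsoft" in platform.uname().release.lower()


def collect_diagnostics() -> dict:
    """Gather basic facts about the CUDA setup without touching the GPU."""
    info = {
        "wsl": is_wsl(),
        # nvidia-smi should be reachable if the driver is exposed to this environment.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Assumed location of the driver library under WSL2 (see note above).
        "wsl_libcuda_present": os.path.exists("/usr/lib/wsl/lib/libcuda.so.1"),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back False, or `nvidia_smi_on_path` is False under WSL, the driver setup is the more likely culprit than the fine-tuning script.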
Hi, I ran into the same problem. I'm using a 3070 with Ubuntu 22.04 on WSL2. I think it might be a WSL2 bug, because I'm running a different model (yolov7).
Same error here with 3070 Ubuntu 22.04 wsl2