DNABERT_2
RuntimeError: Triton Error [CUDA]: invalid argument when running run_dnabert2.sh
What is the problem, and how can it be fixed?
The provided data_path is /home/shiro/DNABERT_2/finetune
2023-08-31 17:57:18.856636: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/local/cuda/lib64:
2023-08-31 17:57:18.856685: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['classifier.weight', 'bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
Num examples = 36,496
Num Epochs = 5
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 4
Total optimization steps = 5,700
Number of trainable parameters = 117,070,851
0%| | 0/5700 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 21, in _bwd_kernel
KeyError: ('2-.-0-.-0-1e8410f206c822547fb50e2ea86e45a6-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float32, torch.float16, torch.float32, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, False, True, 128, 128), (True, True, True, True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_modified.py", line 332, in <module>
Here are my installed packages.
(dnabert2) shiro@GTUNE:~/DNABERT_2/finetune$ pip list
Package Version
absl-py 1.0.0
accelerate 0.22.0
anndata 0.7.6
antlr4-python3-runtime 4.9.3
appdirs 1.4.4
astor 0.8.1
astunparse 1.6.3
autograd 1.4
autograd-gamma 0.5.0
biopython 1.79
biothings-client 0.2.6
bleach 5.0.1
Brotli 1.0.9
cachetools 5.0.0
certifi 2023.7.22
charset-normalizer 2.0.12
click 8.1.2
cmake 3.27.2
coloredlogs 15.0.1
cycler 0.11.0
dash 2.0.0
dash-core-components 2.0.0
dash-dangerously-set-inner-html 0.0.2
dash-html-components 2.0.0
dash-table 5.0.0
docutils 0.19
einops 0.6.1
filelock 3.12.3
Flask 2.1.1
Flask-Compress 1.11
fonttools 4.32.0
formulaic 0.2.4
fsspec 2023.6.0
future 0.18.2
gast 0.3.3
google-auth 2.6.5
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.44.0
h5py 2.10.0
huggingface-hub 0.16.4
humanfriendly 10.0
idna 3.4
importlib-metadata 4.11.3
interface-meta 1.3.0
itsdangerous 2.1.2
Jinja2 3.1.1
joblib 1.1.0
Keras-Preprocessing 1.1.2
kiwisolver 1.4.2
lifelines 0.26.4
lit 17.0.0rc3
llvmlite 0.36.0
Markdown 3.3.6
markdown-it-py 2.1.0
MarkupSafe 2.1.1
matplotlib 3.5.1
mdurl 0.1.2
mhcflurry 2.0.5
mhcgnomes 1.7.0
mygene 3.2.2
natsort 8.1.0
np-utils 0.6.0
numba 0.53.0
numpy 1.18.5
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
omegaconf 2.3.0
opt-einsum 3.3.0
packaging 21.3
pandas 1.3.4
patsy 0.5.2
peft 0.3.0
Pillow 9.1.0
pip 23.2.1
pkginfo 1.9.6
plotly 5.4.0
protobuf 3.20.0
psutil 5.9.5
Pygments 2.14.0
pynndescent 0.5.6
pyparsing 3.0.8
python-dateutil 2.8.2
pytz 2022.1
PyYAML 6.0.1
readme-renderer 37.3
regex 2023.8.8
requests 2.26.0
requests-oauthlib 1.3.1
requests-toolbelt 0.10.1
rfc3986 2.0.0
rich 13.2.0
rsa 4.8
safetensors 0.3.3
scikit-learn 1.0.2
scipy 1.4.1
seaborn 0.11.2
serializable 0.2.1
setuptools 68.0.0
six 1.16.0
SNAF 0.5.2
statsmodels 0.13.1
tenacity 8.0.1
tensorboard 2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.6.2.2
tensorflow 2.3.0
tensorflow-estimator 2.3.0
termcolor 1.1.0
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 1.13.0
torchaudio 0.13.0
torchvision 0.14.0
tqdm 4.62.3
transformers 4.29.2
triton 2.0.0.dev20221202
twine 4.0.2
typechecks 0.1.0
typing_extensions 4.7.1
umap-learn 0.5.2
urllib3 1.26.14
webencodings 0.5.1
Werkzeug 2.0.2
wheel 0.38.4
wrapt 1.14.0
xlrd 1.2.0
xmltodict 0.12.0
xmltramp2 3.1.1
Here is my GPU setup.
(dnabert2) shiro@GTUNE:~$ nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU (UUID: GPU-776edd0d-aef5-ab3a-3750-32bfa854fecf)
(dnabert2) shiro@GTUNE:~$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
(dnabert2) shiro@GTUNE:~$ dpkg -l | grep cudnn
ii  cudnn-local-repo-ubuntu2004-8.9.4.25  1.0-1                amd64  cudnn-local repository configuration files
ii  libcudnn8                             8.9.4.25-1+cuda11.8  amd64  cuDNN runtime libraries
ii  libcudnn8-dev                         8.9.4.25-1+cuda11.8  amd64  cuDNN development libraries and headers
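In case the version mix matters: a quick way to dump what the failing process actually sees (the CUDA runtime bundled with the torch 1.13.0 wheels versus the CUDA 11.6 toolkit above, the pinned triton dev build, and the RTX 3070's compute capability) is a few lines of Python. Nothing here is specific to DNABERT-2; these are standard torch/triton introspection calls.

```python
# Environment dump sketch: report the versions/capabilities relevant to the
# Triton flash-attention kernel.
import torch
import triton

print("torch            :", torch.__version__)                   # 1.13.0 above
print("torch built CUDA :", torch.version.cuda)                   # runtime bundled with the wheel
print("cuDNN (torch)    :", torch.backends.cudnn.version())
print("triton           :", triton.__version__)                   # 2.0.0.dev20221202 above
print("device           :", torch.cuda.get_device_name(0))
print("compute cap.     :", torch.cuda.get_device_capability(0))  # (8, 6) for an RTX 3070
```

The RTX 3070's compute capability 8.6 is not below the usual flash-attention floor, so, if anything in this dump looks off, it is more likely the combination of the system CUDA 11.6 toolkit, the cu11.7 runtime wheels, and the pinned Triton dev build than the GPU itself.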
I have the same error: the identical KeyError in _bwd_kernel, with the same Triton kernel cache key as quoted above.
Did you solve it? Thanks!
I just gave up..... Sorry.