SegAlign
thrust::system::system_error | CUDA free failed: cudaErrorCudartUnloading
[2023-09-27T10:02:11-0700] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v11 with ID kind-LastzRepeatMaskJob/instance-r63tene1 to 0
...
Log from job "'LastzRepeatMaskJob' kind-LastzRepeatMaskJob/instance-r63tene1 v12" follows:
=========>
...
File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/preprocessor/lastzRepeatMasking/cactus_lastzRepeatMask.py", line 130, in gpuRepeatMask
segalign_messages = cactus_call(parameters=cmd, work_dir=self.work_dir, returnStdErr=True, gpus=self.repeatMaskOptions.gpu,
File "/home/cactus/cactus_env/lib/python3.8/site-packages/cactus/shared/common.py", line 889, in cactus_call
raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command /usr/bin/time -f "CACTUS-LOGGED-MEMORY-IN-KB: %M" segalign_repeat_masker /tmp/58f5d3ffa02e55c3b06625f0f8626408/0d5a/937a/tmpfg2qo5qy/gSojMU042_0_0.tgt --lastz_interval=10000000 --markend --neighbor_proportion 0.2 --M 10 --step=3 --ambiguous=iupac,100,100 --num_gpu 1 exited 134: stderr=Using 64 threads
...
Error: cudaMemcpy of 4 bytes for num_anchors failed with error " invalid argument "
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorCudartUnloading: driver shutting down
Command terminated by signal 6
CACTUS-LOGGED-MEMORY-IN-KB: 69902308
My OS is Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-153-generic x86_64).
Some specs for the GPU I'm using:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   36C    P0    48W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+