GASAL2
CUDA memory error on some datasets when using traceback
Hello,
I have been running into an issue where GASAL2 fails with a CUDA memory error on some datasets (but not all) when I use traceback. In my analysis, it happens more often when the reference sequences are longer than 580 nt. As a test, I ran GASAL2 on batches of 10,000 sequences from the dataset: some batches fail and some succeed. If I concatenate the batches that succeed, the larger file also succeeds, which leads me to believe that specific sequence pairings trigger the crash. I also filtered the dataset by reference length so that no reference was longer than 580 nt, but that did not eliminate the CUDA memory errors.
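The batching and length-filtering test described above can be scripted. The following is a minimal Python sketch, assuming (as with the command below) that the i-th read in the query file is aligned against the i-th sequence in the reference file. All function names here are hypothetical helpers, not part of GASAL2.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) records; multi-line sequences are joined."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def pair_batches(queries: Iterable[Record], refs: Iterable[Record],
                 batch_size: int = 10_000,
                 max_ref_len: Optional[int] = None) -> Iterator[List[Tuple[Record, Record]]]:
    """Pair the i-th query with the i-th reference, optionally drop pairs whose
    reference is longer than max_ref_len, and group the rest into batches.
    Note: zip stops at the shorter input, so mismatched record counts are not reported."""
    batch = []
    for q, r in zip(queries, refs):
        if max_ref_len is not None and len(r[1]) > max_ref_len:
            continue
        batch.append((q, r))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_fasta(path: str, records: Iterable[Record]) -> None:
    with open(path, "w") as fh:
        for header, seq in records:
            fh.write(f">{header}\n{seq}\n")


def split_pair_files(query_path: str, ref_path: str, out_prefix: str,
                     batch_size: int = 10_000, max_ref_len: Optional[int] = None) -> None:
    """Write <prefix>_NNN_query.fasta / <prefix>_NNN_ref.fasta for each batch."""
    with open(query_path) as qf, open(ref_path) as rf:
        batches = pair_batches(read_fasta(qf), read_fasta(rf), batch_size, max_ref_len)
        for i, batch in enumerate(batches):
            write_fasta(f"{out_prefix}_{i:03d}_query.fasta", [q for q, _ in batch])
            write_fasta(f"{out_prefix}_{i:03d}_ref.fasta", [r for _, r in batch])
```

For example, `split_pair_files("../test-data/read_file_15.fasta", "../test-data/ref_file_15.fasta", "batch", max_ref_len=580)` would produce length-filtered batch files that can each be passed to `./test_prog.out` separately. Filtering on pairs rather than on each file independently keeps queries and references aligned.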
This is the command I am running: ./test_prog.out -a 3 -b 3 -q 6 -r 1 -s -t -y "local" ../test-data/read_file_15.fasta ../test-data/ref_file_15.fasta
This is the error I see when running without cuda-memcheck:
[GASAL WARNING:] Trying to write 280 bytes while only 160 remain (query) (block size 1320000, filled 1319840 bytes). Allocating a new block of size 2640000, total size available reaches 3960000. Doing this repeadtedly slows down the execution.
[GASAL WARNING:] actual_query_batch_bytes(1362176) > Allocated GPU memory (gpu_max_query_batch_bytes=1320000). Therefore, allocating 2640000 bytes on GPU (gpu_max_query_batch_bytes=2640000). Performance may be lost if this is repeated many times.
[GASAL WARNING:] actual_query_batch_bytes(1362176) > Allocated HOST memory for CIGAR (gpu_max_query_batch_bytes=2640000). Therefore, allocating 5280000 bytes on the host (gpu_max_query_batch_bytes=5280000). Performance may be lost if this is repeated many times.
[GASAL WARNING:] Trying to write 272 bytes while only 144 remain (query) (block size 1320000, filled 1319856 bytes). Allocating a new block of size 2640000, total size available reaches 3960000. Doing this repeadtedly slows down the execution.
[GASAL WARNING:] actual_query_batch_bytes(1363936) > Allocated GPU memory (gpu_max_query_batch_bytes=1320000). Therefore, allocating 2640000 bytes on GPU (gpu_max_query_batch_bytes=2640000). Performance may be lost if this is repeated many times.
[GASAL CUDA ERROR:] an illegal memory access was encountered(CUDA error no.=700). Line no. 79 in file src/gasal_align.cu
srun: error: cgpu05: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=3134271.206
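Because some 10,000-sequence batches fail while others succeed, a failing batch could be narrowed further by repeatedly halving it and keeping whichever half still crashes. Below is a minimal Python sketch of that bisection, assuming the crash is triggered by pairs that fall within one half. If neither half fails on its own (for example, if the crash depends on batch size or on the buffer reallocations shown in the warnings), it stops and returns the smallest failing range it found. `bisect_failure` and `gasal_fails` are hypothetical helpers; `gasal_fails` only reuses the flags from the command above, and the temporary file names are placeholders.

```python
import subprocess
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
Record = Tuple[str, str]  # (header without ">", sequence)


def bisect_failure(items: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> Sequence[T]:
    """Shrink a failing input by halving; returns the smallest failing slice found."""
    if not fails(items):
        raise ValueError("the full input must fail to start with")
    lo, hi = 0, len(items)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[lo:mid]):
            hi = mid
        elif fails(items[mid:hi]):
            lo = mid
        else:
            break  # neither half fails alone: the trigger spans both halves
    return items[lo:hi]


def _write_fasta(path: str, records: Sequence[Record]) -> None:
    with open(path, "w") as fh:
        for header, seq in records:
            fh.write(f">{header}\n{seq}\n")


def gasal_fails(pairs: Sequence[Tuple[Record, Record]]) -> bool:
    """Hypothetical predicate: run test_prog.out on the given (query, ref) pairs
    and report whether it exited with a nonzero status."""
    _write_fasta("bisect_query.fasta", [q for q, _ in pairs])
    _write_fasta("bisect_ref.fasta", [r for _, r in pairs])
    result = subprocess.run(
        ["./test_prog.out", "-a", "3", "-b", "3", "-q", "6", "-r", "1",
         "-s", "-t", "-y", "local", "bisect_query.fasta", "bisect_ref.fasta"],
        capture_output=True,
    )
    return result.returncode != 0
```

Calling `bisect_failure(pairs, gasal_fails)` on the pairs of a failing batch would then report either a single suspect pair or the smallest range that still reproduces the illegal memory access.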
I am attaching a sample dataset that fails. test-data.zip