isONcorrect
Parallel processors mode is not working
Hi guys, I'm trying to run with parallel processors, but I realized that it is not working. It is running with only one processor, which is why it is taking forever. What should I do? Do I need to set something up during the installation?
My command line (the server has 40 processors and 500 GB RAM; the conda environment is activated):
run_isoncorrect --t 20 --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder 03-ONT-fastq-corrected
I just saw the specifications again. I'll try running it from a .sh script.
Ok, great. It should work with multiple cores using the run_isoncorrect --t 20 command. Let me know how it goes.
Hi Sahlin, I realized that the last clusters are running with fewer processors than the first ones. Now it is taking forever (~8 hours per cluster). Are these clusters the longest ones?
If you have a few very large clusters, you can (and should) use --split_wrt_batches.
According to the documentation:
--split_wrt_batches Process reads per batch (of max_seqs sequences)
instead of per cluster. Significantly decrease runtime when few
very large clusters are less than the number of cores used.
Here max_seqs is typically 1000 or 2000; this speeds things up a lot when a few very large clusters are present. We used this mode for the SIRV dataset (in the paper), where one cluster contained half of the reads.
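To illustrate why batching helps, here is a minimal Python sketch. It is not isONcorrect's actual code: `split_into_batches` is a hypothetical helper and the cluster sizes are made up. It only shows how one huge cluster turns into many work units of bounded size that can be spread over the cores.

```python
from math import ceil


def split_into_batches(cluster_sizes, max_seqs=2000):
    """Return (cluster_id, batch_index, n_reads) work units of at most max_seqs reads.

    Hypothetical helper for illustration; not part of isONcorrect.
    """
    units = []
    for cluster_id, n_reads in cluster_sizes.items():
        n_batches = ceil(n_reads / max_seqs)
        for batch_index in range(n_batches):
            start = batch_index * max_seqs
            units.append((cluster_id, batch_index, min(max_seqs, n_reads - start)))
    return units


# Made-up example: one cluster holds most of the reads.
cluster_sizes = {0: 500_000, 1: 3_000, 2: 800}
units = split_into_batches(cluster_sizes, max_seqs=2000)

# Per cluster there are only 3 jobs, so at most 3 of 20 cores can be busy.
# Per batch there are many jobs, none larger than max_seqs reads.
print(len(cluster_sizes), "cluster jobs vs", len(units), "batch jobs")
print("largest batch:", max(n for _, _, n in units))
```

With per-cluster scheduling the run time is bounded below by the largest cluster; with per-batch scheduling the largest unit is at most max_seqs reads.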
Hi Sahlin, I canceled my last script (the one without --split_wrt_batches), but now I have another issue. It seems to stop at the last cluster, and the program cannot finish properly. I ran the test data (100 reads) and it worked well. I don't know what is going on:
My script:
#!/bin/bash
# Pipeline to get high-quality full-length reads from ONT cDNA sequencing
# Set path to output and number of cores
root_out="03-correction"
cores=20
mkdir -p $root_out
run_isoncorrect --t $cores --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder $root_out --split_wrt_batches
The error:
Running isoncorrect batch_id:100000_0...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 94, in isoncorrect
    subprocess.check_call([ "/usr/bin/time", isoncorrect_exec, "--fastq", read_fastq_file, "--outfolder", outfolder,
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 365, in <module>
    main(args)
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 281, in main
    for x in pool.imap_unordered(isoncorrect, instances):
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.
Update: with the test data (100 reads) it did not work either. But since it is a small dataset, the program was still able to generate the final fastq for each cluster.
If the file /tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq is still there, could you try running:
/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq --outfolder 03-correction/100000_0 \
--exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1
This is the instance that generates an error.
Perhaps isONcorrect also logs the error for this in a .stderr file somewhere in the output folder, 03-correction/100000_0. I forget whether I implemented that. If so, you could check the error in that file.
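A minimal sketch for checking this across all batch folders at once. It assumes the log files have "stderr" somewhere in their name, which is a guess, and that the output folder is 03-correction as in the script above; `find_stderr_logs` is a hypothetical helper, not part of isONcorrect.

```python
import os


def find_stderr_logs(outfolder):
    """Yield (path, size_in_bytes) for files under outfolder whose name contains 'stderr'."""
    for dirpath, _dirnames, filenames in os.walk(outfolder):
        for name in sorted(filenames):
            if "stderr" in name:  # assumed naming; adjust to what is actually written
                path = os.path.join(dirpath, name)
                yield path, os.path.getsize(path)


if __name__ == "__main__":
    # Print the tail of every non-empty log so failing batches stand out.
    for path, size in find_stderr_logs("03-correction"):
        if size > 0:
            print(f"--- {path} ({size} bytes)")
            with open(path, errors="replace") as handle:
                print(handle.read()[-2000:])
```

If the folder does not exist or holds no such files, the loop simply prints nothing.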
Thank you for replying. I realized that the error changes a little depending on the Python version, but it is still not working. I tried installing via GitHub (Python 3.8); via conda it is 3.11 automatically. Then I reinstalled via conda forcing python=3.10, and it seems to run more clusters, but I still have the same issue as above.
I also realized that only a few clusters were done, so it is not viable to run cluster by cluster in the tmp folder (and since I had this issue, I'm not sure all my clusters were there).
It seems that some clusters ran to completion, others stopped in the middle of the process, and others just crashed. If you have any idea what is going on, I would really appreciate it. Thanks :)
Plus, the stderr file in the failed clusters said:
Traceback (most recent call last):
File "/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect", line 1551, in <module>
The error you reported is just because the file is not there anymore (these files get flushed from the tmp folder regularly by the system). It is not the actual error you encounter when running.
Another guess: remove the output folder 03-correction; perhaps some old files there from other attempts interfere with your output.
Otherwise, perhaps you could copy the offending file from the temp folder while it is present and run the command on that file as:
/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
THE_FAILING_TMP_FILE.fastq --outfolder 03-correction/100000_0 \
--exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1
isONcorrect will tell you at the start of the run which tmp folder it is working in by writing: Temporary workdirectory: [HERE IS THE PATH]
Hi Sahlin. Yes, I checked the previous issues, and my error is similar to others involving the tmp folder (I'm deleting the previous output before running). I checked the run_isoncorrect script and found the lines with "/usr/bin/time", which I have now replaced with "time", but I see that the symbolic link in the tmp folder can't be opened (my version is 0.0.8).
Since several clusters have the same problem, I'm not sure it is viable to copy each file and run isONcorrect on them one by one.
When I check the failed cluster, I get this:
ls -lah /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
lrwxrwxrwx 1 eniac eniac 49 Dec 1 11:04 /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq -> 01-isonclustering/02-clustered-fastq/100000.fastq
Nonetheless, I can't open the file:
head /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
head: cannot open '/tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq' for reading: No such file or directory
I'll keep trying to solve this. If you have any tips, I would appreciate them. Thanks
Not sure why you have a symbolic link to the file?
You need to copy it completely: cp /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq THE_FAILING_TMP_FILE.fastq
At the moment you only seem to have a symbolic link, which means that if the tmp file disappears you no longer have access to it.
I think we are not on the same page... The program itself creates the symlinks to the files in order to run --split_wrt_batches...
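One possible explanation for the ls and head output above, offered as a guess rather than a confirmed diagnosis: the link target shown is a relative path, and a relative symlink target is resolved against the directory that contains the link, not against the directory the command was started from. The Python sketch below reproduces that behaviour with throwaway directories; all file and folder names in it are stand-ins.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing on the real system is touched.
base = os.path.realpath(tempfile.mkdtemp())
os.chdir(base)

# Stand-in for the clustered input, addressed by a relative path.
os.makedirs("clusters")
with open("clusters/100000.fastq", "w") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")

# Stand-in for the temporary batch folder, located somewhere else.
batch_dir = tempfile.mkdtemp()
relative_link = os.path.join(batch_dir, "relative.fastq")
absolute_link = os.path.join(batch_dir, "absolute.fastq")

# The relative target is looked up inside batch_dir, where it does not exist.
os.symlink("clusters/100000.fastq", relative_link)
# The absolute target is found no matter where the link lives.
os.symlink(os.path.abspath("clusters/100000.fastq"), absolute_link)

# lexists() checks the link itself; exists() follows it to the target.
print(os.path.lexists(relative_link), os.path.exists(relative_link))  # True False
print(os.path.lexists(absolute_link), os.path.exists(absolute_link))  # True True
```

If this is what happens here, passing an absolute path to --fastq_folder might avoid the dangling links, though that is untested in this thread.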
Okay, how about this.
On line 218 in run_isoncorrect, please change the line to tmp_work_dir = "XX-correction/" (or whatever path you want on your system). This way all the files will be present in XX-correction/ and you will still have them there should anything break along the way.
Then we can locate the file(s) where it is going wrong and run them individually to get the error message.
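As a rough sketch of the idea in Python: the helper names `make_work_dir` and `dangling_links` are hypothetical and are not taken from run_isoncorrect, whose line 218 is not shown in this thread. The first keeps the work directory in a fixed place instead of a system temp folder; the second lists batch files that are symlinks with a missing target, which is one way to locate the inputs that cannot be read.

```python
import os
import tempfile


def make_work_dir(persistent_path=None):
    """Return a work directory.

    With persistent_path set, the directory is created if needed and stays in
    place for later inspection; otherwise a fresh temporary directory is used.
    """
    if persistent_path is None:
        return tempfile.mkdtemp()
    os.makedirs(persistent_path, exist_ok=True)
    return os.path.abspath(persistent_path)


def dangling_links(work_dir):
    """Return paths under work_dir that are symlinks whose target is missing."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(work_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


if __name__ == "__main__":
    work_dir = make_work_dir("XX-correction")
    print("Work directory:", work_dir)
    for path in dangling_links(work_dir):
        print("dangling link:", path)
```

Any batch file reported here would fail as soon as it is opened, so it is a natural first candidate to rerun by hand.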