
Running Trans-ABySS in threaded mode, but ABYSS seems to be running single-threaded

Open · elissasoroj opened this issue 11 months ago · 4 comments

Hello,

I am trying to process a large number of assemblies under a bit of a time crunch. I am running Trans-ABySS with the following command:

transabyss --pe krakennp_SRR20074402_out_1.fq.gz krakennp_SRR20074402_out_2.fq.gz krakennp_SRR20074403_out_1.fq.gz krakennp_SRR20074403_out_2.fq.gz krakennp_SRR20074404_out_1.fq.gz krakennp_SRR20074404_out_2.fq.gz krakennp_SRR29324688_out_1.fq.gz krakennp_SRR29324688_out_2.fq.gz krakennp_SRR29324689_out_1.fq.gz krakennp_SRR29324689_out_2.fq.gz krakennp_SRR29324700_out_1.fq.gz krakennp_SRR29324700_out_2.fq.gz krakennp_SRR29324701_out_1.fq.gz krakennp_SRR29324701_out_2.fq.gz -k 32 --name crichardii_ncbiCrHAM_transabyss_k32_out.fa --threads 18
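With this many libraries, the --pe argument list is easy to get wrong by hand. Below is a minimal Python sketch that rebuilds the same command from the list of SRR IDs. It assumes the krakennp_<SRR>_out_<1|2>.fq.gz naming pattern and the options used above. build_transabyss_cmd is a hypothetical helper and not part of Trans-ABySS.

```python
import shlex

# Sample IDs and file-name pattern copied from the command above (assumption: adjust for your data).
SAMPLES = [
    "SRR20074402", "SRR20074403", "SRR20074404",
    "SRR29324688", "SRR29324689", "SRR29324700", "SRR29324701",
]

def build_transabyss_cmd(samples, k=32, threads=18,
                         name="crichardii_ncbiCrHAM_transabyss_k32_out.fa"):
    """Hypothetical helper: return the transabyss argument list, mates paired after --pe."""
    reads = []
    for srr in samples:
        # Keep mate 1 and mate 2 of each sample next to each other, as in the original command.
        reads.append(f"krakennp_{srr}_out_1.fq.gz")
        reads.append(f"krakennp_{srr}_out_2.fq.gz")
    return ["transabyss", "--pe", *reads,
            "-k", str(k), "--name", name, "--threads", str(threads)]

if __name__ == "__main__":
    cmd = build_transabyss_cmd(SAMPLES)
    # Print a shell-quoted command line; the list itself can be passed to subprocess.run().
    print(shlex.join(cmd))
```

Running it should print the same command line as above, so adding or removing a sample only means editing SAMPLES.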

Trans-ABySS seems to initialize fine:

Found Trans-ABySS directory at: /home/elissa/miniconda3/envs/abyss
Found Trans-ABySS `bin` directory at: /home/elissa/miniconda3/envs/abyss/bin
Found script at: /home/elissa/miniconda3/envs/abyss/bin/skip_psl_self.awk
Found script at: /home/elissa/miniconda3/envs/abyss/bin/skip_psl_self_ss.awk
Found `abyss-pe' at /home/elissa/miniconda3/envs/abyss/bin/abyss-pe
Found `MergeContigs' at /home/elissa/miniconda3/envs/abyss/bin/MergeContigs
Found `abyss-filtergraph' at /home/elissa/miniconda3/envs/abyss/bin/abyss-filtergraph
Found `abyss-junction' at /home/elissa/miniconda3/envs/abyss/bin/abyss-junction
Found `blat' at /home/elissa/miniconda3/envs/abyss/bin/blat
Found `abyss-map' at /home/elissa/miniconda3/envs/abyss/bin/abyss-map
# CPU(s) available:     80
# thread(s) requested:  18
# thread(s) to use:     18

But then it spends about 6 hours reading in one fq file at a time and discarding reads (it seems to be using these settings: ABYSS -k32 -q3 -e2 -E0 -c2 --coverage-hist=coverage.hist ...).
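One way to confirm whether the ABYSS process is really single-threaded is to read the Threads: field of /proc/<PID>/status on Linux (ps -o nlwp= -p <PID> reports the same count). Below is a minimal sketch, assuming a Linux host. The helper names are hypothetical. A count of 1 during the read-loading phase would be consistent with the behaviour described above. Note, however, that a process can also own several threads while only one of them is busy.

```python
from pathlib import Path

def parse_thread_count(status_text: str) -> int:
    """Extract the 'Threads:' field from the text of a Linux /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no Threads: field found")

def thread_count(pid: int) -> int:
    """Return how many threads process `pid` currently has (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())

if __name__ == "__main__":
    import sys
    # Usage: python thread_count.py <PID>   (e.g. the PID of the running ABYSS process)
    if len(sys.argv) == 2:
        pid = int(sys.argv[1])
        print(f"PID {pid} has {thread_count(pid)} thread(s)")
    else:
        print("usage: thread_count.py <PID>", file=sys.stderr)
```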

This seems like a parallelizable step to me. Or is this just standard behavior?

I am also getting the error below at the very beginning of the run. I assumed it was not important, since it did not seem to interfere with the process for others (e.g. https://github.com/bcgsc/transabyss/issues/26). However, I see the parameter j=18 just above the error, so perhaps it is related?

CMD: bash -euo pipefail -c 'abyss-pe graph=adj --directory=/mnt/pinky/elissa/1n2n/transabyss/crichardii k=32 name=crichardii_ncbiCrHAM_transabyss_k32_out.fa E=0 e=2 c=2 j=18 crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.fa crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.adj q=3 se="/mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_2.fq.gz"'
make: Entering directory '/mnt/pinky/elissa/1n2n/transabyss/crichardii'
dirname: missing operand
Try 'dirname --help' for more information.
ABYSS -k32 -q3 -e2 -E0 -c2    --coverage-hist=coverage.hist -s crichardii_ncbiCrHAM_transabyss_k32_out.fa-bubbles.fa  -o crichardii_ncbiCrHAM_transabyss_k32_out.fa-1.fa  /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074402_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074403_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR20074404_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324688_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324689_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324700_out_2.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_1.fq.gz /mnt/pinky/elissa/1n2n/kraken/crichardii/krakennp_SRR29324701_out_2.fq.gz

Any help is greatly appreciated. Sorry if I'm missing something obvious.

~Elissa

elissasoroj, Jan 12 '25 14:01

Hi @elissasoroj ,

If I remember correctly, ABySS (without the Bloom filter de Bruijn graph) can only read multiple read files at the same time when it is run with MPI. Trans-ABySS doesn't run ABySS with MPI enabled.
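The point about reading several files at once comes down to this: parsing separate FASTQ files is independent work, so in principle it can be spread across workers. The sketch below is not how ABySS or Trans-ABySS reads input. It only counts reads in several gzipped FASTQ files concurrently using Python's ThreadPoolExecutor, and the function names are hypothetical. It assumes 4-line FASTQ records. Python's GIL serializes much of the per-line work, so a real speedup would need separate processes or native threads. According to the reply above, MPI is what lets ABySS read files in parallel.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def count_fastq_reads(path: str) -> int:
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4

def count_reads_parallel(paths, workers=4):
    """Parse several FASTQ files concurrently and return {path: read_count}."""
    paths = list(paths)  # iterated twice below
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each file is handled by its own task; results come back in input order.
        counts = pool.map(count_fastq_reads, paths)
        return dict(zip(paths, counts))
```

For example, count_reads_parallel(["krakennp_SRR20074402_out_1.fq.gz", "krakennp_SRR20074402_out_2.fq.gz"]) would return a dict mapping each file name to its read count.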

The dirname: missing operand error is indeed the same issue as #26. The solution is in my comment here: https://github.com/bcgsc/transabyss/issues/26#issuecomment-1179201270

j=18 tells abyss-pe to use 18 threads in its workflow. I don't think that is related to this issue.

Do you have to use Trans-ABySS in your work? If not, you can try RNA-Bloom: https://github.com/bcgsc/RNA-Bloom I developed it for reference-free transcriptome assembly. It should work well for your time crunch.

Ka Ming

kmnip, Jan 12 '25 20:01

Hi Ka Ming,

Thanks for the quick reply! I am currently testing different approaches, so I will give RNA-Bloom a try!

I'd still like to try out Trans-ABySS if possible. Is there a setting that will allow me to run ABySS in parallel? For example, is there a way to run it with the Bloom filter de Bruijn graph?
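A Bloom filter de Bruijn graph stores its k-mers in a Bloom filter instead of an exact hash table. A Bloom filter is a compact bit array that never gives false negatives for membership queries but can give occasional false positives. Below is a toy sketch of that data structure only. KmerBloomFilter and kmers are hypothetical names, and the sketch assumes nothing about ABySS's actual hashing, k-mer canonicalization, or filter sizing.

```python
import hashlib

class KmerBloomFilter:
    """Toy Bloom filter for k-mer membership (illustration only, not ABySS's implementation)."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive n_hashes bit positions by hashing the k-mer with a different seed prefix each time.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # True if all bits are set: added k-mers are always found, other k-mers rarely are.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))

def kmers(seq: str, k: int):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

The memory saving comes from storing only bits rather than the k-mers themselves. The price is that a small fraction of k-mers never seen in the reads can be reported as present.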

Thanks again, ~Elissa

elissasoroj, Jan 12 '25 20:01

I tried the Bloom filter DBG approach in ABySS a long time ago, but at the time it produced a worse transcriptome assembly, so I decided to stick with the original DBG approach. I wouldn't recommend switching to the Bloom filter DBG. Sorry, I don't think there is a solution to this issue.

kmnip, Jan 12 '25 21:01

Alright, thank you so much! I appreciate it!

elissasoroj, Jan 12 '25 22:01