RNA-Bloom

Inputting multiple long-read files at once

Open dvirdi01 opened this issue 1 year ago • 3 comments

All the files I need to run this on are in one directory. Is there a way to pass the directory path, as in -long <path/to/directory>, rather than listing every file, as in -long <FILEA FILEB ...>?

Also, is there a way to run RNA-Bloom with Snakemake?

dvirdi01, Oct 04 '23 16:10

The input argument cannot be a directory. If you have too many read files, you can list all the read file paths, one per line, in a text file. You can then specify the path to this text file with the @ prefix, e.g.

rnabloom -long @/path/to/list_file.txt ...

Example content of list_file.txt:

/path/to/read_file_01.fastq.gz
/path/to/read_file_02.fastq.gz
/path/to/read_file_03.fastq.gz
/path/to/read_file_04.fastq.gz
/path/to/read_file_05.fastq.gz
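Since a directory cannot be passed directly, the list file can be generated from one. Below is a minimal Python sketch of that step. The function name `write_read_list` and the set of file extensions are assumptions made for illustration; they are not part of RNA-Bloom.

```python
from pathlib import Path


def write_read_list(read_dir, list_path,
                    patterns=("*.fastq", "*.fastq.gz", "*.fq", "*.fq.gz")):
    """Write the absolute path of every read file in read_dir to list_path.

    The extensions in `patterns` are an assumption; adjust them to match
    how your read files are actually named.
    """
    read_dir = Path(read_dir)
    # Collect matches for every pattern, de-duplicate, and sort for a stable order.
    files = sorted(
        {p.resolve() for pattern in patterns
         for p in read_dir.glob(pattern) if p.is_file()}
    )
    # One path per line, as in the example list_file.txt above.
    Path(list_path).write_text("".join(f"{p}\n" for p in files))
    return files


# Example call (hypothetical paths):
# write_read_list("/path/to/directory", "/path/to/list_file.txt")
```

The resulting file can then be passed with the @ prefix, as in the command above.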

You can run RNA-Bloom in a single command; you don't need Snakemake.

kmnip, Oct 04 '23 18:10

If RNA-Bloom is a step in your Snakemake workflow, then you can run RNA-Bloom as a shell command within a rule. FYI: https://snakemake.readthedocs.io/en/v3.12.0/snakefiles/rules.html

kmnip, Oct 04 '23 18:10
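If the shell command is assembled in Python, for example before handing it to Snakemake's `shell()` inside a `run:` block, it could be built roughly as follows. This is only a sketch: the helper name `rnabloom_command` is hypothetical, and it uses only the options that appear in this thread (`-long`, the @ list-file prefix, `-t`, `-outdir`).

```python
import shlex


def rnabloom_command(list_file, outdir, threads=1):
    """Build an rnabloom command line that reads long-read paths from a list file.

    Only options mentioned in this thread are used; nothing is executed here.
    """
    return [
        "rnabloom",
        "-long", f"@{list_file}",   # "@" marks a text file of read paths
        "-t", str(threads),
        "-outdir", str(outdir),
    ]


def rnabloom_shell_string(list_file, outdir, threads=1):
    """Same command as a single shell-quoted string."""
    return shlex.join(rnabloom_command(list_file, outdir, threads))
```

The list form suits `subprocess.run`, while the quoted string form suits a rule's shell command.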

I ran rnabloom on each input file separately and it produced the transcripts for each of them. However, when I give it all the input files at once to make a combined transcriptome, it gives me the following error:

Exception in thread "Thread-837" java.lang.OutOfMemoryError: Java heap space
Line 3 of FASTQ record is expected to start with '+'
rnabloom.io.FileFormatException: Line 3 of FASTQ record is expected to start with '+'

This is the command I ran:

rnabloom -long sample1.fastq sample2.fastq sample3.fastq -t 48 -outdir /.../assembly

dvirdi01, Nov 06 '23 15:11
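The FileFormatException message refers to the FASTQ layout, in which the third line of each record starts with '+'. One way to check the input files independently of RNA-Bloom is a small scan like the sketch below. It assumes strict four-line records (no wrapped sequences) and is not RNA-Bloom's own parser; the function names are hypothetical. It says nothing about the separate Java heap space error.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def first_fastq_problem(path):
    """Return (line_number, message) for the first malformed record, or None.

    Assumes four lines per record: header, sequence, '+' line, qualities.
    A truncated final record or trailing blank lines are reported as problems.
    """
    with open_text(path) as handle:
        lines_read = 0  # lines consumed by the records checked so far
        while True:
            header = handle.readline()
            if not header:
                return None  # reached end of file without finding a problem
            rest = [handle.readline() for _ in range(3)]
            header, seq, plus, qual = (s.rstrip("\r\n") for s in [header] + rest)
            if not header.startswith("@"):
                return (lines_read + 1, "header line does not start with '@'")
            if not plus.startswith("+"):
                return (lines_read + 3, "line 3 of record does not start with '+'")
            if len(seq) != len(qual):
                return (lines_read + 4, "sequence and quality lengths differ")
            lines_read += 4
```

Running this on each of the input files separately would show whether any one of them has a malformed or truncated record.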