RNA-Bloom
Inputting multiple long-read files at once
All the files I need to run this on are in one directory. Is there a way to pass the directory path, as in -long <path/to/directory>, rather than listing every file, as in -long <FILEA FILEB ....>?
Also, is there a way to run RNA-Bloom with Snakemake?
The input argument cannot be a directory.
If you have too many read files to list on the command line, you can put the read file paths in a text file, one path per line, and pass the path to that text file with the @ prefix, e.g.
rnabloom -long @/path/to/list_file.txt ...
Example content of list_file.txt:
/path/to/read_file_01.fastq.gz
/path/to/read_file_02.fastq.gz
/path/to/read_file_03.fastq.gz
/path/to/read_file_04.fastq.gz
/path/to/read_file_05.fastq.gz
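Since the original question was about pointing RNA-Bloom at a directory, a list file like the one above could be generated rather than typed by hand. The following is only a sketch: `make_read_list` is a hypothetical helper name, and it assumes the reads sit directly in one directory, end in `.fastq` or `.fastq.gz`, and that your `find` supports `-maxdepth`.

```shell
# make_read_list DIR OUT (hypothetical helper, not part of RNA-Bloom):
# write the absolute path of every FASTQ file found directly in DIR to OUT,
# one path per line, in sorted order.
make_read_list() {
    # Resolve DIR to an absolute path so the list works from any working directory.
    reads_dir=$(cd "$1" && pwd) || return 1
    find "$reads_dir" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fastq.gz' \) | sort > "$2"
}
```

After running `make_read_list /path/to/directory list_file.txt`, the resulting file can be passed with the @ prefix exactly as shown above.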
You can run RNA-Bloom in a single command; you don't need Snakemake.
If RNA-Bloom is a step in your Snakemake workflow, then you can run RNA-Bloom as a shell command within a rule. FYI: https://snakemake.readthedocs.io/en/v3.12.0/snakefiles/rules.html
I ran rnabloom on each input file separately and it produced the transcripts for each of them. However, when I give it all the input files at once to make a combined transcriptome, it gives me the following error:
Exception in thread "Thread-837" java.lang.OutOfMemoryError: Java heap space
Line 3 of FASTQ record is expected to start with '+'
rnabloom.io.FileFormatException: Line 3 of FASTQ record is expected to start with '+'
This is the command I ran: rnabloom -long sample1.fastq sample2.fastq sample3.fastq -t 48 -outdir /.../assembly
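Because the exception names line 3 of a FASTQ record, one way to rule out a truncated or malformed input is to check each file for that condition before combining them. This is a rough sketch, not an RNA-Bloom feature: `check_fastq_plus` is a hypothetical helper, and it assumes strict four-line FASTQ records (no wrapped sequence lines). It does not address the OutOfMemoryError itself.

```shell
# check_fastq_plus FILE (hypothetical helper): report the first record whose
# third line does not start with '+'. Returns 0 if none is found, 1 otherwise.
# Assumes each FASTQ record occupies exactly four lines.
check_fastq_plus() {
    # Decompress gzipped input on the fly; read plain files as-is.
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk '
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": " $0   # show the offending line
            bad = 1
            exit                       # stop at the first problem
        }
        END { exit bad }'
}
```

For example, `for f in sample1.fastq sample2.fastq sample3.fastq; do check_fastq_plus "$f" || echo "problem in $f"; done` would flag any file with a misplaced separator line.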