scrnaseq
Alevin workflow cannot handle gzipped genomes or GTF files
Tested on the DSL2 version in the dev branch.
Either update modules to handle gzip files, or include an optional gunzip process.
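For the second option, the decompression step only has to sniff the file header and unpack when needed. Here is a minimal Python sketch of that logic. `is_gzipped` and `gunzip_if_needed` are hypothetical helper names, not existing pipeline modules, and the output naming (drop a trailing `.gz`) is an assumption. It checks the gzip magic bytes (`1f 8b`) instead of trusting the file extension.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def is_gzipped(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def gunzip_if_needed(path: Path, outdir: Path) -> Path:
    """Hypothetical helper: decompress `path` into `outdir` if it is gzipped.

    Uncompressed files are returned unchanged, so downstream steps
    (GTF filtering, index building) always receive plain-text input.
    """
    path = Path(path)
    if not is_gzipped(path):
        return path
    # Assumed naming scheme: strip ".gz", otherwise append ".unzipped".
    name = path.stem if path.suffix == ".gz" else path.name + ".unzipped"
    out = Path(outdir) / name
    with gzip.open(path, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large genomes are not loaded into memory
    return out
```

Because uncompressed inputs pass through untouched, such a step could be applied unconditionally to both the genome FASTA and the GTF.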
Superseded by #78
@grst: I think this issue may still persist. I just tried to run version 2.1.0 of the pipeline with gzip-compressed genome FASTA and GTF files:
[Truncated Nextflow console output]
Command executed:
filter_gtf_for_genes_in_genome.py \
--gtf gencode.v42.primary_assembly.basic.annotation.gtf.gz \
--fasta GRCh38.primary_assembly.genome.fa.gz \
-o GRCh38.primary_assembly.genome.fa_genes.gtf
cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:GTF_GENE_FILTER":
    python: $(python --version | sed 's/Python //g')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
  File "/root/nextflow-bin/filter_gtf_for_genes_in_genome.py", line 82, in <module>
    extract_genes_in_genome(args.fasta, args.gtf, args.output)
  File "/root/nextflow-bin/filter_gtf_for_genes_in_genome.py", line 43, in extract_genes_in_genome
    seq_names_in_genome = set(extract_fasta_seq_names(fasta))
  File "/root/nextflow-bin/filter_gtf_for_genes_in_genome.py", line 34, in extract_fasta_seq_names
    for i, header in enumerate(faiter):
  File "/root/nextflow-bin/filter_gtf_for_genes_in_genome.py", line 32, in <genexpr>
    faiter = (x[1] for x in groupby(fh, is_header))
  File "/usr/local/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
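The offending byte is a giveaway: every gzip file starts with the magic bytes `1f 8b`, and `0x8b` cannot start a UTF-8 sequence, so decoding `b"\x1f\x8b"` as UTF-8 fails with exactly this message at position 1. In other words, `filter_gtf_for_genes_in_genome.py` opened the `.gz` FASTA as plain text. Below is a minimal sketch of gzip-transparent reading. `open_text`, `fasta_seq_names` and `filter_gtf` are hypothetical names, and the filtering rule (keep GTF records whose first column matches a FASTA sequence name, keep `#` comment lines) is inferred from the script's name and the traceback, not taken from its source.

```python
import gzip
from typing import IO, Set

GZIP_MAGIC = b"\x1f\x8b"  # gzip header; 0x8b is the byte the traceback chokes on


def open_text(path: str) -> IO[str]:
    """Open a file as text, transparently decompressing it if it is gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_seq_names(path: str) -> Set[str]:
    """Collect sequence names: header text after '>' up to the first whitespace."""
    names = set()
    with open_text(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def filter_gtf(gtf_path: str, fasta_path: str, out_path: str) -> int:
    """Hypothetical filter: write GTF records whose seqname is in the FASTA.

    Returns the number of records kept (comment lines are copied but not counted).
    """
    keep = fasta_seq_names(fasta_path)
    kept = 0
    with open_text(gtf_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # assumption: preserve header/comment lines
                continue
            if line.split("\t", 1)[0] in keep:  # column 1 of a GTF is the seqname
                dst.write(line)
                kept += 1
    return kept
```

Sniffing the header rather than the extension means the same code path handles both compressed and uncompressed references, and the output GTF is always written uncompressed.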