scrnaseq
Trimming R1 fastq files for all aligners
Description of feature
I'm following up on a Slack post I made two months ago: https://nfcore.slack.com/archives/CHN5BV5DW/p1712178056321859
I have a question about the fastq format each aligner expects. For everything below I'm using nextflow v23.10.1 and scrnaseq v2.5.1. The core at my workplace delivers R1 and R2 fastq files that both have 151 bp reads. R1 is not "trimmed" down to only the barcode and UMI (about 28 bp, depending on the protocol).

With --aligner cellranger this seems to be handled fine. However, when I change only --aligner to alevin or star, the pipeline doesn't seem to handle that R1 read format well. With alevin the pipeline completes, but barcodes.tsv contains ~200k barcodes, which is roughly the number of reads, whereas the expected number of cells is ~5k. With star the pipeline fails at NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN with the error "EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28".

My questions are: is this known behavior of the pipeline? I would like to use alevin or star in the future. Do I need to preprocess R1, and if so, could you help me with that? Thanks!
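To confirm what R1 actually contains before you pick an aligner, a quick read-length check can help. Below is a minimal sketch that assumes gzipped, standard 4-line FASTQ files. The function name read_length_counts is hypothetical and is not part of nf-core/scrnaseq. The sketch also shows where the "expected 28" in the STAR error most likely comes from with --protocol 10XV3: a 16 bp cell barcode plus a 12 bp UMI.

```shell
#!/usr/bin/env bash
# Sketch: summarise the read-length distribution of a gzipped FASTQ file.
# Assumes standard 4-line FASTQ records (the sequence is line 2 of every 4).
# read_length_counts is a hypothetical helper name, not a pipeline function.
read_length_counts() {
  local fq="$1"
  # Prints one "count length" pair per distinct read length.
  gzip -dc "$fq" \
    | awk 'NR % 4 == 2 { print length($0) }' \
    | sort -n \
    | uniq -c
}

# Expected R1 length for 10x Chromium v3: 16 bp cell barcode + 12 bp UMI.
expected_r1_len=$((16 + 12))
echo "expected R1 length for 10XV3: ${expected_r1_len}"
```

Running read_length_counts on one of the R1 files here (the file name is up to you) should report a single 151 bp class. That matches the length STAR complains about.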
The run command I use looks like this; only the --aligner argument changes between runs:
nextflow run nf-core/scrnaseq --input samplesheet.csv --fasta /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.111.gtf --aligner cellranger --protocol 10XV3 --outdir . -work-dir ./work -c config.conf -profile singularity -r 2.5.1
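If R1 does need preprocessing, one minimal approach is to keep only the first 28 bases of every R1 read, truncating both the sequence and quality lines, and to leave R2 untouched. The sketch below assumes gzipped 4-line FASTQ input and the 10XV3 layout (16 bp barcode + 12 bp UMI). The function name trim_r1 is hypothetical and is not part of nf-core/scrnaseq.

```shell
#!/usr/bin/env bash
# Sketch: truncate every R1 read to its first N bases (barcode + UMI).
# Assumes standard 4-line, gzip-compressed FASTQ records.
# trim_r1 is a hypothetical helper name, not a pipeline function.
# The body runs in a subshell so that pipefail does not leak into the caller.
trim_r1() (
  set -o pipefail
  in_fq="$1"
  out_fq="$2"
  keep="${3:-28}"   # default 28 = 16 bp barcode + 12 bp UMI (10XV3)
  gzip -dc "$in_fq" \
    | awk -v n="$keep" '
        # Lines 2 (sequence) and 4 (quality) of each record are cut to n characters;
        # lines 1 (header) and 3 ("+" separator) pass through unchanged.
        NR % 4 == 2 || NR % 4 == 0 { $0 = substr($0, 1, n) }
        { print }
      ' \
    | gzip -c > "$out_fq"
)
```

Example usage, with hypothetical file names: trim_r1 sample_R1.fastq.gz sample_R1.trimmed.fastq.gz 28. Then point the R1 entry in samplesheet.csv at the trimmed file and rerun the same command with --aligner star or alevin. For 10XV2 chemistry (16 bp barcode + 10 bp UMI) the length would be 26 instead. A dedicated tool such as cutadapt, with its --length option, should give the same truncation.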