Trimming R1 fastq files in all aligners

Open ChristopherMancuso opened this issue 9 months ago • 4 comments

Description of feature

I'm following up on a Slack post that I made 2 months ago at https://nfcore.slack.com/archives/CHN5BV5DW/p1712178056321859

I had a question about the fastq format needed for the different aligners. For everything I’m using nextflow v23.10.1 and scrnaseq v2.5.1. The fastq files I get from the core at my workplace have reads of length 151 in both R1 and R2, instead of R1 being “trimmed” to only the barcode and UMI (so about 28 bp, depending on the protocol). With --aligner cellranger this seems to be handled fine. However, when I switch --aligner to either alevin or star and change nothing else, the pipeline doesn’t seem to handle that R1 read format well.

For alevin the pipeline completes, but the number of barcodes in barcodes.tsv is ~200k, which is roughly the number of reads, whereas the expected number of cells is ~5k.

For star the pipeline fails at NFCORE_SCRNASEQ:SCRNASEQ:STARSOLO:STAR_ALIGN with the error: EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28.

My questions are: is this known behavior of the pipeline? I would like to use alevin or star in the future, so do I need to preprocess R1, and if so, could I get any help in doing that? Thanks!

The run command I use looks like this; the only thing I change between runs is the --aligner argument:

nextflow run nf-core/scrnaseq --input samplesheet.csv --fasta /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa --gtf /biostats_share/mancchri/genomes/Homo_sapiens.GRCh38.111.gtf --aligner cellranger --protocol 10XV3 --outdir . -work-dir ./work -c config.conf -profile singularity -r 2.5.1
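
For the R1 preprocessing asked about above, here is a minimal sketch of what trimming could look like. It is not part of nf-core/scrnaseq; it is a standalone Python script that assumes gzipped four-line fastq records and a 28 bp barcode plus UMI (16 bp cell barcode + 12 bp UMI, as in 10x v3 chemistry, which matches the "expected 28" in the star error). The function names and the BARCODE_UMI_LEN constant are made up for illustration.

```python
import gzip
from itertools import islice

# Assumption: 10x v3 layout, 16 bp cell barcode + 12 bp UMI = 28 bp.
# Adjust for other protocols.
BARCODE_UMI_LEN = 28


def trim_record(record, keep=BARCODE_UMI_LEN):
    """Cut one 4-line fastq record down to its first `keep` bases."""
    header, seq, plus, qual = record
    # Sequence and quality strings must stay the same length.
    return [header, seq[:keep], plus, qual[:keep]]


def trim_fastq(in_path, out_path, keep=BARCODE_UMI_LEN):
    """Write a trimmed copy of a gzipped fastq; return the number of reads."""
    n_reads = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            # Read the next record (4 lines) without loading the whole file.
            record = [line.rstrip("\n") for line in islice(fin, 4)]
            if not record:
                break
            if len(record) != 4:
                raise ValueError("truncated fastq record at end of file")
            fout.write("\n".join(trim_record(record, keep)) + "\n")
            n_reads += 1
    return n_reads
```

Calling trim_fastq("sample_R1.fastq.gz", "sample_R1.trimmed.fastq.gz") (hypothetical file names) would leave R2 untouched and write a new R1 file, which would then be the one listed in samplesheet.csv. Read order is preserved, so R1 and R2 stay paired.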

ChristopherMancuso, May 24 '24 14:05