alevin-fry
Metrics for reads on a high multimapping genome
Thanks for writing alevin-fry. It has been quite useful to me lately: I am working on a genome with a lot of sequence homology, so I am hoping it can salvage many of the multimappers that cellranger/STAR cannot currently handle. I am trying to get metrics on how many reads were uniquely mapped versus multimapped, and how many of the multimapped reads were salvaged and used in the counts matrix, but I am struggling to find that information. In particular, I would like to know what percentage of reads were used in the counts matrix after the parsimony-em resolution was applied, so I can tell how well the EM performed at salvaging the data. Most of my reads are not uniquely mapped, which is why this metric matters to me.
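To make the metrics I am after concrete, here is a minimal sketch of the arithmetic in Python. The function name and its three input counts are hypothetical placeholders of my own; I have not found where (or whether) alevin-fry reports these numbers, so this only illustrates the percentages I would like to compute.

```python
def mapping_summary(total_mapped, uniquely_mapped, multimapped_resolved):
    """Summarise how mapped reads were used, as percentages.

    All three inputs are hypothetical counts (assumptions, not values
    that alevin-fry is known to report under these names):
      total_mapped          - reads that mapped to at least one target
      uniquely_mapped       - reads that mapped to exactly one gene
      multimapped_resolved  - multimapped reads that still contributed
                              to the counts matrix after resolution
    """
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    multimapped = total_mapped - uniquely_mapped
    if multimapped < 0:
        raise ValueError("uniquely_mapped cannot exceed total_mapped")
    if not 0 <= multimapped_resolved <= multimapped:
        raise ValueError("multimapped_resolved must lie in [0, multimapped]")

    used = uniquely_mapped + multimapped_resolved
    return {
        "pct_unique": 100.0 * uniquely_mapped / total_mapped,
        "pct_multimapped": 100.0 * multimapped / total_mapped,
        # Share of multimapped reads that were salvaged by resolution.
        "pct_multimapped_salvaged": (
            100.0 * multimapped_resolved / multimapped if multimapped else 0.0
        ),
        # Share of all mapped reads that ended up in the counts matrix.
        "pct_used_in_matrix": 100.0 * used / total_mapped,
    }


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    for name, value in mapping_summary(1000, 400, 300).items():
        print(f"{name}: {value:.1f}%")
```

The last two values are the ones I care about most: how much of the multimapped fraction was recovered, and how much of the data overall made it into the matrix.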
The output folder after quantification looks like this, and none of the files seems to contain that information:
Here are the commands I ran to get there:
STEP0: Build index
${salmon} index -t ${txnFasta} -p ${threads} --kmerLen 31 --index C_albicans_SC5314_A22_current_orf_coding_added_CaNEON_and_iRFP
STEP1: map using alevin within salmon
${salmon} alevin -lISR --chromiumV3 -1 ${r1_files} -2 ${r2_files} -o ${outdir} -i ${salmon_index} -p ${threads} --sketch --dumpFeatures
STEP2: generate permit list
${alevin_fry} generate-permit-list --input ${alevin_input_dir} --expected-ori fw --output-dir ${outdir} --unfiltered-pl ${barcode_whitelist} --min-reads 10
STEP3: collate data
${alevin_fry} collate -i ${fry_outdir} -r ${alevin_outdir} -t ${threads}
STEP4: quantification
${alevin_fry} quant -i ${fry_outdir} -o ${outdir} -t ${threads} -r parsimony-em -m ${reference_dir}"C_albicans_SC5314_A22_current_orf_coding_added_CaNEON_and_iRFP_txn_to_gene.txt"
Any advice would be great! Thanks!