Metrics for reads on a high multimapping genome

Open tbrunetti opened this issue 7 months ago • 0 comments

Thanks for writing alevin-fry. It has been quite useful for me lately: I am working on a genome that has a lot of sequence homology, so I am hoping it helps salvage many of the multimappers that cellranger/STAR is currently unable to handle. I am trying to get some metrics on how many reads were uniquely mapped versus multimapped, and how many of the multimapped reads were salvaged and used in the counts matrix, but I am struggling to find that information. In particular, I would like to know what percentage of reads were used in the counts matrix after the parsimony-em resolution was applied, so I can tell how well the EM performed at salvaging the data. Most of my data is not uniquely mapped, which is why I am interested in that metric.

The output folder after quantification looks like this, and none of the files seem to contain that information: [image]

Here are the commands I ran to get there:
STEP0: Build index

${salmon} index -t ${txnFasta} -p ${threads} --kmerLen 31 --index C_albicans_SC5314_A22_current_orf_coding_added_CaNEON_and_iRFP

STEP1: map using alevin within salmon

${salmon} alevin -lISR --chromiumV3 -1 ${r1_files} -2 ${r2_files} -o ${outdir} -i ${salmon_index} -p ${threads} --sketch --dumpFeatures

STEP2: generate permit list

${alevin_fry} generate-permit-list --input ${alevin_input_dir} --expected-ori fw --output-dir ${outdir} --unfiltered-pl ${barcode_whitelist} --min-reads 10

STEP3: collate data

${alevin_fry} collate -i ${fry_outdir} -r ${alevin_outdir} -t ${threads}

STEP4: quantification

${alevin_fry} quant -i ${fry_outdir} -o ${outdir} -t ${threads} -r parsimony-em -m ${reference_dir}"C_albicans_SC5314_A22_current_orf_coding_added_CaNEON_and_iRFP_txn_to_gene.txt"
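For reference, here is a rough sketch of the kind of number I am after, not something alevin-fry reports itself. It assumes the quant step writes its counts as a Matrix Market coordinate file (the path `alevin/quants_mat.mtx` in the usage note is my assumption) and that a total read count is available from somewhere else, such as the mapping log. The function names `sum_matrix_market` and `fraction_assigned` are made up for illustration. Since the matrix holds UMI-deduplicated (and, with parsimony-em, possibly fractional) counts rather than raw reads, the ratio is only a proxy for "reads used".

```python
import gzip
from pathlib import Path


def sum_matrix_market(path):
    """Sum all stored values in a Matrix Market coordinate file (.mtx or .mtx.gz)."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    total = 0.0
    size_line_seen = False
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines, comments, and the %%MatrixMarket banner
            if not size_line_seen:
                size_line_seen = True  # first non-comment line is "rows cols nnz"
                continue
            _row, _col, value = line.split()
            total += float(value)  # EM-resolved counts can be fractional
    return total


def fraction_assigned(mtx_path, total_reads):
    """Ratio of summed matrix counts to a read total supplied by the caller."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sum_matrix_market(mtx_path) / total_reads
```

Usage would be something like `fraction_assigned("alevin/quants_mat.mtx", total_reads=n_mapped)`, where `n_mapped` is a placeholder for whatever read total is the right denominator. It still would not separate unique from multimapped reads, which is the part I cannot find.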

Any advice would be great! Thanks!

tbrunetti, Jul 08 '24 17:07