alevinQC doesn't seem to work with salmon v1.9.0
I've recently mapped some reads that I'd like to create a summary report for, but alevinQC doesn't seem to recognise the format:
> alevinQCReport(baseDir = baseDir, sampleId = "testSample",
+ outputFile = "alevinReport_S2.pdf",
+ outputFormat = "pdf_document");
Error in checkAlevinInputFiles(baseDir) :
Input directory not compatible with Salmon v0.14 or newer (without external whitelist), the following required file(s) are missing or malformed:
salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt
Input directory not compatible with Salmon v0.14 or newer (with external whitelist), the following required file(s) are missing or malformed:
salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt
Input directory not compatible with Salmon v0.14 or newer (without final whitelist), the following required file(s) are missing or malformed:
salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt
Input directory not compatible with Salmon pre-v0.14, the following required file(s) are missing or malformed:
salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt
My alevin directory has the following structure:
$ tree alevin
alevin
├── alevin.log
├── featureDump.txt
├── quants_mat_cols.txt
├── quants_mat.gz
├── quants_mat_rows.txt
├── quants_tier_mat.gz
└── whitelist.txt
0 directories, 7 files
From looking at some bits of the code, the file alevinQC is looking for seems to be most similar to the featureDump.txt file, but the headings are a little different. The featureDump.txt file has the following format:
CB CorrectedReads MappedReads DeduplicatedReads MappingRate DedupRate MeanByMax NumGenesExpressed NumGenesOverMean
ACTATGAAAAAAACTGCGATCTCGGTT 277519 196726 35294 0.708874 0.820593 0.00953423 6041 1082
GGATTAGGATATAGCCCTAATTCGGCG 320755 234503 42377 0.731097 0.81929 0.00562383 6375 1041
ACACATTGCATCGTTAGCATGAAGCCA 301996 243735 41372 0.80708 0.830258 0.00916166 5934 1104
GTAGCCATCGATAGTGACCACTATTAG 284894 211523 31549 0.742462 0.850848 0.0127815 5877 1116
GTAGCCATCGACGTTCGGCAACAGGCT 313862 240017 35699 0.764721 0.851265 0.0150326 6401 1242
GTAGCCATCGACCGACAACAAGCCTGG 282656 211553 31087 0.748447 0.853053 0.0105161 5785 1115
GTAGCCATCGATGACAGACATCTCACG 276135 203030 31716 0.735256 0.843787 0.0148435 5860 1112
AGACGGATTAAAACGTACAACTGAATT 350865 253253 42605 0.721796 0.831769 0.00804764 6472 1130
GTCAAAGGCGAGTGTGTCCAAGTAATT 269782 200449 30527 0.743004 0.847707 0.0131974 5754 1090
Just in case it helps, I've attached one of my featureDump.txt files to this post.
Thanks for the report @gringer - the featureDump.txt file looks fine, but alevinQC is looking for a file with the frequency of all cell barcodes (before filtering and quantification). Recent versions of the package don't require this for alevin-fry output, and it seems I need to make the same change for alevin output. Would you mind sharing the command you used to run alevin? Thanks!
@csoneson Sorry, I missed the notification on this. Here is the run command that I have in my script file:
## map using Salmon with corrected barcodes (all lanes)
salmon alevin -l ISR \
-1 $(ls demultiplexed/squished_${machineID}*_R1_001.fastq.gz | sort) \
-2 $(ls demultiplexed/${machineID}*_R2_001.fastq.gz | sort) \
--mrna ${indexDir}/mt_genes.txt --rrna ${indexDir}/rRNA_genes.txt \
-i ${indexDir}/${indexName} --expectCells ${expectCellCount} --whitelist goodSampleTagCells.txt \
-p 10 -o salmon_1.9_cbc_whitelist_${projectID}_combined --tgMap ${indexDir}/txp2gene_${targetName}.txt \
--umi-geometry '1[28-35]' --bc-geometry '1[1-27]' --read-geometry '2[1-end]'
Thanks @gringer - @k3yavi, any idea what might be going on here? I just ran Salmon/alevin 1.9.0 with an external whitelist on some test data, and I still get an alevin/raw_cb_frequency.txt file (both with and without --expectCells).
Hi,
I think alevin should write the raw_cb_frequency.txt file regardless of the whitelist option, which @csoneson has already verified. It's unclear what's going on here. @gringer, are you sure that salmon finished successfully? Is it possible to share the salmon and alevin logs?
@gringer if everything looks correct, then can you also try adding the --dumpFeatures flag to your command line and check if alevin writes the raw_cb_frequency.txt file?
Yes, thank you, I have a raw_cb_frequency.txt file now, and alevinQC is happy. Adding the --dumpFeatures command line argument seems to have fixed the problem I was having:
salmon_1.9_cbc_withFeatures_whitelist_OG_2022-Oct-13_combined/
├── alevin
│   ├── alevin.log
│   ├── featureDump.txt
│   ├── quants_mat_cols.txt
│   ├── quants_mat.gz
│   ├── quants_mat_rows.txt
│   ├── quants_tier_mat.gz
│   └── raw_cb_frequency.txt
├── aux_info
│   ├── alevin_meta_info.json
│   ├── ambig_info.tsv
│   ├── expected_bias.gz
│   ├── fld.gz
│   ├── meta_info.json
│   ├── observed_bias_3p.gz
│   └── observed_bias.gz
├── cmd_info.json
├── lib_format_counts.json
├── libParams
│   └── flenDist.txt
└── logs
    └── salmon_quant.log
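In case it helps anyone else hitting the same error, here is a minimal sketch of a pre-flight check that could be run on a salmon output directory before calling alevinQCReport. The function name check_alevin_dir is made up for illustration and is not part of salmon or alevinQC. It only tests that alevin/raw_cb_frequency.txt exists and is non-empty; it does not validate the file contents, so alevinQC may still reject a malformed file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of salmon or alevinQC): checks whether the
# file named in the alevinQC error message is present in the output directory.
check_alevin_dir() {
    local base_dir="$1"
    local freq_file="${base_dir}/alevin/raw_cb_frequency.txt"

    # -s is true only if the file exists and has a size greater than zero
    if [ -s "${freq_file}" ]; then
        echo "ok: ${freq_file}"
        return 0
    fi

    # In this thread, rerunning salmon alevin with --dumpFeatures produced the file
    echo "missing or empty: ${freq_file} (try rerunning salmon alevin with --dumpFeatures)" >&2
    return 1
}
```

For example, `check_alevin_dir salmon_1.9_cbc_withFeatures_whitelist_OG_2022-Oct-13_combined` should print an "ok" line for the directory listed above, and return a non-zero status for an output directory that lacks the file.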
Great! I'll close this issue then as the problem seems to be solved - feel free to reopen if needed.