alevinQC doesn't seem to work with salmon v1.9.0

Open gringer opened this issue 3 years ago • 1 comments

I've recently mapped some reads that I'd like to create a summary report for, but alevinQC doesn't seem to recognise the format:

> alevinQCReport(baseDir = baseDir, sampleId = "testSample",
 +                outputFile = "alevinReport_S2.pdf",
 +                outputFormat = "pdf_document");
 Error in checkAlevinInputFiles(baseDir) :
   Input directory not compatible with Salmon v0.14 or newer (without external whitelist), the following required file(s) are missing or malformed:
 salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt

 Input directory not compatible with Salmon v0.14 or newer (with external whitelist), the following required file(s) are missing or malformed:
 salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt

 Input directory not compatible with Salmon v0.14 or newer (without final whitelist), the following required file(s) are missing or malformed:
 salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt

 Input directory not compatible with Salmon pre-v0.14, the following required file(s) are missing or malformed:
 salmon_1.9_OG_2022-Oct-13_S2/alevin/raw_cb_frequency.txt

My alevin directory has the following structure:

$ tree alevin
alevin
├── alevin.log
├── featureDump.txt
├── quants_mat_cols.txt
├── quants_mat.gz
├── quants_mat_rows.txt
├── quants_tier_mat.gz
└── whitelist.txt

0 directories, 7 files

From looking at parts of the code, the file alevinQC is looking for seems most similar to the featureDump.txt file, but the headings are a little different. The featureDump.txt file has the following format:

CB      CorrectedReads  MappedReads     DeduplicatedReads       MappingRate     DedupRate       MeanByMax       NumGenesExpressed       NumGenesOverMean
ACTATGAAAAAAACTGCGATCTCGGTT     277519  196726  35294   0.708874        0.820593        0.00953423      6041    1082
GGATTAGGATATAGCCCTAATTCGGCG     320755  234503  42377   0.731097        0.81929 0.00562383      6375    1041
ACACATTGCATCGTTAGCATGAAGCCA     301996  243735  41372   0.80708 0.830258        0.00916166      5934    1104
GTAGCCATCGATAGTGACCACTATTAG     284894  211523  31549   0.742462        0.850848        0.0127815       5877    1116
GTAGCCATCGACGTTCGGCAACAGGCT     313862  240017  35699   0.764721        0.851265        0.0150326       6401    1242
GTAGCCATCGACCGACAACAAGCCTGG     282656  211553  31087   0.748447        0.853053        0.0105161       5785    1115
GTAGCCATCGATGACAGACATCTCACG     276135  203030  31716   0.735256        0.843787        0.0148435       5860    1112
AGACGGATTAAAACGTACAACTGAATT     350865  253253  42605   0.721796        0.831769        0.00804764      6472    1130
GTCAAAGGCGAGTGTGTCCAAGTAATT     269782  200449  30527   0.743004        0.847707        0.0131974       5754    1090

Just in case it helps, I've attached one of my featureDump.txt files to this post.

featureDump.txt.gz

gringer Oct 13 '22 23:10
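
Before digging into headings, a quick way to confirm what the error message reports is to test for the file directly. The sketch below is only an illustration: `check_raw_cb_frequency` is a hypothetical helper, not part of alevinQC or salmon, and it assumes the argument is the salmon output directory (the one containing `alevin/`).

```shell
# Hypothetical helper: report whether alevin/raw_cb_frequency.txt exists
# and is non-empty inside a salmon output directory.
check_raw_cb_frequency() {
  # $1: salmon output directory (the value passed to -o)
  target="$1/alevin/raw_cb_frequency.txt"
  if [ -s "${target}" ]; then
    echo "found: ${target}"
  else
    echo "missing or empty: ${target}"
    return 1
  fi
}

# Example call, using the directory name from the error message:
# check_raw_cb_frequency salmon_1.9_OG_2022-Oct-13_S2
```

This only checks that the file is present and non-empty; alevinQC may apply further checks on the contents.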

Thanks for the report @gringer - the featureDump.txt file looks fine, but alevinQC is looking for a file with the frequency of all cell barcodes (before filtering and quantification). Recent versions of the package don't require this for alevin-fry output, and it seems I need to make the same change for alevin output. Would you mind sharing the command you used to run alevin? Thanks!

csoneson Oct 17 '22 06:10
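
To make the distinction from featureDump.txt concrete: the file described above lists every observed cell barcode with its read count, before any filtering. The sketch below ranks barcodes by that count. It assumes raw_cb_frequency.txt is a headerless, tab-separated file with the barcode in column 1 and the frequency in column 2; that layout is an assumption here, and `top_barcodes` is a hypothetical helper name.

```shell
# Hypothetical helper: list the most frequent raw cell barcodes.
# Assumes two whitespace/tab-separated columns: barcode, then count.
top_barcodes() {
  # $1: path to raw_cb_frequency.txt
  # $2: number of rows to show (defaults to 10)
  sort -k2,2nr "$1" | head -n "${2:-10}"
}

# Example call:
# top_barcodes salmon_output/alevin/raw_cb_frequency.txt 5
```

If the file has a different layout in your salmon version, the sort key would need adjusting.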

@csoneson Sorry, I missed the notification on this. Here is the run command that I have in my script file:

## map using Salmon with corrected barcodes (all lanes)
salmon alevin -l ISR \
  -1 $(ls demultiplexed/squished_${machineID}*_R1_001.fastq.gz | sort) \
  -2 $(ls demultiplexed/${machineID}*_R2_001.fastq.gz | sort) \
  --mrna ${indexDir}/mt_genes.txt --rrna ${indexDir}/rRNA_genes.txt \
  -i ${indexDir}/${indexName} --expectCells ${expectCellCount} --whitelist goodSampleTagCells.txt \
  -p 10 -o salmon_1.9_cbc_whitelist_${projectID}_combined --tgMap ${indexDir}/txp2gene_${targetName}.txt \
  --umi-geometry '1[28-35]' --bc-geometry '1[1-27]' --read-geometry '2[1-end]'

gringer Dec 02 '22 04:12

Thanks @gringer - @k3yavi, any idea what might be going on here? I just ran Salmon/alevin 1.9.0 with an external whitelist on some test data, and I still get an alevin/raw_cb_frequency.txt file (both with and without --expectCells).

csoneson Dec 05 '22 17:12

Hi,

I think alevin should write raw_cb_frequency.txt regardless of the whitelist option, which @csoneson has already verified. It's unclear what's going on. @gringer, are you sure that salmon finished successfully? Could you share the salmon and alevin logs?

k3yavi Dec 07 '22 13:12

@gringer, if everything looks correct, could you also try adding the --dumpFeatures flag to your command line and check whether alevin writes the raw_cb_frequency.txt file?

k3yavi Dec 07 '22 13:12

Yes, thank you, I have a raw_cb_frequency.txt file now, and alevinQC is happy. Adding the --dumpFeatures command line argument seems to have fixed the problem I was having:

salmon_1.9_cbc_withFeatures_whitelist_OG_2022-Oct-13_combined/
├── alevin
│   ├── alevin.log
│   ├── featureDump.txt
│   ├── quants_mat_cols.txt
│   ├── quants_mat.gz
│   ├── quants_mat_rows.txt
│   ├── quants_tier_mat.gz
│   └── raw_cb_frequency.txt
├── aux_info
│   ├── alevin_meta_info.json
│   ├── ambig_info.tsv
│   ├── expected_bias.gz
│   ├── fld.gz
│   ├── meta_info.json
│   ├── observed_bias_3p.gz
│   └── observed_bias.gz
├── cmd_info.json
├── lib_format_counts.json
├── libParams
│   └── flenDist.txt
└── logs
    └── salmon_quant.log

gringer Dec 12 '22 01:12
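
For anyone landing here with the same error, the fix amounts to the command posted earlier in the thread with --dumpFeatures appended. The sketch below wraps it in a function so it can be previewed before running. `run_alevin` is a hypothetical wrapper, the shell variables are the ones from the original script and must already be set, and the output directory template is inferred from the listing above rather than confirmed.

```shell
# Hypothetical wrapper around the command from the thread, plus --dumpFeatures.
# Pass "salmon" to run it, or "echo salmon" to print the command instead.
# Assumes machineID, indexDir, indexName, expectCellCount, projectID and
# targetName are set as in the original script.
run_alevin() {
  "$@" alevin -l ISR \
    -1 $(ls demultiplexed/squished_"${machineID}"*_R1_001.fastq.gz | sort) \
    -2 $(ls demultiplexed/"${machineID}"*_R2_001.fastq.gz | sort) \
    --mrna "${indexDir}/mt_genes.txt" --rrna "${indexDir}/rRNA_genes.txt" \
    -i "${indexDir}/${indexName}" --expectCells "${expectCellCount}" \
    --whitelist goodSampleTagCells.txt \
    -p 10 -o "salmon_1.9_cbc_withFeatures_whitelist_${projectID}_combined" \
    --tgMap "${indexDir}/txp2gene_${targetName}.txt" \
    --umi-geometry '1[28-35]' --bc-geometry '1[1-27]' --read-geometry '2[1-end]' \
    --dumpFeatures
}

# Preview the full command line without running salmon:
# run_alevin echo salmon
# Run it for real:
# run_alevin salmon
```

The file lists are left unquoted on purpose so that several lanes expand to separate arguments; this breaks if file names contain spaces.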

Great! I'll close this issue then as the problem seems to be solved - feel free to reopen if needed.

csoneson Dec 12 '22 07:12