demuxlet
QC of deconvoluted SNG
I've run two large demultiplexing experiments so far. Both give unsatisfying SNG numbers compared with those expected from the publication, and certainly less than 90% of the total GEMs. My first experiment (8-sample pool) had 25,000 reads per cell and was 3' sequencing on 10x v2; the second (20-sample pool) has 50,000 reads per cell and is 5' sequencing on 10x v2. These differences result in a higher number of UMIs per cell in the second experiment, but the distribution of SNPs per cell is still similar. Hence I'm throwing away a lot of sequencing in both cases.
The genotype data for both were produced using the Illumina Infinium Global Screening Array. I deliberately generated a VCF file with only the GT information to test the demultiplexing.
- Demuxlet is calibrated on 10x chemistry v1. What exactly is affected by that in the mixture model?
- I noticed that changing --geno-error from the default 0.01 to 0.05 dramatically increases SNG over DBL in the 5' experiment, but not in the 3' experiment (tested geno-error 0.0001, 0.001, 0.05 and 0.2). How does demuxlet calculate the geno-error from the GT info? (From the methods of the paper: P_sv(g) = Pr(g | Data_sv).)
- How many of the SNG would also be a "good cell" according to the 10x pipeline (i.e. the upper part of the knee plot in the 10x report)? In my data the SNG and 10x pass-filter barcodes don't match.
- How many SNG are actually good-QC cells (% mitochondrial reads, nUMI, complexity, reads per cell...) in your data? Is this not mentioned in the paper?
many thanks
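For the SNG versus 10x pass-filter comparison, a minimal sketch of how the overlap could be counted. It assumes the barcodes in the .best file and in the 10x filtered barcodes.tsv use the same format (including any "-1" suffix) and that the BEST label starts with SNG, DBL or AMB. The function name and the toy data are hypothetical, not taken from demuxlet or Cell Ranger.

```python
def compare_sng_to_10x(best_calls, passed_barcodes):
    """Count how demuxlet singlets overlap with 10x pass-filter barcodes.

    best_calls: iterable of (barcode, best_label) pairs from a .best file.
    passed_barcodes: iterable of barcodes from the 10x filtered barcodes.tsv.
    """
    # Assumption: singlet labels start with "SNG" (e.g. "SNG-sampleA").
    sng = {bc for bc, label in best_calls if label.startswith("SNG")}
    passed = set(passed_barcodes)
    return {
        "sng_and_pass": len(sng & passed),  # singlets that 10x also calls cells
        "sng_only": len(sng - passed),      # singlets below the 10x cutoff
        "pass_only": len(passed - sng),     # 10x cells called DBL/AMB or missing
    }


# Made-up toy data, for illustration only.
calls = [
    ("AAAC-1", "SNG-s1"),
    ("AAAG-1", "DBL-s1-s2-0.500"),
    ("AAAT-1", "SNG-s2"),
    ("AACC-1", "AMB-s1-s2-s3"),
]
passed = ["AAAC-1", "AAAG-1", "AACC-1"]
overlap = compare_sng_to_10x(calls, passed)
print(overlap)
```

A large "sng_only" count would suggest demuxlet is calling singlets among barcodes that the 10x pipeline treats as background; a large "pass_only" count would point at real cells being lost to DBL or AMB.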
hi bio-la, do you mind posting the output here? we are working on calibrating demuxlet for 10X V2 (3' and 5') chemistry and hopefully will have an update soon. your results could be very helpful for us.
Which of the outputs would be more useful? logs, .best files... ?
let’s start with the .best files.
~j
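Before sharing the .best files, it may help to summarize the SNG/DBL/AMB split per run. A minimal sketch follows, assuming the .best file is tab-separated with a header row and a BEST column whose values start with SNG, DBL or AMB (e.g. "SNG-sampleA"); the function name and the example table are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_best(handle):
    """Count call classes (SNG/DBL/AMB) in a demuxlet .best table."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: the class is the part of BEST before the first "-".
        counts[row["BEST"].split("-", 1)[0]] += 1
    total = sum(counts.values())
    fractions = {k: v / total for k, v in counts.items()} if total else {}
    return counts, fractions


# Made-up example table; a real file would be opened with open(path, newline="").
example = (
    "BARCODE\tN.SNP\tBEST\n"
    "AAAC-1\t120\tSNG-s1\n"
    "AAAG-1\t95\tSNG-s2\n"
    "AAAT-1\t210\tDBL-s1-s2-0.500\n"
    "AACC-1\t8\tAMB-s1-s2-s3\n"
)
counts, fractions = summarize_best(io.StringIO(example))
print(counts, fractions)
```

Running this once per parameter setting (for example per --alpha or --geno-error value) gives a quick table of how the SNG fraction moves between runs.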
Hi jimmie, I created a repo (dem_best) with the files. In the meantime, can you at least comment on my questions please? I'd like to troubleshoot my experiments and understand if there's something I could modify in the input .bam and .vcf . thanks.
Hi, Any update on this...? Thanks.
Have you tried running with --alpha 0 --alpha 0.5?
Hi Jimmie, yes, but the only appreciable improvement comes with alpha 0 and geno-error 0.001, using the barcodes.tsv file. Doesn't alpha 0 mean I'm not expecting doublets? (Indeed, I only get AMB and SNG, but I don't really like that assumption.) Anyway, I get just 67% SNG out of 15k total cells, and I reckon there isn't much more I can try, since this depends on the low number of SNPs covered and the chemistry version?
Alpha 0 and 0.5 will be added automatically even if you do not specify them. Maybe I should allow a mode that does not permit any doublets. We will discuss and make the necessary changes.
Thanks, Hyun.
@hyunminkang sorry for the terrible delay, but what do you mean by "Alpha 0 and 0.5 will be added automatically even if you do not specify"? I know that if you don't specify alpha (the default), it will use the array 0, 0.1, 0.2, 0.3, 0.4, 0.5, but the default configuration didn't give me back many usable singlets (SNG). Hence jimmie suggested that I run the pipeline with either alpha 0 or 0.5, because the default wouldn't work as well. I tried all the levels in between, specifying them one at a time, and I get the highest number of SNG only with alpha 0. With this trick I get no DBL (which, as I said, I don't like), nor likelihoods, as in https://github.com/statgen/demuxlet/issues/20 .
I've just changed the default to be 0 and 0.5, so should be clearer now.
Dear @yimmieg, @hyunminkang,
These are good questions. Have you managed to answer Q1 and Q2?
Thank you, Ciro