Troubleshooting results

Open chris-mcginnis-ucsf opened this issue 5 years ago • 6 comments

Hello again!

This is not a usage/implementation question as before but, rather, a request for troubleshooting advice.

So, we recently performed an experiment where we used MULTI-seq to multiplex primary tissue from 13 different donors. The sample barcode data looks decent, but as we sometimes see with difficult-to-dissociate tissue, some cells were unable to be successfully demultiplexed. Moreover, when applying a different sample classification workflow to the data, we get slightly different results. I wanted to get a 'ground-truth' to decide which classification set to proceed with, so I applied souporcell to the data:

[Image: Screen Shot 2019-08-16 at 11 36 20 AM]

This is barcode space, with cells colored according to their demultiplexing results. Notably, while there are some cells that are doublets or cannot be classified (black), clusters are generally 'pure' and I can find all 13 donors with the deMULTIplex and demuxEM results. However, souporcell is erroneously calling three of the clusters as coming from the same donor (we can infer from gene expression analysis that these are indeed different donors). Notably, the colors are different because of mapping issue @achamess was talking about.

Do you have any insight into what could cause this result? What information would be useful for you to help me troubleshoot?

Here's the command I used to run souporcell:

singularity exec souporcell.sif souporcell_pipeline.py -i ./possorted_genome_bam.bam -b ./cellIDs.tsv -f hg19_3.0.0_genome.fa -t 16 -o LIVE_OLD_souporcell -k 13

Chris

chris-mcginnis-ucsf avatar Aug 16 '19 18:08 chris-mcginnis-ucsf

Hi Chris,

Thanks for this. I am not familiar with deMULTIplex. demuxEM, as I understand it, is an antibody-based cell hashing method. Could you point me to deMULTIplex? Beyond that, I have a few questions.

  1. Are these nuclei or cells?
  2. How many median UMI/cell?
  3. Could you clarify this statement: "Notably, the colors are different because of mapping issue @achamess was talking about." I don't understand what this means. Oh, perhaps you mean that we don't know which cluster is which donor. In that case, I understand. Let me know if this is correct.
  4. What do you mean by "barcode space?" Is this the antibody cell hashing barcode or something else?

We haven't tested souporcell on more than 8 real samples mixed at once because I didn't have such a dataset. It will eventually run into local minima, which cause this type of result, but with good signal-to-noise I would hope that point lies beyond this number of samples (I have tested up to 32 simulated mixed samples, which worked fairly well but not perfectly). I have several ideas on how to make souporcell better at clustering a higher number of mixed samples. I'm sure you have also had issues with memory and runtime with this number of samples and cells, and I have plans for improving those as well. I appreciate you bringing these things to my attention, and I am confident that with some feedback from you I can extend souporcell to work on significantly more samples.

Best, Haynes

wheaton5 avatar Aug 16 '19 21:08 wheaton5

The quickest solution would be to increase the number of restarts and use common human variants from the 1000 Genomes Project. The common variants option is already in the current singularity container. I have added and tested the additional restarts, but neglected to include them in the latest version. I will update that and post here again when done. Still, I will reiterate that I have several other ideas on how to improve this, as well as memory and runtime, but they will take a bit longer to implement.
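As a rough illustration of why more restarts help with local minima, the sketch below runs a toy 1-D k-means from several random initializations and keeps the run with the lowest loss. It is not souporcell's clustering code; the function names (`kmeans_1d`, `best_of_restarts`) and the data are made up for the example.

```python
import random


def kmeans_1d(points, k, rng, iters=50):
    """One k-means run from a random initialization; returns (loss, centers)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            groups[nearest].append(p)
        # Keep the previous center if a group ends up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    loss = sum(min((p - c) ** 2 for c in centers) for p in points)
    return loss, sorted(centers)


def best_of_restarts(points, k, restarts, seed=0):
    """Run k-means `restarts` times and keep the lowest-loss solution."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])


# Illustrative data only: three tight groups of points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
single = best_of_restarts(points, 3, restarts=1)
many = best_of_restarts(points, 3, restarts=30)
print("loss with 1 restart:  ", single[0])
print("loss with 30 restarts:", many[0])
```

Because both calls share the same seed, the first of the 30 runs is identical to the single run, so the best-of-30 loss can never be worse. A bad initialization typically puts two centers in one group and merges the other two groups, which is analogous to several donors collapsing into one cluster.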

Thanks for your patience and feedback. You have been an excellent early user.

Best, Haynes

wheaton5 avatar Aug 16 '19 21:08 wheaton5

New container here (and on the readme) with optional setting of # of restarts and common variants:

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1lcCsTAVh2y72UEFnG1ALUWhRVoS7CqB9' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1lcCsTAVh2y72UEFnG1ALUWhRVoS7CqB9" -O souporcell.sif && rm -rf /tmp/cookies.txt

I would try with 30 restarts and common variants (assuming this is human). I am gonna be on vacation for a bit, but will then look to provide several new features to improve memory, speed, and accuracy. I now have a better understanding of this problem than I did before, and I am confident that it can be even better. Thanks again for your patience.

wheaton5 avatar Aug 16 '19 22:08 wheaton5

Thanks for the quick response.

deMULTIplex and demuxEM are two pipelines for demultiplexing sample-multiplexed scRNA-seq data. demuxEM was developed for demultiplexing nuclei hashing data (antibodies) and deMULTIplex was developed for demultiplexing MULTI-seq data (lipid-modified oligonucleotides). Both take a sample barcode UMI count matrix as input. Here's the link for deMULTIplex if you want to learn more: https://github.com/chris-mcginnis-ucsf/MULTI-seq

To answer your other questions:

  1. These are cells.
  2. I do not have this information right now, but will pass it along soon.
  3. I just mean that demuxEM and deMULTIplex label the classification results by donor, while the souporcell results were just numbered 0-12.
  4. By barcode space, I mean an embedding generated from the sample barcode UMI count matrix, instead of a gene expression count matrix.
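One way to line up souporcell's numbered clusters with donor labels from deMULTIplex or demuxEM, and to spot clusters that merge several donors, is a simple cross-tabulation. The sketch below assumes both results have already been loaded into dictionaries keyed by cell barcode; the barcodes, donor names, helper names, and the `min_fraction` threshold are illustrative only.

```python
from collections import Counter, defaultdict


def crosstab(cluster_by_cell, donor_by_cell):
    """Count cells for every (souporcell cluster, donor label) pair."""
    counts = defaultdict(Counter)
    for cell, cluster in cluster_by_cell.items():
        donor = donor_by_cell.get(cell)
        if donor is not None:  # skip cells the barcode workflow could not classify
            counts[cluster][donor] += 1
    return counts


def merged_clusters(counts, min_fraction=0.2):
    """Return clusters in which more than one donor holds a sizeable share of cells."""
    merged = {}
    for cluster, donors in counts.items():
        total = sum(donors.values())
        sizeable = sorted(d for d, n in donors.items() if n / total >= min_fraction)
        if len(sizeable) > 1:
            merged[cluster] = sizeable
    return merged


# Made-up example: cluster 0 mixes two donors, cluster 1 is pure.
cluster_by_cell = {"AAAC": 0, "AAAG": 0, "AAAT": 0, "GGGA": 0, "CCCA": 1, "CCCG": 1}
donor_by_cell = {"AAAC": "donorA", "AAAG": "donorA", "AAAT": "donorB",
                 "GGGA": "donorB", "CCCA": "donorC", "CCCG": "donorC"}

counts = crosstab(cluster_by_cell, donor_by_cell)
merged = merged_clusters(counts)
for cluster in sorted(counts):
    print(cluster, dict(counts[cluster]))
print("clusters spanning several donors:", merged)
```

For pure clusters, the most frequent donor in each row gives the cluster-number-to-donor mapping; rows with several sizeable donors point to the merged clusters described above.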

And thanks for the advice! I'll see if boosting the number of restarts and using common variants helps.

Chris

chris-mcginnis-ucsf avatar Aug 16 '19 23:08 chris-mcginnis-ucsf

--restarts 30 (or whatever you please)
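Put together with the original command from this thread, the call might be assembled as in the sketch below. The `--restarts` flag is the one given above; the `--common_variants` flag name is an assumption based on the souporcell README and should be checked against `souporcell_pipeline.py -h`, and the VCF file name is hypothetical. The helper `souporcell_command` only builds and prints the command, it does not run it.

```python
import shlex


def souporcell_command(bam, barcodes, fasta, out_dir, k, threads=16,
                       restarts=None, common_variants=None):
    """Build the argument list for a souporcell run inside the singularity container."""
    cmd = ["singularity", "exec", "souporcell.sif", "souporcell_pipeline.py",
           "-i", bam, "-b", barcodes, "-f", fasta,
           "-t", str(threads), "-o", out_dir, "-k", str(k)]
    if restarts is not None:
        cmd += ["--restarts", str(restarts)]
    if common_variants is not None:
        # Assumed flag name; the VCF must match the reference FASTA.
        cmd += ["--common_variants", common_variants]
    return cmd


cmd = souporcell_command(
    bam="./possorted_genome_bam.bam",
    barcodes="./cellIDs.tsv",
    fasta="hg19_3.0.0_genome.fa",
    out_dir="LIVE_OLD_souporcell",
    k=13,
    restarts=30,
    common_variants="common_variants_hg19.vcf",  # hypothetical file name
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed.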

wheaton5 avatar Aug 17 '19 01:08 wheaton5

Re: Here's the link for deMULTIplex if you want to learn more: https://github.com/chris-mcginnis-ucsf/MULTI-seq

Ah yes, I did see this. Good to put a username to a good paper lol (sorry, I'm terrible with names). Nice work. I will take a closer look at this soon, but it's 2am here so I should be zzzzzzz...

Best, Haynes

wheaton5 avatar Aug 17 '19 01:08 wheaton5