Contributions
Contributions copied to clipboard
demuxmix
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
- Repository: https://github.com/huklein/demuxmix
Confirm the following by editing each check box to '[x]'
-
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
-
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
-
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
-
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
-
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
-
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
-
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
- [x] The 'devel' branch for new packages and features.
- [x] The stable 'release' branch, made available every six months, for bug fixes.
- [x] Bioconductor version control using Git (optionally via GitHub).
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.
Hi @huklein
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: demuxmix
Title: Demultiplexing oligo-barcoded scRNA-seq data using regression mixture models
Version: 0.99.0
Date: 2022-07-19
Description: A package for demultiplexing single-cell sequencing experiments
of pooled cells that were labeled with barcode oligonucleotides. The
package provides methods to fit mixture models for a probabilistic
classification of the cells and to assess the quality and error rates of the
demultiplexing.
Authors@R: person("Hans-Ulrich", "Klein",
email = "[email protected]",
role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-6382-9428"))
License: Artistic-2.0
Depends: R (>= 3.5)
Imports: stats, MASS, Matrix, ggplot2, gridExtra, methods
Suggests: BiocStyle, knitr, rmarkdown, cowplot
VignetteBuilder: knitr
biocViews: SingleCell, Preprocessing, Classification
BiocType: Software
URL: https://github.com/huklein/demuxmix
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.0
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 7ede439d2c3b6c6caa2e9073f90e9ae86392b435
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: e6f1f9ae8c73d878648cdf8fc020a5d2af064f3a
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: a3897aac8fe0f942ac70470edfb44ebd35c423ca
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
@huklein I'm sorry for the delay in reviewing demuxmix. I've started reviewing the package today and should have a review for you later this week.
Hi @huklein,
Thank you for submitting demuxmix to Bioconductor. The package is already in great shape and provides an interesting solution for a key step of contemporary scRNA-seq analysis workflows.
For acceptance into Bioconductor, there are a number of Required points, as well as Recommended points, that I would ask you to first please address. Would you please provide line-by-line comments to my initial review so that I know what changes I'm looking for in my re-review.
Cheers, Pete
Required
- [ ] Using the Bioconductor build system, have you tried building the package with the 'real data' example evaluated and included in the vignette? Did it pass the Bioconductor build system or timeout? I think its inclusion would make a great addition to the vignette.
- [ ] In the vignette, please add a comparison to
DropletUtils::hashedDrops(). This is an (the?) existing Bioconductor-based solution to demultiplexing HTOs. In particular, it is the featured solution in the 'Demultiplexing cell hashes' section of 'Orchestrating Single Cell Analysis with Bioconductor' book. Something along the lines of the following may be suitable, with some discussion and comparison of the results:
suppressPackageStartupMessages(library(demuxmix))
suppressPackageStartupMessages(library(scRNAseq))
suppressPackageStartupMessages(library(DropletUtils))
suppressPackageStartupMessages(library(pheatmap))
# Prepare data -----------------------------------------------------------------
htoExp <- StoeckiusHashingData(type = "mixed")
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
set.seed(666)
eDrops <- emptyDrops(counts(htoExp))
htoExp <- htoExp[, which(eDrops$FDR <= 0.001)]
rna <- colSums(counts(htoExp) > 0)
hto <- assay(altExp(htoExp))
# demuxmix() -------------------------------------------------------------------
dmm <- demuxmix(hto, rna = rna)
classLabels <- dmmClassify(dmm)
# hashedDrops() ----------------------------------------------------------------
hash.stats <- hashedDrops(hto)
# Comparison -------------------------------------------------------------------
demuxmix <- ifelse(
classLabels$Type == "singlet", classLabels$HTO, classLabels$Type)
hashedDrops <- ifelse(
hash.stats$Confident,
rownames(hto)[hash.stats$Best],
ifelse(hash.stats$Doublet, "doublet", "unknown"))
addmargins(table(demuxmix, hashedDrops), margin = c(1, 2))
#> hashedDrops
#> demuxmix doublet HEK_A HEK_B HEK_C K562_A K562_B K562_C KG1_A KG1_B KG1_C
#> HEK_A 1 569 0 0 0 0 0 0 0 0
#> HEK_B 1 0 611 0 0 0 0 0 0 0
#> HEK_C 5 0 0 545 0 0 0 0 0 0
#> K562_A 0 0 0 0 377 0 0 0 0 0
#> K562_B 1 0 0 0 0 594 0 0 0 0
#> K562_C 2 0 0 0 0 0 424 0 0 0
#> KG1_A 3 0 0 0 0 0 0 620 0 0
#> KG1_B 0 0 0 0 0 0 0 0 538 0
#> KG1_C 1 0 0 0 0 0 0 0 0 601
#> multiplet 378 11 8 15 1 11 3 17 12 3
#> negative 0 0 0 0 0 0 0 3 7 4
#> THP1_A 3 0 0 0 0 0 0 0 0 0
#> THP1_B 0 0 0 0 0 0 0 0 0 0
#> THP1_C 1 0 0 0 0 0 0 0 0 0
#> uncertain 0 0 0 0 1 0 0 0 0 0
#> Sum 396 580 619 560 379 605 427 640 557 608
#> hashedDrops
#> demuxmix THP1_A THP1_B THP1_C unknown Sum
#> HEK_A 0 0 0 31 601
#> HEK_B 0 0 0 69 681
#> HEK_C 0 0 0 13 563
#> K562_A 0 0 0 156 533
#> K562_B 0 0 0 20 615
#> K562_C 0 0 0 140 566
#> KG1_A 0 0 0 23 646
#> KG1_B 0 0 0 33 571
#> KG1_C 0 0 0 69 671
#> multiplet 18 15 13 70 575
#> negative 0 0 0 245 259
#> THP1_A 505 0 0 3 511
#> THP1_B 0 539 0 23 562
#> THP1_C 0 0 560 40 601
#> uncertain 0 0 0 0 1
#> Sum 523 554 573 935 7956
pheatmap(
table(demuxmix, hashedDrops),
color = viridisLite::inferno(101),
scale = "none",
cluster_rows = TRUE,
cluster_cols = TRUE)

- [ ] Please add a
BugReportsfield to theDESCRIPTION. This usually links to the package's GitHub issues page. - [ ] Please add an 'Installation' section to the vignette. Typically, this section would also go in the
README.md(if using).
Recommended
- [ ] Perhaps have a final proofread and run a spellcheck over the vignette and other documentation (e.g., 'divers' should be 'diverse' in vignette).
- [ ] Consider adding captions to vignette figures.
- [ ] In the vignette, please
set.seed()beforeemptyDrops()(and prior to any other functions that rely on simulation) to ensure reproducibility of results. - [ ] A more computationally efficient method to get the number of expressed genes per droplet is to use
colSums(counts(htoExp) > 0); see reprex below.
suppressPackageStartupMessages(library(demuxmix))
suppressPackageStartupMessages(library(scRNAseq))
suppressPackageStartupMessages(library(DropletUtils))
htoExp <- StoeckiusHashingData(type = "mixed")
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
eDrops <- emptyDrops(counts(htoExp))
htoExp <- htoExp[, which(eDrops$FDR <= 0.001)]
system.time(a <- apply(assay(htoExp) > 0, 2, sum))
#> user system elapsed
#> 7.947 2.496 16.945
system.time(b <- colSums(counts(htoExp) > 0))
#> user system elapsed
#> 0.291 0.134 0.576
identical(a, b)
#> [1] TRUE
- [ ] When working through the vignette, I opted to use the 'real data' example instead of the simulated data example. In doing so, I found that the output of
plotDmmHistogram(dmm)applied to the real data looked rather different to that when applied to the simulated data (see below). Perhaps providing some commentary on interpreting the real data example would be helpful?
suppressPackageStartupMessages(library(demuxmix))
suppressPackageStartupMessages(library(scRNAseq))
suppressPackageStartupMessages(library(DropletUtils))
# Simulated data ---------------------------------------------------------------
set.seed(563425)
class <- rbind(
c(rep( TRUE, 200), rep(FALSE, 200), rep(FALSE, 200), rep( TRUE, 50)),
c(rep(FALSE, 200), rep( TRUE, 200), rep(FALSE, 200), rep( TRUE, 50)),
c(rep(FALSE, 200), rep(FALSE, 200), rep( TRUE, 200), rep(FALSE, 50)))
simdata <- dmmSimulateHto(class)
hto <- simdata$hto
rna <- simdata$rna
dmm <- demuxmix(hto, rna = rna)
plotDmmHistogram(dmm)

# Real data -------------------------------------------------------------------
htoExp <- StoeckiusHashingData(type = "mixed")
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
#> snapshotDate(): 2022-07-22
#> see ?scRNAseq and browseVignettes('scRNAseq') for documentation
#> loading from cache
eDrops <- emptyDrops(counts(htoExp))
htoExp <- htoExp[, which(eDrops$FDR <= 0.001)]
rna <- colSums(counts(htoExp) > 0)
hto <- assay(altExp(htoExp))
dmm <- demuxmix(hto, rna = rna)
plotDmmHistogram(dmm)

Created on 2022-08-10 by the reprex package (v2.0.1)
- [ ] From the vignette, running
dmmOverlap(dmm) < 0.03on the real data suggests some bad or poor performing HTOs (KG1_BandKG1_C), correct? I'm really just clarifying I've correctly understood this function and its output.
#> dmmOverlap(dmm) < 0.03
HEK_A HEK_B HEK_C THP1_A THP1_B THP1_C K562_A K562_B K562_C KG1_A KG1_B KG1_C
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
- [ ] The vignette section 'Special usecase: pooling unlabeled with labeled cells' describes an interesting experimental design that I'd not encountered. I wondered if demuxmix could be extended to demultiplex some other niche(?) designs I've seen in my work:
- Label each sample with a unique combination of k > 1 HTOs. This design can be analysed using the
combinationsargument ofDropletUtils::hashedDrops(). - Label n samples each with 1 unique HTO and the n+1 sample with all n HTOs. I'm not aware of an existing method designed for demultiplexing this design.
- Label each sample with a unique combination of k > 1 HTOs. This design can be analysed using the
- [ ] Personally, I found reading the source code a little trick at time due to the formatting. Running
BiocCheck::BiocCheck()on the package points to some of the issues and potential solutions, e.g.,- Consider shorter line lengths to improve readibility of source code.
- Consider running styler to get a more consistent formatting of source code.
- [ ] FYI when constructing a matrix, you can specify row and column names in the constructor via the
dimnamesargument rather than setting them usingrownames<-()/colnames<-()afterwards. - [ ] Consider adding a top-level
README.md. - [ ] For consistency with other functions in demuxmix, consider changing
p.acpt()/p.acpt<-()to usesnakeCase(). - [ ] Recommend adding a
inst/CITATIONfile. - [ ] Consider adding a package-level man page.
- [ ] Thank you for adding unit tests. Please take a look at
covr::report()to identify lines that aren't being tested and perhaps aim to ensure that critical funcionality is well-tested.
Hi @PeteHaitch ,
Thanks so much for the detailed review! I will need a couple of days to implement the improvements. Before I start, I have one question regarding the vignette. I agree that the real dataset is a better use case for the vignette. There are two reasons why I still decided to use the simulated data. One argument was the runtime, which you addressed. My other thought was that I did not want to force users to download the relatively large dataset just to install my package. I assume the dataset wouldn't be downloaded on the BioC build system because of caching. Still, it would further slow down the installation process for users and trigger the download of a larger dataset. I am happy to rewrite the vignette, including the comparison with hashedDrops (nice idea!). I just want to ensure it is BioC policy to prioritize more realistic vignettes over lean packages with little network/disk space usage. I wish there were another option to add this use case without automatically running it at the installation.
Best, Hans
My understanding is that the data used in the vignette will only be downloaded from ExperimentHub (via scRNAseq) if a user is installing the package from source (and building/installing the vignette as part of that process). Generally, only users on Linux will be installing from source. Users on macOS and Windows (the majority of users) will almost always be installing the binary packages that Bioconductor makes available on those platforms. The binary installation includes the vignette in its rendered form (i.e. HTML in your case), so the user won't have to download the dataset simply to read the vignette. Of course, if they want to run the code in the vignette themselves then they will have to download the data, but that's a one-time cost due to the caching functionality implemented in ExperimentHub.
I'll ask someone from @Bioconductor/core to confirm my understanding is correct and to comment on their preference re the tradeoff of "... a more realistic vignettes [vs.] lean packages with little network/disk space usage". Since the example in your vignette is great at re-using existing BioC data, I would hope that tips the balance towards supporting a more realistic vignette.
Received a valid push on git.bioconductor.org; starting a build for commit id: ab716dee9c8be4dc06abd1ca32937080a92c336c
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 4238d7cca746ee083eec7f3d30486c727af20b5f
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/demuxmix to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @huklein,
Thank you for submitting demuxmix to Bioconductor. The package is already in great shape and provides an interesting solution for a key step of contemporary scRNA-seq analysis workflows.
For acceptance into Bioconductor, there are a number of Required points, as well as Recommended points, that I would ask you to first please address. Would you please provide line-by-line comments to my initial review so that I know what changes I'm looking for in my re-review.
Cheers, Pete
Required
- [ ] Using the Bioconductor build system, have you tried building the package with the 'real data' example evaluated and included in the vignette? Did it pass the Bioconductor build system or timeout? I think its inclusion would make a great addition to the vignette.
- [ ] In the vignette, please add a comparison to
DropletUtils::hashedDrops(). This is an (the?) existing Bioconductor-based solution to demultiplexing HTOs. In particular, it is the featured solution in the 'Demultiplexing cell hashes' section of 'Orchestrating Single Cell Analysis with Bioconductor' book. Something along the lines of the following may be suitable, with some discussion and comparison of the results:suppressPackageStartupMessages(library(demuxmix)) suppressPackageStartupMessages(library(scRNAseq)) suppressPackageStartupMessages(library(DropletUtils)) suppressPackageStartupMessages(library(pheatmap)) # Prepare data ----------------------------------------------------------------- htoExp <- StoeckiusHashingData(type = "mixed") #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache set.seed(666) eDrops <- emptyDrops(counts(htoExp)) htoExp <- htoExp[, which(eDrops$FDR <= 0.001)] rna <- colSums(counts(htoExp) > 0) hto <- assay(altExp(htoExp)) # demuxmix() ------------------------------------------------------------------- dmm <- demuxmix(hto, rna = rna) classLabels <- dmmClassify(dmm) # hashedDrops() ---------------------------------------------------------------- hash.stats <- hashedDrops(hto) # Comparison ------------------------------------------------------------------- demuxmix <- ifelse( classLabels$Type == "singlet", classLabels$HTO, classLabels$Type) hashedDrops <- ifelse( hash.stats$Confident, rownames(hto)[hash.stats$Best], ifelse(hash.stats$Doublet, "doublet", "unknown")) addmargins(table(demuxmix, hashedDrops), margin = c(1, 2)) #> hashedDrops #> demuxmix doublet HEK_A HEK_B HEK_C K562_A K562_B K562_C KG1_A KG1_B KG1_C #> HEK_A 1 569 0 0 0 0 0 0 0 0 #> HEK_B 1 0 611 0 0 0 0 0 0 0 #> HEK_C 5 0 0 545 0 0 0 0 0 0 #> K562_A 0 0 0 0 377 0 0 0 0 0 #> K562_B 1 0 0 0 0 594 0 0 0 0 #> K562_C 2 0 0 0 0 0 424 0 0 0 #> KG1_A 3 0 0 0 0 0 0 620 0 0 #> KG1_B 0 0 0 0 0 0 0 0 538 0 #> KG1_C 1 0 0 0 0 0 0 0 0 601 #> multiplet 378 11 8 15 1 11 3 17 12 3 #> negative 0 0 0 0 0 0 0 3 7 4 #> THP1_A 3 0 0 0 0 0 0 0 0 0 #> THP1_B 0 0 0 0 0 0 0 0 0 0 #> THP1_C 1 0 0 0 0 0 0 0 0 0 #> uncertain 0 0 0 0 1 0 0 0 0 0 #> Sum 396 580 619 560 379 605 427 640 557 608 #> hashedDrops #> demuxmix THP1_A THP1_B THP1_C unknown Sum #> HEK_A 0 0 0 31 601 #> HEK_B 0 0 0 69 681 #> HEK_C 0 0 0 13 563 #> K562_A 0 0 0 156 533 #> K562_B 0 0 0 20 615 #> K562_C 0 0 0 140 566 #> KG1_A 0 0 0 23 646 #> KG1_B 0 0 0 33 571 #> KG1_C 0 0 0 69 671 #> multiplet 18 15 13 70 575 #> negative 0 0 0 245 259 #> THP1_A 505 0 0 3 511 #> THP1_B 0 539 0 23 562 #> THP1_C 0 0 560 40 601 #> uncertain 0 0 0 0 1 #> Sum 523 554 573 935 7956 pheatmap( table(demuxmix, hashedDrops), color = viridisLite::inferno(101), scale = "none", cluster_rows = TRUE, cluster_cols = TRUE)
- [ ] Please add a
BugReportsfield to theDESCRIPTION. This usually links to the package's GitHub issues page.- [ ] Please add an 'Installation' section to the vignette. Typically, this section would also go in the
README.md(if using).Recommended
- [ ] Perhaps have a final proofread and run a spellcheck over the vignette and other documentation (e.g., 'divers' should be 'diverse' in vignette).
- [ ] Consider adding captions to vignette figures.
- [ ] In the vignette, please
set.seed()beforeemptyDrops()(and prior to any other functions that rely on simulation) to ensure reproducibility of results.- [ ] A more computationally efficient method to get the number of expressed genes per droplet is to use
colSums(counts(htoExp) > 0); see reprex below.suppressPackageStartupMessages(library(demuxmix)) suppressPackageStartupMessages(library(scRNAseq)) suppressPackageStartupMessages(library(DropletUtils)) htoExp <- StoeckiusHashingData(type = "mixed") #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache eDrops <- emptyDrops(counts(htoExp)) htoExp <- htoExp[, which(eDrops$FDR <= 0.001)] system.time(a <- apply(assay(htoExp) > 0, 2, sum)) #> user system elapsed #> 7.947 2.496 16.945 system.time(b <- colSums(counts(htoExp) > 0)) #> user system elapsed #> 0.291 0.134 0.576 identical(a, b) #> [1] TRUE
- [ ] When working through the vignette, I opted to use the 'real data' example instead of the simulated data example. In doing so, I found that the output of
plotDmmHistogram(dmm)applied to the real data looked rather different to that when applied to the simulated data (see below). Perhaps providing some commentary on interpreting the real data example would be helpful?suppressPackageStartupMessages(library(demuxmix)) suppressPackageStartupMessages(library(scRNAseq)) suppressPackageStartupMessages(library(DropletUtils)) # Simulated data --------------------------------------------------------------- set.seed(563425) class <- rbind( c(rep( TRUE, 200), rep(FALSE, 200), rep(FALSE, 200), rep( TRUE, 50)), c(rep(FALSE, 200), rep( TRUE, 200), rep(FALSE, 200), rep( TRUE, 50)), c(rep(FALSE, 200), rep(FALSE, 200), rep( TRUE, 200), rep(FALSE, 50))) simdata <- dmmSimulateHto(class) hto <- simdata$hto rna <- simdata$rna dmm <- demuxmix(hto, rna = rna) plotDmmHistogram(dmm)
# Real data ------------------------------------------------------------------- htoExp <- StoeckiusHashingData(type = "mixed") #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache #> snapshotDate(): 2022-07-22 #> see ?scRNAseq and browseVignettes('scRNAseq') for documentation #> loading from cache eDrops <- emptyDrops(counts(htoExp)) htoExp <- htoExp[, which(eDrops$FDR <= 0.001)] rna <- colSums(counts(htoExp) > 0) hto <- assay(altExp(htoExp)) dmm <- demuxmix(hto, rna = rna) plotDmmHistogram(dmm)
Created on 2022-08-10 by the reprex package (v2.0.1)
- [ ] From the vignette, running
dmmOverlap(dmm) < 0.03on the real data suggests some bad or poor performing HTOs (KG1_BandKG1_C), correct? I'm really just clarifying I've correctly understood this function and its output.#> dmmOverlap(dmm) < 0.03 HEK_A HEK_B HEK_C THP1_A THP1_B THP1_C K562_A K562_B K562_C KG1_A KG1_B KG1_C TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[ ] The vignette section 'Special usecase: pooling unlabeled with labeled cells' describes an interesting experimental design that I'd not encountered. I wondered if demuxmix could be extended to demultiplex some other niche(?) designs I've seen in my work:
- Label each sample with a unique combination of k > 1 HTOs. This design can be analysed using the
combinationsargument ofDropletUtils::hashedDrops().- Label n samples each with 1 unique HTO and the n+1 sample with all n HTOs. I'm not aware of an existing method designed for demultiplexing this design.
[ ] Personally, I found reading the source code a little trick at time due to the formatting. Running
BiocCheck::BiocCheck()on the package points to some of the issues and potential solutions, e.g.,
- Consider shorter line lengths to improve readibility of source code.
- Consider running styler to get a more consistent formatting of source code.
[ ] FYI when constructing a matrix, you can specify row and column names in the constructor via the
dimnamesargument rather than setting them usingrownames<-()/colnames<-()afterwards.[ ] Consider adding a top-level
README.md.[ ] For consistency with other functions in demuxmix, consider changing
p.acpt()/p.acpt<-()to usesnakeCase().[ ] Recommend adding a
inst/CITATIONfile.[ ] Consider adding a package-level man page.
[ ] Thank you for adding unit tests. Please take a look at
covr::report()to identify lines that aren't being tested and perhaps aim to ensure that critical funcionality is well-tested.
Dear @huklein ,
We have reopened the issue to continue the review process. Please remember to push a version bump to git.bioconductor.org to trigger a new build.
Hi @PeteHaitch,
Thanks again for the helpful feedback! Please see my response to the comments below.
Required
- [ ] Using the Bioconductor build system, have you tried building the package with the 'real data' example evaluated and included in the vignette? Did it pass the Bioconductor build system or timeout? I think its inclusion would make a great addition to the vignette.
I was thinking about this point a lot. Building the vignette with the real data takes about 2min on my computer. The package shouldn't time out on the build system. In the end, I still prefer to stick with the simulated data as the main dataset in the vignette for the following reasons:
- Resources: As a Linux user, I might be more sensitive to long package building times than others, which I experience after every new Bioconductor release. More annoying is that I would have to download and save a relatively large dataset on my PC and on the cluster I work on, even though the dataset would never be used on the latter. I looked at other packages like DropletUtils, and as far as I could see, they use small simulated datasets, too.
- Dataset design: The cell line mixture dataset might be better than a simulated dataset, but it is not ideal for showcasing demultiplexing either. The number of samples is untypically large (12 pooled samples). We usually don't pool more than four. Further, the cells are very homogenous within samples (same cell line) and very heterogenous between samples, which is uncommon in most "real" datasets. Thus, it is not a "typical" dataset.
- Improved simulated data: I tweaked the simulation a bit to make the simulated data more similar to the real data. Specifically, the third HTO has a lower quality now, similar to the last HTO in the cell line mixture dataset.
I revised the vignette and believe it is very informative now. There is still a small real dataset (CSF dataset) included, and it is still fairly easy to walk through the vignette using the cell line mixture instead of the simulated dataset if a user wants to. Please take a look and let me know what you think. It could quickly switch to the cell line mixture dataset; I am just not convinced it is the best course of action.
- [ ] In the vignette, please add a comparison to
DropletUtils::hashedDrops(). This is an (the?) existing Bioconductor-based solution to demultiplexing HTOs. In particular, it is the featured solution in the 'Demultiplexing cell hashes' section of 'Orchestrating Single Cell Analysis with Bioconductor' book. Something along the lines of the following may be suitable, with some discussion and comparison of the results [...]
Thanks for the suggestion and the code example! I have added a comparison to hashedDrops(). Yes, I believe hashedDrops() is currently the only method for demultiplexing in Bioconductor. There are other R packages, though.
- [ ] Please add a
BugReportsfield to theDESCRIPTION. This usually links to the package's GitHub issues page.
Done.
- [ ] Please add an 'Installation' section to the vignette. Typically, this section would also go in the
README.md(if using).
Done.
Recommended
- [ ] Perhaps have a final proofread and run a spellcheck over the vignette and other documentation (e.g., 'divers' should be 'diverse' in vignette).
I re-wrote some sections for better readability and did final proofreading; I hope I found most typos.
- [ ] Consider adding captions to vignette figures.
Done.
- [ ] In the vignette, please
set.seed()beforeemptyDrops()(and prior to any other functions that rely on simulation) to ensure reproducibility of results.
Done.
- [ ] A more computationally efficient method to get the number of expressed genes per droplet is to use
colSums(counts(htoExp) > 0); see reprex below [...]
Thanks! I didn't know this. I replaced apply() with colSums() and rowSums() whenever possible throughout the package.
- [ ] When working through the vignette, I opted to use the 'real data' example instead of the simulated data example. In doing so, I found that the output of
plotDmmHistogram(dmm)applied to the real data looked rather different to that when applied to the simulated data (see below). Perhaps providing some commentary on interpreting the real data example would be helpful?
This is a good point. Unfortunately, the histograms can look quite different between real (high-quality) datasets. I added a paragraph explaining the reasons and describing what users should look for. I also added a code block showing how to zoom into the critical part of the histogram.
- [ ] From the vignette, running
dmmOverlap(dmm) < 0.03on the real data suggests some bad or poor performing HTOs (KG1_BandKG1_C), correct? I'm really just clarifying I've correctly understood this function and its output.
Thanks for the feedback. I downtoned the statement in the revised vignette. From my experience, an overlap < 0.03 suggests high-quality HTO data and will usually result in very few "uncertain" droplets. Of course, it is just a rule of thumb, and I have seen and used datasets with larger overlaps. When I see values exceeding 0.03, I usually give feedback to the wet lab, and we tried to optimize antibody concentrations, washing, etc., but the dataset is still useable (after removing the uncertain droplets). I am usually concerned about values above 0.1. However, such a low overlap (<0.03) may be hard to achieve if working with low-quality tissue. Regarding the cell line mixture dataset, I think the last HTO (HG1_C) is not of optimal quality.
[ ] The vignette section 'Special usecase: pooling unlabeled with labeled cells' describes an interesting experimental design that I'd not encountered. I wondered if demuxmix could be extended to demultiplex some other niche(?) designs I've seen in my work:
- Label each sample with a unique combination of k > 1 HTOs. This design can be analysed using the
combinationsargument ofDropletUtils::hashedDrops().- Label n samples each with 1 unique HTO and the n+1 sample with all n HTOs. I'm not aware of an existing method designed for demultiplexing this design.
Interesting. I looked into the combinations argument of hashedDrops(). This could be implemented into the dmmClassify() and summary() methods without any changes to the actual demultiplexing method. The main difference would be the output of summary(), which currently assumes each droplet with more than one HTO is a multiplet. However, demumix can already be used for these special designs: The first column of the data frame returned by dmmClassify() specifies the exact type of multiplet detected, and one can extract any combination of HTOs of interest. (demuxmix simply returns the most likely class - whether it is a singlet or a multiplet.) Therefore, I have not added an additional argument (like combinations), but if these special designs are more common than I think they are, I will add it later.
[ ] Personally, I found reading the source code a little trick at time due to the formatting. Running
BiocCheck::BiocCheck()on the package points to some of the issues and potential solutions, e.g.,
- Consider shorter line lengths to improve readibility of source code.
- Consider running styler to get a more consistent formatting of source code.
I played with the styler package - it has some neat functions. I run it on all my code and hope the style is more consistent with the Bioconductor style now. I also broke up most long lines into shorter ones.
- [ ] FYI when constructing a matrix, you can specify row and column names in the constructor via the
dimnamesargument rather than setting them usingrownames<-()/colnames<-()afterwards.
Thanks! I use the dimnames argument in the revised code.
- [ ] Consider adding a top-level
README.md.
Done.
- [ ] For consistency with other functions in demuxmix, consider changing
p.acpt()/p.acpt<-()to usesnakeCase().
Done.
- [ ] Recommend adding a
inst/CITATIONfile.
The standard citation referencing the R package is fine for now. I am writing up a brief manuscript and will update the citation file when it is ready.
- [ ] Consider adding a package-level man page.
Done.
- [ ] Thank you for adding unit tests. Please take a look at
covr::report()to identify lines that aren't being tested and perhaps aim to ensure that critical funcionality is well-tested.
This is a weak spot of my package. Unfortunately, I started adding unit tests after the package was almost final. I am now familiar with unit tests and see how helpful they can be. For any future package change, I will add a respective unit test.
Thanks for responding to the review, @huklein. I'm not sure if I'll have time this week to go through your responses in detail, but you can expect a response in 1-2 weeks.
Thanks for making the requested changes, @huklein. I'm pleased to accept demuxmix into Bioconductor!
One minor typo to fix:
- [ ] Missing parenthesis in vignette? "Still, the overlap with the positive component (mean of 4977.8 is ..." should I think be "Still, the overlap with the positive component (mean of 4977.8) is".
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
BTW I have been trialling demuxmix on some of my own datasets and have some queries and discussion points that are beyond the scope of this review. I will follow-up these with you on https://github.com/huklein/demuxmix/issues
Thank you for the helpful package review! I fixed the typos in the vignette and will reply to your additional comments on GitHub later this week.
Cheers, Hans
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/huklein.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("demuxmix"). The package 'landing page' will be created at
https://bioconductor.org/packages/demuxmix
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.