move external data sets to ExperimentHub

Open jayhesselberth opened this issue 3 years ago • 3 comments

CRAN / Bioconductor won't allow downloading of external data; it won't pass their checks. We can't use download.file anywhere.

Need to:

  1. Generate a smaller data set (a sample of the data being downloaded) that can be included in the package and used for the README / vignettes
  2. Put larger data sets in an ExperimentHub

jayhesselberth, Oct 30 '22 14:10

I'll downsample the current vignette data so we can include it in the package

sheridar, Oct 30 '22 17:10

this seems to work:

# sample 1,000 cells from the splen_so object
library(Seurat)

download.file(
  "https://djvdj-data.s3.us-west-2.amazonaws.com/splenocytes.zip",
  "splenocytes.zip",
  quiet = TRUE
)

unzip("splenocytes.zip", overwrite = FALSE)

# Load Seurat object
load("splenocytes/splen_so.rda")

set.seed(42)
# https://github.com/satijalab/seurat/issues/3108#issuecomment-685975338
splen_so_tiny <- splen_so[, sample(colnames(splen_so), size = 1000, replace = FALSE)]

# xz provides better compression than bzip2 default
usethis::use_data(splen_so_tiny, compress = 'xz')

We would then need to take these cell barcodes and filter the 10x files down to the same cells
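A rough sketch of that filtering step, using toy data in place of the real files. The barcodes, the two-column contig table, and the output file name are all made up for illustration. This also assumes the Seurat cell names match the `barcode` column of the 10x table exactly, which may not hold if a sample prefix or suffix was added when the object was built.

```r
# Toy stand-in for colnames(splen_so_tiny); the real barcodes would come from the object
keep_barcodes <- c("AAACCTGAGAAACCAT-1", "AAACCTGAGTTTGCGT-1")

# Toy stand-in for a 10x contig annotation table (real files have many more columns)
contigs <- data.frame(
  barcode = c(
    "AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT-1",
    "AAACGGGCACCCTATC-1", "AAACCTGAGTTTGCGT-1"
  ),
  chain = c("TRA", "TRB", "TRB", "TRB"),
  stringsAsFactors = FALSE
)

# Keep only the contigs that belong to the sampled cells
contigs_tiny <- contigs[contigs$barcode %in% keep_barcodes, , drop = FALSE]

# Write the reduced table; the file name here is hypothetical
out_file <- file.path(tempdir(), "filtered_contig_annotations.csv")
write.csv(contigs_tiny, out_file, row.names = FALSE)
```

With real data, `keep_barcodes` would be `colnames(splen_so_tiny)` and `contigs` would be read with `read.csv()` from the downloaded 10x output.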

jayhesselberth, Oct 30 '22 18:10

You should shoot for 5 MB or less of packaged data. splen_so_tiny above is ~1.8 MB.
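One way to check a saved file against that target, sketched with a toy matrix in place of `splen_so_tiny`. The object, file path, and `max_mb` variable are placeholders; only the 5 MB figure comes from this thread.

```r
# Toy object standing in for splen_so_tiny
set.seed(42)
toy <- matrix(rpois(1e5, lambda = 1), nrow = 100)

# Save with xz compression, as usethis::use_data(..., compress = "xz") would
rda_file <- file.path(tempdir(), "toy.rda")
save(toy, file = rda_file, compress = "xz")

# Compare the on-disk size (in MB) against the target
size_mb <- file.size(rda_file) / 1024^2
max_mb <- 5

if (size_mb > max_mb) {
  warning(sprintf("%s is %.2f MB, over the %d MB target", basename(rda_file), size_mb, max_mb))
} else {
  message(sprintf("%s is %.2f MB", basename(rda_file), size_mb))
}
```

For the packaged data, `rda_file` would point at the file written under `data/`, and the sizes of everything in that directory would need to be summed.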

jayhesselberth, Oct 31 '22 12:10