zellkonverter
`readH5AD()` only works after Keyboard Interruption
Hello, I ran into some weird behavior when running `readH5AD()`. For some reason, the first invocation of `readH5AD()` in a session stalls right after displaying the anndata version. However, if I interrupt this call, the following invocations of `readH5AD()` work as expected. Is there a way I could get this to work without the keyboard interrupt?
I am trying to run this call for 72 h5 files so ideally I would want this to be done without any user interaction.
Thanks, Saul
Example:
> rna_sce <- readH5AD("./BD1.h5ad", X_name="counts", version = "0.8.0", reader="python", verbose=TRUE)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
^C
> rna_sce <- readH5AD("./BD1.h5ad", X_name="counts", version = "0.8.0", reader="python", verbose=TRUE)
ℹ Using the Python reader
ℹ Using anndata version 0.8.0
✔ Read ./BD1.h5ad [532ms]
ℹ uns is empty and was skipped
✔ X matrix converted to assay [4s]
ℹ layers is empty and was skipped
✔ var converted to rowData [87ms]
✔ obs converted to colData [46ms]
ℹ varm is empty and was skipped
ℹ obsm is empty and was skipped
ℹ varp is empty and was skipped
ℹ obsp is empty and was skipped
✔ SingleCellExperiment constructed [467ms]
ℹ Skipping conversion of raw
✔ Converting AnnData to SingleCellExperiment ... done
R session info:
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /home/saulv/.conda/envs/scRNA_env/lib/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] basilisk_1.8.0 reticulate_1.25
[3] zellkonverter_1.6.3 scDblFinder_1.10.0
[5] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
[7] Biobase_2.56.0 GenomicRanges_1.48.0
[9] GenomeInfoDb_1.32.3 IRanges_2.30.0
[11] S4Vectors_0.34.0 BiocGenerics_0.42.0
[13] MatrixGenerics_1.8.1 matrixStats_0.62.0
loaded via a namespace (and not attached):
[1] viridis_0.6.2 edgeR_3.38.4
[3] BiocSingular_1.12.0 jsonlite_1.8.0
[5] viridisLite_0.4.0 here_1.0.1
[7] DelayedMatrixStats_1.18.0 scuttle_1.6.2
[9] statmod_1.4.37 dqrng_0.3.0
[11] GenomeInfoDbData_1.2.8 vipor_0.4.5
[13] Rsamtools_2.12.0 yaml_2.3.5
[15] ggrepel_0.9.1 pillar_1.8.0
[17] lattice_0.20-45 glue_1.6.2
[19] limma_3.52.2 beachmat_2.12.0
[21] XVector_0.36.0 colorspace_2.0-3
[23] Matrix_1.4-1 XML_3.99-0.10
[25] pkgconfig_2.0.3 dir.expiry_1.4.0
[27] zlibbioc_1.42.0 purrr_0.3.4
[29] scales_1.2.0 ScaledMatrix_1.4.0
[31] BiocParallel_1.30.3 tibble_3.1.8
[33] generics_0.1.3 ggplot2_3.3.6
[35] xgboost_1.6.0.1 cli_3.3.0
[37] magrittr_2.0.3 crayon_1.5.1
[39] fansi_1.0.3 MASS_7.3-58.1
[41] bluster_1.6.0 beeswarm_0.4.0
[43] data.table_1.14.2 tools_4.2.0
[45] scater_1.24.0 BiocIO_1.6.0
[47] lifecycle_1.0.1 basilisk.utils_1.8.0
[49] locfit_1.5-9.6 munsell_0.5.0
[51] cluster_2.1.3 DelayedArray_0.22.0
[53] irlba_2.3.5 Biostrings_2.64.0
[55] compiler_4.2.0 rsvd_1.0.5
[57] rlang_1.0.4 grid_4.2.0
[59] RCurl_1.98-1.8 BiocNeighbors_1.14.0
[61] rjson_0.2.21 igraph_1.3.4
[63] bitops_1.0-7 restfulr_0.0.15
[65] gtable_0.3.0 codetools_0.2-18
[67] R6_2.5.1 GenomicAlignments_1.32.1
[69] gridExtra_2.3 dplyr_1.0.9
[71] rtracklayer_1.56.1 utf8_1.2.2
[73] rprojroot_2.0.3 filelock_1.0.2
[75] metapod_1.4.0 ggbeeswarm_0.6.0
[77] parallel_4.2.0 Rcpp_1.0.9
[79] png_0.1-7 scran_1.24.0
[81] vctrs_0.4.1 tidyselect_1.1.2
[83] sparseMatrixStats_1.8.0
Hi @saulvegasauceda
Thanks for giving {zellkonverter} a go. I think what you might be interrupting is the creation of the {basilisk} Python environment. What happens if you just let it run? Does it finish eventually or just hang forever?
I've let it run for 11 hours, it did not finish. I think it's safe to assume it would have stalled indefinitely.
Usually the creation of the basilisk environment would be accompanied by a lot of noise and thunder from Conda. I don't see any of this in the stdout above; and besides, if this was interrupted, subsequent calls should not work.
I assume that the environment was already provisioned in the first call above. I suggest `debug()`ing a relevant function and stepping through to see where the stall is occurring.
Thank you @LTLA @lazappi for responding!
It remains stalled until I interrupt it.
debug(writeH5AD(rna_data, h5_rna, verbose=TRUE))
ℹ Using anndata version 0.8.0
^C
Not sure what's causing this behavior, but using `withTimeout()` from the R.utils package worked around the issue.
Here's what I did:
withTimeout({readH5AD(input_file, verbose = TRUE)}, timeout=120, onTimeout="silent")
rna_sce <- readH5AD(input_file, verbose = TRUE)
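For the 72-file case, the same workaround could probably be wrapped in a loop. This is only a sketch: `read_all_h5ad()` and its arguments are hypothetical names, not part of zellkonverter, and it assumes that the stall only ever happens on the first call in a session, as observed above.

```r
# Hypothetical helper (not part of zellkonverter): applies the warm-up
# workaround once, then reads every file without user interaction.
# Assumes R.utils is installed and that only the first call in a session stalls.
read_all_h5ad <- function(files,
                          reader = zellkonverter::readH5AD,
                          warmup_timeout = 120,
                          ...) {
  stopifnot(length(files) > 0)
  read_one <- function(f) reader(f, ...)

  # Warm-up: let the first call time out silently if it stalls.
  # Its result is discarded either way.
  R.utils::withTimeout(
    read_one(files[[1]]),
    timeout = warmup_timeout,
    onTimeout = "silent"
  )

  # Real pass: read every file (including the first) and name results by path.
  stats::setNames(lapply(files, read_one), files)
}

# Possible usage (the "data" directory is a placeholder):
# h5ad_files <- list.files("data", pattern = "\\.h5ad$", full.names = TRUE)
# sces <- read_all_h5ad(h5ad_files, verbose = TRUE)
```

If the warm-up call does not stall, the first file is simply read twice, which costs a little time but keeps the logic simple.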
I think you need to use `debug()` slightly differently. If you do:
debug(zellkonverter::readH5AD)
zellkonverter::readH5AD(input_file, verbose = TRUE)
That should open the debugging browser. Then you can step through the function line by line and see which line is getting stuck. Depending on where it is, `debug(zellkonverter::AnnData2SCE)` might be more useful.
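As far as I understand it, `debug()` evaluates its argument and expects a function, so `debug(writeH5AD(...))` runs the call first (and stalls in the same place) before any debugging flag is set. A toy illustration of the intended pattern, using a made-up `slow_reader()` function in place of the real reader:

```r
# Toy function standing in for readH5AD(); the name is made up for illustration.
slow_reader <- function(path) {
  message("reading ", path)
  invisible(path)
}

debug(slow_reader)                   # pass the function itself, not a call to it
stopifnot(isdebugged(slow_reader))   # flag is set; the next call would open the browser
undebug(slow_reader)                 # clear the flag when finished
stopifnot(!isdebugged(slow_reader))
```

`debugonce()` is an alternative that clears the flag automatically after a single call.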
Do you have this issue on another machine and/or with different input files?
Closing this issue as I hope it has been resolved in recent releases.