zellkonverter icon indicating copy to clipboard operation
zellkonverter copied to clipboard

Fix compatibility with {anndata}

Open rcannood opened this issue 1 year ago • 3 comments

This is a proposed solution for #75. This ensures that no implicit py_to_r(...) conversion occurs in AnnDataToSCE(...) using the disable_conversion_scope(...) helper function stolen from reticulate :)).

  • All values extracted from the AnnData objects (e.g. adata$X) need to manually converted to an R object.
  • If an AnnDataR6 is passed to AnnData2SCE, it first gets converted to a Python AnnData.
  • {anndata} was added to Suggests: so it can be tested during the unit tests. However, I think it'd be better to move the library(anndata) call to a lower point in the unit test. @lazappi WDYT?

I apologise for the many commits; I couldn't install rhdf5 so I had to use the CI for running the full R CMD Check. If this PR gets accepted at some point, I think it makes sense to squash.

rcannood avatar Oct 19 '22 14:10 rcannood

Update: this PR works on ubuntu-latest but not on :

  • windows-latest): Same error as #45 and others;

    Error log
    ═══════════════════════════════════════════════════════════════
    ── Error ('test-read.R:111'): Conversion of raw works ──────────────────────────
    <Rcpp::exception/C++Error/error/condition>
    Error in `py_ref_to_r(x)`: Conversion from numpy array type 20 is not supported
    Backtrace:
         ▆
      1. └─zellkonverter::readH5AD(example_file, raw = TRUE) at test-read.R:111:4
      2.   └─basilisk::basiliskRun(...)
      3.     └─zellkonverter (local) fun(...)
      4.       └─zellkonverter::AnnData2SCE(...)
      5.         └─zellkonverter:::.convert_anndata_slot(...)
      6.           ├─reticulate::py_to_r(item)
      7.           └─anndata:::py_to_r.collections.abc.Mapping(item)
      8.             └─python_builtins$dict(x)
      9.               ├─reticulate::py_to_r(result)
     10.               └─reticulate:::py_to_r.python.builtin.dict(result)
     11.                 └─reticulate:::py_ref_to_r(x)
    ── Error ('test-write.R:146'): writeH5AD works in a separate process ───────────
    Error in `checkForRemoteErrors(lapply(cl, recvResult))`: one node produced an error: TypeError: __init__() missing 1 required positional argument: 'dtype'
    
    Backtrace:
        ▆
     1. └─zellkonverter::writeH5AD(sce, temp) at test-write.R:146:4
     2.   └─basilisk::basiliskRun(...)
     3.     └─parallel::clusterCall(proc, fun = fun, ...)
     4.       └─parallel:::checkForRemoteErrors(lapply(cl, recvResult))
    ── Failure ('test-write.R:165'): writeH5AD DelayedArray X works ────────────────
    counts(sce) not equal to assay(out, "X").
    Types not compatible: integer is not S4
    ── Error ('test-write.R:196'): writeH5AD DelayedArray layer works ──────────────
    Error in `checkForRemoteErrors(lapply(cl, recvResult))`: one node produced an error: numpy.core._exceptions._ArrayMemoryError: Unable to allocate 459. MiB for an array with shape (3005, 20006) and data type float64
    
    Backtrace:
        ▆
     1. └─zellkonverter::readH5AD(temp, X_name = "X") at test-write.R:196:4
     2.   └─basilisk::basiliskRun(...)
     3.     └─parallel::clusterCall(proc, fun = fun, ...)
     4.       └─parallel:::checkForRemoteErrors(lapply(cl, recvResult))
    ── Error ('test-write.R:210'): writeH5AD works with colData list columns ───────
    Error in `checkForRemoteErrors(lapply(cl, recvResult))`: one node produced an error: TypeError: __init__() missing 1 required positional argument: 'dtype'
    
    Backtrace:
        ▆
     1. ├─testthat::expect_warning(writeH5AD(list_sce, temp), "colData columns are not atomic") at test-write.R:210:4
     2. │ └─testthat:::quasi_capture(...)
     3. │   ├─testthat (local) .capture(...)
     4. │   │ └─base::withCallingHandlers(...)
     5. │   └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
     6. └─zellkonverter::writeH5AD(list_sce, temp)
     7.   └─basilisk::basiliskRun(...)
     8.     └─parallel::clusterCall(proc, fun = fun, ...)
     9.       └─parallel:::checkForRemoteErrors(lapply(cl, recvResult))
    ── Error ('test-write.R:226'): writeH5AD works with rowData list columns ───────
    Error in `checkForRemoteErrors(lapply(cl, recvResult))`: one node produced an error: TypeError: __init__() missing 1 required positional argument: 'dtype'
    
    Backtrace:
        ▆
     1. ├─testthat::expect_warning(writeH5AD(list_sce, temp), "rowData columns are not atomic") at test-write.R:226:4
     2. │ └─testthat:::quasi_capture(...)
     3. │   ├─testthat (local) .capture(...)
     4. │   │ └─base::withCallingHandlers(...)
     5. │   └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
     6. └─zellkonverter::writeH5AD(list_sce, temp)
     7.   └─basilisk::basiliskRun(...)
     8.     └─parallel::clusterCall(proc, fun = fun, ...)
     9.       └─parallel:::checkForRemoteErrors(lapply(cl, recvResult))
    ── Failure ('test-write.R:240'): writeH5AD works with gzip compression ─────────
    assay(out, "X") not equal to assay(sce, "counts").
    Modes: S4, numeric
    Attributes: < Names: 2 string mismatches >
    Attributes: < Length mismatch: comparison on first 2 components >
    ── Failure ('test-write.R:249'): writeH5AD works with lzf compression ──────────
    assay(out, "X") not equal to assay(sce, "counts").
    Modes: S4, numeric
    Attributes: < Names: 2 string mismatches >
    Attributes: < Length mismatch: comparison on first 2 components >
    
  • mac os x): Same error as #45;

    Error log
    ════════════════════════════════════════════════════════════════
    ── Error ('test-read.R:111'): Conversion of raw works ──────────────────────────
    <Rcpp::exception/C++Error/error/condition>
    Error in `py_ref_to_r(x)`: Conversion from numpy array type 20 is not supported
    Backtrace:
         ▆
      1. └─zellkonverter::readH5AD(example_file, raw = TRUE) at test-read.R:111:4
      2.   └─basilisk::basiliskRun(...)
      3.     └─zellkonverter (local) fun(...)
      4.       └─zellkonverter::AnnData2SCE(...)
      5.         └─zellkonverter:::.convert_anndata_slot(...)
      6.           ├─reticulate::py_to_r(item)
      7.           └─anndata:::py_to_r.collections.abc.Mapping(item)
      8.             └─python_builtins$dict(x)
      9.               ├─reticulate::py_to_r(result)
     10.               └─reticulate:::py_to_r.python.builtin.dict(result)
     11.                 └─reticulate:::py_ref_to_r(x)
    
  • ubuntu): missing library

    Error log
      Quitting from lines 56-61 (zellkonverter.Rmd) 
    Error: Error: processing vignette 'zellkonverter.Rmd' failed with diagnostics:
    package or namespace load failed for 'scRNAseq' in dyn.load(file, DLLpath = DLLpath, ...):
     unable to load shared object '/home/runner/work/_temp/Library/bit64/libs/bit64.so':
      /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/runner/work/_temp/Library/bit64/libs/bit64.so)
    --- failed re-building ‘zellkonverter.Rmd’
    

@lazappi Can you approve this PR so that the workflows can be run? Also, which of the above errors need to be fixed in this PR or can be considered a separate issue?

rcannood avatar Oct 20 '22 06:10 rcannood

I'm pretty happy with this, thanks for submitting it! I made some minor comments but they are mostly just questions I wanted some clarification on.

For the CI I'm kinda happy as long as it passes on one OS. The main thing is that it passes on the Bioconductor build system which tends to have less issues.

The numpy array type issue I think should be dealt with separately, your suggestion seems like a good starting point but we can look into to that more on the other issue. I will check that the changes here haven't made it more of a problem though.

lazappi avatar Oct 20 '22 10:10 lazappi

@rcannood I made the minor changes I commented on so I think this is almost good to go. Only thing left is an error with {anndata} loaded that I'm not sure how to fix.

It has to do with the conversion you added here https://github.com/theislab/zellkonverter/pull/76/files#diff-4c7ea0a920b659d712a5256b98d7e4072805ee394c295eb01c6ce7c990c37e8cR429-R432 using the example compatibility test in test-read.R. If {anndata} isn't loaded this works fine and the following call to .convert_anndata_list() catches a failed conversion but if {anndata} is loaded for some reason the conversion here is recursive and fails with an error (hopefully that made sense). Any suggestions? Adding support for converting these type 20 arrays as you suggested in #45 would be one option but that won't help if there is some other data type that is the issue.

lazappi avatar Oct 25 '22 10:10 lazappi

@rcannood I'm pretty happy this is working ok (and I have some other things I want to work on that need this) so I'm going to merge now. Thanks so much for all your help!

lazappi avatar Oct 26 '22 15:10 lazappi

Sorry that I was away the past few days, there were a few important deadlines and I lost track of my mails.

Thanks for making the necessary changes to get everything to work on your end!

rcannood avatar Oct 26 '22 17:10 rcannood