milopy
                                
                                
                                
                                    milopy copied to clipboard
                            
                            
                            
                        Refactoring with MuData
The original cell-level AnnData object and the sample x nhoods AnnData are stored in a common MuData object with .mod cells and samples. Samples are in mdata["samples"].obs and nhoods are in mdata["samples"].var.
Notes
- The 
make_nhoodsfunction is left as is, the MuData object gets created bycount_nhoods. The rationale is that here samples are not taken into consideration yet. - This MuData doesn't follow any of the possible options for 
axisas defined in MuData documentation, since neither.obsnor.varare shared between cells and samples 
To do
- [x] Update documentation
 - [ ] Update vignette
 - [x] Update README
 - [ ] ?
 
@emdann
@xinyuejohn started with the integration of milopy into pertpy here: https://github.com/theislab/pertpy/pull/165 He could also tackle my feedback that I posted here if it's fine with you? Then we don't need to duplicate the work and John and me can figure this out directly.
What do you think?
Hi @Zethson Apologies for the late react on this and thanks a lot for the feedback. I am a bit slumped with some urgent work at the moment so I am more than happy for @xinyuejohn to take over and implement these changes during integration!
Great!
A few general comments:
- You should consider adding all doc related files to the gitignore. They pollute pull requests. A CI job should build the docs and with for example RTD integration you can also preview the docs before merging. This does not matter for our pertpy plans, but wanted to write it nevertheless.
 - A couple of tools of pertpy have a "load" function that prepares the input for the subsequent tool. I could see us having a load function that returns a MuData object (by creating or appending) that is ready for all downstream Milopy steps. ->
 milopy = pt.tl.Milopy() mdata = milopy.load(adata) # adds compositional adata object and returns mdata object # alternative mdata = milopy.load(mdata_old, rna="rna") # where rna refers to the existing RNA modality in the AnnData object
- I didn't add this comment everywhere, but we should ensure that the naming is consistent with unwritten muon/MuData usage rules -> "rna" and "compositional" ("samples" might be too unspecified). Likely should even name it "milo_compositional", because tascCODA might add its own "compositional"
 
As for your comment 2, the current workflow works like this:
- Use 
milo.make_nhoods(adata, ...)to get neighbourhoods, which saved in adata itself - Use 
milo_mdata = milo.count_nhoods(adata, sample_col="sample")to get a MuData object. 
Therefore, if I need to add a mdata = milo.load() function, it should substitute the original milo.count_nhoods function. What do you think? Do you think it follows the best practice? @Zethson
Therefore, if I need to add a
mdata = milo.load()function, it should substitute the originalmilo.count_nhoodsfunction. What do you think? Do you think it follows the best practice? @Zethson
What I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.
Does it make sense?
Therefore, if I need to add a
mdata = milo.load()function, it should substitute the originalmilo.count_nhoodsfunction. What do you think? Do you think it follows the best practice? @ZethsonWhat I had in mind is that the load function only generates the actual object that we will use for downstream tasks and not that it does any computation already. It basically only sets up the MuData object.
Does it make sense?
Makes sense to me and it makes the whole pipeline more clear. In this way, the MuData object the load function generates only includes one 'rna' modality. And all the downstream functions take this mudata as the input.