scp
scp copied to clipboard
Subsetting a `SingleCellExperiment` breaks the `ScpModel` object
The issues has been highlighted in: https://github.com/UCLouvain-CBIO/scp/issues/56#issuecomment-2010539916
The reason is that the subsetting functionality in QFeatures does not know it should also subset the ScpModel objects contained in metadata(qfObject).
A solution I see is to overwrite the subsetting functionality of QFeatures (with call to nextMethod()), but also including a substerring of the metadata.
This also implies creating subsetting functionality for ScpModel objects, which should be rather straightforward.
Wait, I'm confused... an ScpModel is computed on a SingleCellExperiment. Why is the QFeatures subsetting and issue?
Also, having a quick look at the issue, isn't one problem that the sce variable wasn't created with getWithColData()?
Sorry, you're right! It's subsetting the SingleCellExperiment :sweat_smile:
Ok, then sub-setting the ScpModel would be the best approach.
On the other hand, from a user's perspective, what about sub-setting features at the model results data.frames level? Sub-setting cells or cell types kind of invalidates the results, as the model was produced with all cells.
Also regarding this
the model was produced with all cells
do we want to allow sub-setting columns/cells?
It makes indeed no sense to subset an ScpModel by cell as the estimated results depend on all cells. So when sub-setting an SCE by column, all ScpModel objects in the metadata() should be removed.
what about sub-setting features at the model results data.frames level
The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish. But every ScpModel object depends on an SCE object, so we must ensure that the functionality does not break upon subsetting, or at least explicitly forbid it. In practice, I see 3 possible approach:
- Add a check (in
scpVariance*,scpDifferential*,scpComponent*) to detect whether theSCEhas been subset and throw an error, that is forbid subsetting - Add a check (in
scpVariance*,scpDifferential*,scpComponent*) to detect whether theSCEhas been subset and throw an error that suggests using a function (egscpModelUpdate()).scpModelUpdate()should detect changes in cells (if change, removescpModel) and changes in features (if change, subsetscpModel). - Overwrite sub-setting method of
SCEso that is usesscpModelUpdate(). However, I'm not convinced this is a good idea to mess up withSCE's methods.
What do you think?
The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish.
Of course, it not our responsibility. My point was rather that sub-setting by feature can be done on that level, so do we really need support sub-setting the SCE?
But now that I think a bit more about it, doesn't sub-setting by features invalidates some of the ScpModel results? Are the component and variance analyses still relevant? Would these be preformed on the sub-setted modelled data?
In other words, and following up on your suggestions, do we want to support sub-setting an SCE after modelling? Shouldn't we not simply warn that this will lead to broken/invalid ScpModel results, and suggest to manipulate the individual dataframe results of re-run a new model on a subset of the data?