scp icon indicating copy to clipboard operation
scp copied to clipboard

Subsetting a `SingleCellExperiment` breaks the `ScpModel` object

Open cvanderaa opened this issue 1 year ago • 8 comments

The issues has been highlighted in: https://github.com/UCLouvain-CBIO/scp/issues/56#issuecomment-2010539916

The reason is that the subsetting functionality in QFeatures does not know it should also subset the ScpModel objects contained in metadata(qfObject).

A solution I see is to overwrite the subsetting functionality of QFeatures (with call to nextMethod()), but also including a substerring of the metadata.

cvanderaa avatar Mar 22 '24 09:03 cvanderaa

This also implies creating subsetting functionality for ScpModel objects, which should be rather straightforward.

cvanderaa avatar Mar 22 '24 09:03 cvanderaa

Wait, I'm confused... an ScpModel is computed on a SingleCellExperiment. Why is the QFeatures subsetting and issue?

Also, having a quick look at the issue, isn't one problem that the sce variable wasn't created with getWithColData()?

lgatto avatar Mar 22 '24 09:03 lgatto

Sorry, you're right! It's subsetting the SingleCellExperiment :sweat_smile:

cvanderaa avatar Mar 22 '24 09:03 cvanderaa

Ok, then sub-setting the ScpModel would be the best approach.

On the other hand, from a user's perspective, what about sub-setting features at the model results data.frames level? Sub-setting cells or cell types kind of invalidates the results, as the model was produced with all cells.

lgatto avatar Mar 22 '24 09:03 lgatto

Also regarding this

the model was produced with all cells

do we want to allow sub-setting columns/cells?

lgatto avatar Mar 22 '24 09:03 lgatto

It makes indeed no sense to subset an ScpModel by cell as the estimated results depend on all cells. So when sub-setting an SCE by column, all ScpModel objects in the metadata() should be removed.

what about sub-setting features at the model results data.frames level

The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish. But every ScpModel object depends on an SCE object, so we must ensure that the functionality does not break upon subsetting, or at least explicitly forbid it. In practice, I see 3 possible approach:

  1. Add a check (in scpVariance*, scpDifferential*, scpComponent*) to detect whether the SCE has been subset and throw an error, that is forbid subsetting
  2. Add a check (in scpVariance*, scpDifferential*, scpComponent*) to detect whether the SCE has been subset and throw an error that suggests using a function (eg scpModelUpdate()). scpModelUpdate() should detect changes in cells (if change, remove scpModel) and changes in features (if change, subset scpModel).
  3. Overwrite sub-setting method of SCE so that is uses scpModelUpdate(). However, I'm not convinced this is a good idea to mess up with SCE's methods.

What do you think?

cvanderaa avatar Mar 22 '24 11:03 cvanderaa

The output tables are disconnected from the SCE object, so it is not our responsibility and up to the user to subset the data tables as they wish.

Of course, it not our responsibility. My point was rather that sub-setting by feature can be done on that level, so do we really need support sub-setting the SCE?

But now that I think a bit more about it, doesn't sub-setting by features invalidates some of the ScpModel results? Are the component and variance analyses still relevant? Would these be preformed on the sub-setted modelled data?

lgatto avatar Mar 22 '24 12:03 lgatto

In other words, and following up on your suggestions, do we want to support sub-setting an SCE after modelling? Shouldn't we not simply warn that this will lead to broken/invalid ScpModel results, and suggest to manipulate the individual dataframe results of re-run a new model on a subset of the data?

lgatto avatar Mar 22 '24 12:03 lgatto