Feature request: Re-analysing a user-specified subset of data

Open PeteHaitch opened this issue 2 years ago • 2 comments

It would be great if it were possible to select a subset of cells and re-analyse that subset. E.g., in a dataset of PBMCs, select all 'B cells' (based on the cluster annotations) and re-analyse to look for subclusters within the B cell population. Is this something planned for a future release, or that would be feasible to implement?

PeteHaitch, Mar 30 '22 23:03

That's a great idea. We could enable this by making the cluster names clickable, so that clicking one automatically opens a new session with that subset.

In the background, the app would subset the cells in that cluster, create a new h5ad file, and store it in the browser. Tracking the provenance of datasets and subsets would be left to the user.

jkanche, Mar 31 '22 04:03

Awesome! Thank you for considering the suggestion and for making such a great app.

PeteHaitch, Mar 31 '22 05:03

It took a while, but we accidentally added this feature as a combination of other features.

The workflow goes like this:

  • Do an analysis on the full dataset.
  • Save the results as a ZIP file.
  • Start a new analysis (possibly on a different tab, if you want to look at both at once), loading the aforementioned ZIP file.
  • Click on "subsetting by annotation" and choose what you want to subset by.

This allows you to subset by cluster numbers or custom selections (choose 1 for cells inside the selection). And of course, you can pick any of your existing annotations if you want.
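To make the idea of subsetting by annotation concrete, here is a rough sketch in plain Python. It is an illustration only, not kana's actual code: the function names (`subset_by_annotation`, `take_cells`) and the toy annotation table are made up for this example. The custom selection column follows the convention mentioned above, where 1 marks cells inside the selection.

```python
from typing import Any, Dict, List, Sequence


def subset_by_annotation(
    annotations: Dict[str, Sequence[Any]],
    field: str,
    keep: Sequence[Any],
) -> List[int]:
    """Return the indices of cells whose value in `field` is one of `keep`."""
    if field not in annotations:
        raise KeyError(f"unknown annotation: {field!r}")
    wanted = set(keep)
    return [i for i, value in enumerate(annotations[field]) if value in wanted]


def take_cells(
    annotations: Dict[str, Sequence[Any]],
    indices: Sequence[int],
) -> Dict[str, List[Any]]:
    """Apply the same cell indices to every annotation column."""
    return {name: [values[i] for i in indices] for name, values in annotations.items()}


# Hypothetical per-cell annotations for six cells.
annotations = {
    "cluster": [1, 2, 2, 3, 2, 1],
    "custom_selection": [0, 1, 1, 0, 0, 0],  # 1 = inside the selection
    "cell_type": ["T", "B", "B", "NK", "B", "T"],
}

# Subset by cluster number, or by a custom selection.
by_cluster = subset_by_annotation(annotations, "cluster", [2])
by_selection = subset_by_annotation(annotations, "custom_selection", [1])

# The retained cells would then be fed into a fresh analysis.
b_cells = take_cells(annotations, by_cluster)
```

The same indices would also be applied to the columns of the count matrix; that part is omitted here to keep the sketch short.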

Admittedly, this isn't the streamlined experience I was thinking about in #120, but I would argue that this approach is more transparent and reproducible, as the user is obliged to save a copy of the original analysis. If we just added an "analyze subset" button directly, kana would have to keep track of the provenance of the subsets - tracking how they were generated in the configuration, invalidating them if the parent analysis changes, saving them in the results, etc...
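For a sense of what that bookkeeping might involve, here is a minimal sketch of one possible approach: fingerprint the parent analysis parameters and treat a subset as stale when the fingerprint no longer matches. Everything here is hypothetical (the `SubsetRecord` class, the `fingerprint` helper, and the parameter names); it is not how kana works, and it only covers the invalidation part.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List


def fingerprint(parameters: Dict[str, Any]) -> str:
    """Hash the parent analysis parameters in a key-order-independent way."""
    payload = json.dumps(parameters, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class SubsetRecord:
    """Records how a subset was derived from its parent analysis."""

    parent_fingerprint: str
    field: str
    keep: List[Any]

    def is_valid_for(self, parent_parameters: Dict[str, Any]) -> bool:
        # The subset is only meaningful if the parent analysis is unchanged.
        return self.parent_fingerprint == fingerprint(parent_parameters)


# Hypothetical parent analysis parameters.
parent = {"clustering": {"method": "snn_graph", "resolution": 0.5}}
record = SubsetRecord(fingerprint(parent), field="cluster", keep=[2])

still_valid = record.is_valid_for(parent)

# Changing the parent clustering can change cluster 2, so the subset goes stale.
parent["clustering"]["resolution"] = 1.0
valid_after_change = record.is_valid_for(parent)
```

Saving the original analysis to a ZIP file sidesteps all of this, since the saved file is a fixed snapshot of the parent.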

LTLA, May 03 '23 17:05

Thanks!

If someone has a processed dataset in another format (e.g., an .rds file containing a SingleCellExperiment object), can they load it directly and go straight to "subsetting by annotation"?

PeteHaitch, May 03 '23 22:05

Of course: [Screenshot 2023-05-03 at 4 23 12 PM]

LTLA, May 03 '23 23:05

Fantastic, thank you!

PeteHaitch, May 04 '23 00:05