
Analysis questions

Open • Nusob888 opened this issue 1 year ago • 4 comments

Many thanks for this fantastic tool. My questions are not really issues; they are more theoretical.

  • I presume that if one does not have labels from the original datasets to be integrated, one could choose clustering resolutions for each dataset independently and then use CellHint to harmonise the clusterings across datasets?
  • Would it be possible to use CellHint within a single dataset, e.g. to harmonise different clustering resolutions and find and optimise consensus clusters? This would be an incredibly useful way to automate the optimisation of cluster resolution. If so, would there be any appetite to develop this as an additional function?

Nusob888 • Jan 08 '24 10:01

@Nusob888

  1. Yes
  2. Thank you for this nice suggestion. Currently this is not possible (at least not directly) with CellHint. I will add such a function later this month.

ChuanXu1 • Jan 09 '24 17:01

@Nusob888, please try cellhint.selfmatch to harmonize different annotations of the same set of cells. This function was added in version 1.0.0. Note that harmonization was originally designed to unify cell type annotations across different datasets; the self-match function is therefore a modified version for cells from a single dataset.

ChuanXu1 • Jan 20 '24 19:01
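A rough way to picture what harmonizing two annotations of the same cells involves is to cross-tabulate them and see which clusters at one resolution split or merge at another. The sketch below does only that, using label overlap. It is not CellHint's method (which, per the reply above, is adapted from its cross-dataset harmonization), and the helper names `cross_tabulate` and `best_match` are hypothetical.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count the cells carrying each (label_a, label_b) pair.

    Both annotations must describe the same cells in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotations must cover the same cells")
    return Counter(zip(labels_a, labels_b))


def best_match(labels_a, labels_b):
    """For each cluster in labels_a, return the labels_b cluster sharing the
    most cells and the fraction of the labels_a cluster that it covers."""
    table = cross_tabulate(labels_a, labels_b)
    sizes = Counter(labels_a)
    best = {}
    for (a, b), n in table.items():
        # Keep the first partner seen when two partners tie.
        if a not in best or n > best[a][1]:
            best[a] = (b, n)
    return {a: (b, n / sizes[a]) for a, (b, n) in best.items()}


# Two hypothetical clusterings of the same six cells (low vs high resolution).
low_res = ["0", "0", "0", "1", "1", "1"]
high_res = ["0", "0", "2", "1", "1", "3"]
print(best_match(low_res, high_res))
print(best_match(high_res, low_res))
```

Read in the high-to-low direction, every high-resolution cluster maps entirely (fraction 1.0) onto one low-resolution cluster, so high-resolution clusters 0 and 2 are sub-clusters of low-resolution cluster 0. Overlap counts alone cannot say whether such a split is biologically meaningful, which is why harmonization in CellHint also draws on the expression data or a latent representation (use_rep).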

Hi, I am going to try this function this week. Can I check a few things about the use_rep option?

Would you recommend:

  • Calculating an embedding per dataset, or a latent embedding on the whole dataset?
  • Using the raw expression matrix rather than use_rep? Presumably this would be better suited to datasets with strong batch effects?

Additionally:

  • Would you advise against using CellHint on pre-integrated embeddings such as scVI, as that might defeat the point of correction-agnostic harmonisation?

Thanks again for all the input.

Nusob888 • Jan 29 '24 10:01

@Nusob888, use_rep is usually recommended over the raw expression matrix, as the latter is time-consuming. A latent space computed on the whole dataset is preferred. The choice of latent representation is flexible. PCA is correction-agnostic but may be noisy because of batch effects (CellHint has an internal procedure to mitigate this, but it cannot fully exclude their influence). Pre-integrated embeddings such as scVI are also good alternatives, but note that the result will then be tuned towards the structure defined by scVI.

ChuanXu1 • Feb 12 '24 13:02
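One plausible reason for preferring a single latent space over per-dataset embeddings (the reply does not spell it out) is that PCA fitted separately on each dataset gives each dataset its own axes, so coordinates are not comparable across datasets. PCA on the stacked matrix instead places every cell in one shared coordinate system. The NumPy sketch below shows that joint computation; the function name `joint_pca` and the toy Poisson data are hypothetical, and a real workflow would normalize and log-transform counts first.

```python
import numpy as np


def joint_pca(matrices, n_comps=10):
    """Compute one PCA embedding for all cells across several datasets.

    Each matrix is cells x genes with the same gene order; they are stacked
    so that every cell is projected onto the same principal axes.
    """
    X = np.vstack(matrices).astype(float)
    X -= X.mean(axis=0)  # center each gene over all cells jointly
    # Rows of U * S are the PC scores of the centered matrix.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]


rng = np.random.default_rng(0)
# Toy count matrices for two datasets sharing 20 genes (hypothetical data;
# normalization and log-transformation are skipped for brevity).
dataset_a = rng.poisson(2.0, size=(50, 20))
dataset_b = rng.poisson(3.0, size=(40, 20))

embedding = joint_pca([dataset_a, dataset_b], n_comps=5)
print(embedding.shape)  # (90, 5): one row per cell, one column per component
```

In a Scanpy workflow the analogous step is usually sc.pp.pca on the concatenated AnnData object, which stores the result in adata.obsm['X_pca']; that key can then be passed as use_rep = 'X_pca'. An scVI latent space can be supplied the same way, with the caveat from the reply that the result then follows the structure defined by scVI.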