muscat icon indicating copy to clipboard operation
muscat copied to clipboard

DP simulated data has inverted logFC and only one bimodal group

Open ChristophH opened this issue 1 year ago • 3 comments

Hi,

Thanks for creating and maintaining muscat, and all your hard work that goes into method development and benchmarking!

I think there is a bug/inconsistency in the sign of the logFC of simulated data. When simulated with the 'dp' category, the logFC is applied in the opposite direction as, for example, with the 'de' or 'dm' category.

In .sim() you can see how either '-lfc', or 'lfc' is used with group 1

https://github.com/HelenaLC/muscat/blob/10cd3262bd5bc45930d62dc8f7d8b97629c3bbe4/R/utils-simData.R#L329-L352

But maybe I'm missing something.

Cheers, Christoph

ChristophH avatar Aug 14 '23 15:08 ChristophH

Great question. Yeah, so it’s admittedly been a while since writing this, but looking at the code & comments, I think the reasoning is at follows…

  • we generally assume the prop. of DP genes (dp) to be low, say 10%.
  • if all logFCs were positive, 10% in group 1 would be “up” and 90% in group 2 would be down, so group 1 would be relatively higher.
  • since logFCs are drawn symmetrically, the same holds vice versa, so that overall, both groups are affected the same, but varying cell proportions might affect this
  • If dp>0.5, the signal is reversed, so that the effect’s direction overall (if cell number differ between groups) depends on the proportions, which distinguishes this case from others
  • so, the code is just how this has been implemented, but I understand the confusion …hope that makes sense?

HelenaLC avatar Aug 14 '23 17:08 HelenaLC

Thanks for the explanation. However, I was just trying to validate that behavior using the .sim function with some toy parameters, but could not reproduce the DP density curves. image [Crowell et al., Nat Com, 2020]

For DE and DM the curves look as expected. But for DP, only group A shows the bimodal distribution. As a result, the observed logFC is always negative, even when the input logFC is positive. image

(For details see this R notebook)

I understand that .sim is an internal function, but I'm going down this rabbit hole as I'm trying to understand why my muscat generated data for DP genes does not look as expected.

Edit: Now I've used muscat::simData to generate data and illustrate the problem. The R notebook is here. The figure below shows one high and one low logFC gene per differential category. DP shows opposite logFC sign compared to DE and DM. Also, DP looks like DM with just one bimodal distribution. image

Am I doing something wrong, or missing something?

ChristophH avatar Aug 14 '23 22:08 ChristophH

Short recap:

  1. Genes generated under the DP model show unexpected logFC (observed logFC is almost always opposite direction of the logFC of the gene_info table in the simulation metadata)
  2. Genes generated under the DP model do not show bimodal distribution in both groups (just in one group, this explains the issue above)

To me, this seems like unexpected behavior (a bug?), or am I doing it wrong? Please advice.

ChristophH avatar Aug 23 '23 07:08 ChristophH