entries of params were nan which will throw error in lstsq

Open QianjiangHu opened this issue 4 years ago • 13 comments

Hi, I get the warning "entries of params were nan which will throw error in lstsq" when I run the following test on an AnnData object:

    test = de.test.two_sample(YO_adata_AT2, grouping='grouping', test='wald', noise_model="nb")

What is causing this, and how can I fix it?

Thank you!

[Screenshot from 2020-03-19 18-55-53]

QianjiangHu avatar Mar 19 '20 17:03 QianjiangHu

Hi @Qianjiang-Github, sorry for the delay!

  1. Are there any NaNs in your input data?
  2. Which diffxpy and batchglm versions are you using?

davidsebfischer avatar May 18 '20 07:05 davidsebfischer

Hi,

I also encountered the same problem. I guess because of the funny "0" on GEM barcode column. I would appreciate it if someone could help me remove it.

[Screenshot]

Best,

Keita

KeitaSaeki avatar Jun 08 '20 02:06 KeitaSaeki

@KeitaSaeki, I am not sure whether this is really the same underlying issue:

> I guess because of the funny "0" on GEM barcode column.

You can probably get rid of it by defining the dataframe with an index when passing it to anndata. Right now you have relatively little control over the structure of the dataframe because you simply call its constructor with a pandas Series.
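A minimal sketch of that suggestion, assuming the barcodes and group labels are available as plain Python lists. The helper `build_obs` and the example barcodes and labels are hypothetical, not taken from this thread. Building the dataframe with an explicit index, rather than from a bare pandas Series, keeps both the index values and the index name under your control:

```python
import pandas as pd


def build_obs(barcodes, groups, column="grouping"):
    """Build an obs-style dataframe indexed by barcode, with an unnamed index."""
    if len(barcodes) != len(groups):
        raise ValueError("barcodes and groups must have the same length")
    # Explicit string index: one row per cell, labelled by its barcode.
    index = pd.Index([str(b) for b in barcodes])
    obs = pd.DataFrame({column: list(groups)}, index=index)
    # An unnamed index avoids an extra header row when the frame is displayed.
    obs.index.name = None
    return obs


# Hypothetical barcodes and labels, for illustration only.
obs = build_obs(["AAAC-1", "AAAG-1", "AACT-1"], ["young", "old", "old"])
print(obs)
```

A frame built this way could then be passed as `obs` when constructing the AnnData object.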

davidsebfischer avatar Jun 09 '20 07:06 davidsebfischer

Unfortunately, I am facing the exact same problem. I am calling this on an AnnData object:

        res = de.test.two_sample(
            self.adata, grouping="de_base", test="wald", noise_model="nb",
        )

and this error is thrown.

I am investigating it further to find a specific way to reproduce it, but it is certainly an issue. I also made sure I have no NaNs anywhere:

        print(np.argwhere(np.isnan(self.adata.X)))
        print(self.adata.obs.isnull().sum().sum())
        print(self.adata.var.isnull().sum().sum())

all return:

[]
0
0

I just installed diffxpy, so it should be on the latest version available via pip.
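The NaN checks above assume a dense `adata.X`; `np.isnan` may not accept a sparse matrix directly, and it does not report infinite values either. A hedged sketch of a check that covers both cases follows. `count_nonfinite` is a hypothetical helper, and detecting sparse input through a `tocsr` method is an assumption about SciPy-style matrices:

```python
import numpy as np


def count_nonfinite(X):
    """Return (n_nan, n_inf) for a dense array or a SciPy-style sparse matrix."""
    if hasattr(X, "tocsr"):
        # Sparse input: only explicitly stored entries can be NaN or inf.
        values = np.asarray(X.tocsr().data)
    else:
        values = np.asarray(X)
    values = values.astype(float, copy=False)
    return int(np.isnan(values).sum()), int(np.isinf(values).sum())


dense = np.array([[0.0, 3.0], [np.nan, np.inf]])
print(count_nonfinite(dense))  # (1, 1)
```

A result of `(0, 0)` only rules out non-finite values in the input; it says nothing about NaNs that appear later during model fitting.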

marcellp avatar Jun 10 '20 13:06 marcellp

Hi folks,

I also had the same issue when trying to run de.test.wald. As @KeitaSaeki suggested, it looks like the problem was the name of the index column: in my case the index was named Cell_Index, and the empty line this produced created NaNs. You can set the name to None and remove the empty line by running adata.obs = adata.obs.rename_axis(None), and then everything works just fine.

[Screenshot 2020-06-16 at 20 23 16]

Thanks, David, for the great package!

Update: scratch that, it only worked for one dataset and doesn't work for others.

alitinet avatar Jun 16 '20 18:06 alitinet

I'm facing the very same issue, and the index name is not the culprit. Instead, if you use scaled data (e.g., after sc.pp.scale), the NB estimator introduces NaNs. I've tried running with unscaled data and it works perfectly. @davidsebfischer, I see that only 'nb' is accepted as the noise_model parameter. As I understand it, batchglm supports Gaussian noise; would it be useful to allow it in the de.test.wald function?
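A negative binomial model describes non-negative integer counts, whereas scaled data contains negative and fractional values. A rough sketch of a pre-flight check along those lines, assuming the matrix fits in memory as a dense array; `looks_like_raw_counts` is a hypothetical helper, not part of diffxpy or batchglm:

```python
import numpy as np


def looks_like_raw_counts(X):
    """True if every value is a finite, non-negative integer."""
    # Densify SciPy-style sparse input; plain arrays pass through unchanged.
    values = np.asarray(X.toarray() if hasattr(X, "toarray") else X, dtype=float)
    if not np.all(np.isfinite(values)):
        return False
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


counts = np.array([[0, 3, 1], [2, 0, 5]])
# Per-gene z-scoring, similar in spirit to what a scaling step produces.
scaled = (counts - counts.mean(axis=0)) / counts.std(axis=0)

print(looks_like_raw_counts(counts))  # True
print(looks_like_raw_counts(scaled))  # False
```

If the check fails on `adata.X`, the unscaled counts may still be available elsewhere in the object (for example in `adata.raw`), depending on how the data was preprocessed.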

dawe avatar Jun 25 '20 09:06 dawe

Hi, I'm facing the same problem running de.test.wald. If I run np.isnan(np.sum(adata.raw.X.toarray())) it returns False.

xpastor avatar Jun 30 '20 15:06 xpastor

having the same issue here...

ywen1407 avatar Oct 30 '20 19:10 ywen1407

Same here. I tried removing the index names as suggested and also checked for NaNs in my data using @xpastor's line of code, which returned False.

fairliereese avatar Oct 31 '20 22:10 fairliereese

Same issue here. Judging by the update history, I don't think this package is maintained anymore.

vladie0 avatar Nov 27 '20 13:11 vladie0

Exact same issue with my data, while everything works fine with simulated data from the tutorial.

My code:

test = de.test.wald(
    data=adata,
    formula_loc="~1+myfactor",
    factor_loc_totest="myfactor"
)

There are no NaNs in my data matrices (both np.count_nonzero(np.isnan(adata.X)) and np.count_nonzero(np.isnan(adata.raw.X)) return 0). Also, the cell index column in adata.obs has no name, so that cannot be an issue either.

bsierieb1 avatar Dec 06 '20 23:12 bsierieb1

any updates?

rojinsafavi avatar Dec 27 '20 23:12 rojinsafavi

The same problem here, hoping for a solution.

teryyoung avatar Oct 19 '23 07:10 teryyoung