Installation problem
pip3 install glmpca installs the package without complaint, but
import glmpca yields an empty wrapper:
>>> dir(glmpca)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> from glmpca import glmpca
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/glmpca/glmpca.py", line 7, in <module>
import statsmodels.api as sm
ModuleNotFoundError: No module named 'statsmodels'
Manually installing the missing dependency with
pip3 install statsmodels
fixes the problem.
possibly related to #14
@les-klimczak thanks for bringing this to my attention. I was able to reproduce it. Please try installing from GitHub using pip install git+https://github.com/willtownes/glmpca-py.git@master. Even if statsmodels has not been previously installed, this should now pull it in. If it works, I'll do a release to PyPI as soon as I fix the build errors.
@willtownes The install from GitHub works in a clean Python 3 environment without statsmodels preinstalled. It exposes only the glmpca object:
>>> dir(glmpca)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'glmpca']
whereas the PyPI install (after manually adding statsmodels) exposes more:
>>> dir(glmpca)
['Decimal', 'GlmpcaError', 'GlmpcaFamily', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'colMeans', 'colNorms', 'colSums', 'crossprod', 'cvec1', 'digamma', 'est_nb_theta', 'glmpca', 'glmpca_init', 'log', 'mat_binom_dev', 'ncol', 'np', 'nrow', 'ortho', 'polygamma', 'remove_intercept', 'rowMeans', 'rowSums', 'sm', 'smf', 'tcrossprod', 'trigamma']
Yes, that's correct. I haven't done the release yet, as I need to fix some CI problems, but per #14 I changed the package to expose only the glmpca module instead of all the helper functions. Does the current GitHub version provide the functions you need, or would you benefit from something else being exposed?
@willtownes Those helper functions are definitely useful long term, but they are not absolutely required for our initial exploration, and we still have the PyPI version with them. We are very excited to test glmpca on our problem, which lies outside the single-cell domain, as well as the deviance residuals.
Thanks for your kind words! I regret that I haven't had time to implement the residuals functionality in Python yet. If you are open to using R for your application, please consider the glmpca package from CRAN and the scry package on Bioconductor, as both are much further along in development than this Python version. For example, scry can compute null residuals for HDF5-backed datasets too large to fit in memory, and the CRAN glmpca has a better optimizer (Avagrad) and better numerical stability than this Python version.
@willtownes We are definitely aware of the R packages and have already crudely translated binomial_deviance_residuals into Python 3. Most of our code is now in Python, for deep learning applications, so going back through R is a bit of a nuisance. A conceptual problem we face with GLM-PCA is that while the count distribution in our data is driven by samples/observations (analogous to cells), and we want the model to capture and correct for that, we are ultimately interested in clustering the features.
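For what it's worth, a crude translation along those lines can be quite short. The sketch below is a minimal numpy version assuming the standard binomial deviance-residual formula, sign(y - mu) * sqrt(2 * unit deviance), with the 0 * log 0 = 0 convention; the function name and argument layout are illustrative and not the scry API:

```python
import numpy as np

def binomial_deviance_residuals(y, n, pi):
    """Signed binomial deviance residuals (illustrative sketch).

    y  : observed counts for one feature (array)
    n  : total counts per observation (array, broadcastable with y)
    pi : estimated success probability for the feature (scalar or array)
    """
    mu = n * pi  # fitted mean under the binomial model
    # Unit-deviance terms, using the 0*log(0) = 0 convention; errstate
    # silences the log(0) / 0*inf warnings that np.where still evaluates.
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        t2 = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - mu)), 0.0)
    return np.sign(y - mu) * np.sqrt(2.0 * (t1 + t2))
```

When y equals the fitted mean the residual is exactly zero, and observations above (below) the mean get positive (negative) residuals, so the residual matrix can be fed directly into a standard PCA or clustering step on the features.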