Installation problem
pip3 install glmpca installs the package without complaint, but
import glmpca yields an empty wrapper:
>>> dir(glmpca)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> from glmpca import glmpca
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/glmpca/glmpca.py", line 7, in <module>
import statsmodels.api as sm
ModuleNotFoundError: No module named 'statsmodels'
Manually installing the missing dependency with
pip3 install statsmodels
fixes the problem.
possibly related to #14
@les-klimczak thanks for bringing this to my attention. I was able to reproduce it. Please try installing from GitHub using pip install git+https://github.com/willtownes/glmpca-py.git@master. Even if statsmodels has not been previously installed, this should now pull it in. If it works, I'll do a release to PyPI as soon as I fix the build errors.
@willtownes The install from GitHub works in a clean Python 3 environment without statsmodels preinstalled. It exposes only the glmpca object:
>>> dir(glmpca)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'glmpca']
whereas the PyPI install (after manually adding statsmodels) exposes more:
>>> dir(glmpca)
['Decimal', 'GlmpcaError', 'GlmpcaFamily', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'colMeans', 'colNorms', 'colSums', 'crossprod', 'cvec1', 'digamma', 'est_nb_theta', 'glmpca', 'glmpca_init', 'log', 'mat_binom_dev', 'ncol', 'np', 'nrow', 'ortho', 'polygamma', 'remove_intercept', 'rowMeans', 'rowSums', 'sm', 'smf', 'tcrossprod', 'trigamma']
Yes, that's correct. I haven't done the release yet, as I need to fix some CI problems, but per #14 I changed the package to expose only the glmpca module instead of all the helper functions. Does the current GitHub version provide the functions you need, or would you benefit from something else being exposed?
@willtownes Those helper functions are definitely useful long term, but they are not absolutely required for our initial exploration, and we still have the PyPI version with them. We are very excited to test glmpca on our problem, which lies outside the single-cell domain, as well as the deviance residuals.
Thanks for your kind words! I regret that I haven't had time to implement the residuals functionality in Python yet. If you are open to using R for your application, please consider the glmpca package from CRAN and the scry package on Bioconductor, as both are much further along in development than this Python version. For example, scry can compute null residuals for HDF5-backed datasets too large to fit in memory, and the CRAN glmpca has a better optimizer (Avagrad) and better numerical stability than this Python version.
@willtownes We are definitely aware of the R packages and have already crudely translated binomial_deviance_residuals into Python 3. Most of our code is now in Python, for deep learning applications, so going back through R is a bit of a nuisance. A conceptual problem we face with GLM-PCA is that while the count distribution in our data is driven by samples/observations (analogous to cells), and we want the model to capture and correct for that, we are ultimately interested in clustering the features.
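For what it's worth, a crude translation along those lines can be quite short. The sketch below is a minimal numpy version assuming the standard binomial deviance-residual formula, sign(y - mu) * sqrt(2 * unit deviance), with the 0 * log 0 = 0 convention; the function name and argument layout are illustrative and not the scry API:

```python
import numpy as np

def binomial_deviance_residuals(y, n, pi):
    """Signed binomial deviance residuals (illustrative sketch).

    y  : observed counts for one feature (array)
    n  : total counts per observation (array, broadcastable with y)
    pi : estimated success probability for the feature (scalar or array)
    """
    mu = n * pi  # fitted mean under the binomial model
    # Unit-deviance terms, using the 0*log(0) = 0 convention; errstate
    # silences the log(0) / 0*inf warnings that np.where still evaluates.
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        t2 = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - mu)), 0.0)
    return np.sign(y - mu) * np.sqrt(2.0 * (t1 + t2))
```

When y equals the fitted mean the residual is exactly zero, and observations above (below) the mean get positive (negative) residuals, so the residual matrix can be fed directly into a standard PCA or clustering step on the features.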