socialsent
Memory issues for network construction (i.e. nearest neighbor computation)
Hi Will,
Back to you with some memory issues. My experience so far is that SocialSent runs into memory problems once you reach a threshold of roughly 7000 words to score. So I ran it on a distributed architecture (SHARCNET) with 38000 words to score and asked for 16G of memory, yet it very soon runs out of memory again:
...
Using Theano backend.
/opt/sharcnet/python/2.7.8/intel/lib/python2.7/site-packages/scipy/lib/_util.py:35: DeprecationWarning: Module scipy.linalg.blas.fblas is deprecated, use scipy.linalg.blas instead
DeprecationWarning)
Evaluating SentProp with 100 dimensional GloVe embeddings
Evaluating binary and continuous classification performance
LEXICON
SEEDS
EMBEDDINGS
EVAL_WORDS
Traceback (most recent call last):
File "concreteness.py", line 95, in
Job returned with status 1.
WARNING: Job only used 1 % of its requested walltime.
WARNING: Job only used 0 % of its requested cpu time.
WARNING: Job only used 65 % of allocated cpu time.
WARNING: Job only used 74 % of its requested memory.
...
A solution would be to run it 7000 words at a time. But maybe you know a way to increase the program's memory use?
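For what it's worth, the batching workaround could be sketched roughly like this. `score_fn` here is a hypothetical stand-in for whatever SentProp scoring call is used, not an actual socialsent API; note also that running SentProp per batch builds a different nearest-neighbor graph each time, so scores across batches may not be strictly comparable:

```python
def score_in_batches(words, score_fn, batch_size=7000):
    """Score a large vocabulary in memory-sized chunks.

    words: full list of words to score
    score_fn: hypothetical callable mapping a list of words to {word: score}
    Returns one merged {word: score} dict.
    """
    scores = {}
    for start in range(0, len(words), batch_size):
        batch = words[start:start + batch_size]
        # Each batch is scored independently, keeping peak memory bounded
        # by batch_size rather than the full vocabulary.
        scores.update(score_fn(batch))
    return scores
```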
Thanks, Michel
Numpy can't natively handle or distribute the large matrix computations that are needed here. I think the solution is to write some Cython/C code to handle the Dinv.dot(L).dot(Dinv) computation.
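As a stopgap before dedicated Cython/C code, one memory-saving option might be to keep the whole computation in scipy.sparse, so the normalization never materializes a dense n×n matrix. This is only a sketch, not socialsent's actual implementation: it assumes `L` is the sparse nearest-neighbor similarity matrix and that `Dinv` is the diagonal matrix of inverse square-root row sums:

```python
import numpy as np
from scipy import sparse

def normalized_product(L, degrees):
    """Compute Dinv.dot(L).dot(Dinv) without densifying.

    L: (n, n) scipy.sparse matrix (assumed: the kNN similarity matrix)
    degrees: length-n array of row sums used to build D
    """
    # D^{-1/2} as a sparse diagonal matrix; a dense (n, n) Dinv is
    # exactly what blows up memory for large vocabularies.
    dinv = sparse.diags(1.0 / np.sqrt(degrees))
    # Sparse-sparse products stay sparse throughout.
    return dinv.dot(L).dot(dinv).tocsr()

# Tiny 3-node example.
L = sparse.csr_matrix(np.array([[0., 1., 1.],
                                [1., 0., 0.],
                                [1., 0., 0.]]))
degrees = np.asarray(L.sum(axis=1)).ravel()
norm = normalized_product(L, degrees)
```

Whether this is enough depends on how dense the kNN matrix is; for a kNN graph with small k it should stay far below the dense n² footprint.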