Nick Gerner
Any progress on any of these suggestions? In particular, batching and/or support for the `corpus_file` format seem desirable. I saw a big performance speedup using a corpus file for training and...
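For anyone wanting to try the corpus file route, here is a minimal sketch of writing one. It assumes gensim's `corpus_file` argument expects a plain text file with one document per line and whitespace-separated tokens (the LineSentence layout); `write_corpus_file` is a hypothetical helper name, not part of gensim.

```python
import os
import tempfile
from typing import Iterable, Sequence


def write_corpus_file(documents: Iterable[Sequence[str]], path: str) -> int:
    """Write pre-tokenized documents to `path`, one document per line.

    Assumption: tokens contain no whitespace or newlines, so joining them
    with single spaces round-trips cleanly.
    Returns the number of documents written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for tokens in documents:
            handle.write(" ".join(tokens) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    docs = [["hello", "world"], ["another", "short", "document"]]
    target = os.path.join(tempfile.mkdtemp(), "corpus.txt")
    print(write_corpus_file(docs, target), "documents written to", target)
    # The resulting path could then be passed as `corpus_file=target` when
    # training, assuming the installed gensim version supports that argument.
```

With this layout the line number is the only document identifier, so any custom tags would have to be mapped back by position.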
So... you're saying _someone_ should just make a pull request and send it over? Also, FWIW, I took a (very cursory) look at fastText: https://github.com/facebookresearch/fastText/ It has comparable model performance...
Facebook Research's fastText implementation has a mode that trains word embeddings and then combines them (one of their papers suggests it's an average) to do sentence embeddings. Certainly a different...
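To make the "combine by averaging" idea concrete, here is a rough sketch in plain Python. It is not fastText's actual code; it only shows unweighted averaging of word vectors, and the name `average_sentence_vector` and the toy vectors are made up for illustration.

```python
from typing import Dict, List, Sequence


def average_sentence_vector(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the vectors of the known tokens in a sentence.

    Tokens missing from `word_vectors` are skipped. If no token is known,
    a zero vector of length `dim` is returned.
    """
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    toy_vectors = {"cat": [1.0, 0.0], "sat": [0.0, 1.0]}
    print(average_sentence_vector(["cat", "sat", "quietly"], toy_vectors, dim=2))
```

A real implementation might normalize or weight the word vectors before averaging; this sketch deliberately does neither.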
I was searching for gensim perf stuff and found this bug again. This time I'm taking a closer look. It looks to me like `infer_vector()` uses a code path that...
@gojomo yes, parallelism is part of the goal. Are you suggesting trying to match what's happening in `Doc2Vec._do_train_job`, which gets called when you set `corpus_iterable` and not `corpus_file`? From...
I also had problems with a complex data set, 100s of features, 10s of 1000s of samples. I made some progress naively using doubles instead of floats and turning down...
@GeorgePearse been a while since I did anything with torchnca, in part because of bug/operational issues like this. I think I was trying to do some unsupervised clustering work at...
@GeorgePearse I had some success with LMNN. There's a Python implementation, [PyLMNN](https://github.com/johny-c/pylmnn). Perhaps you can give it a try?
Just to be super clear: if you repeat these steps with 0.3.13, you don't get the exception? Are these methods deprecated or not supported? It seems like this is core...
What we do to test this kind of interaction with AWS is pull creds from environment variables. As @marinerJB suggests there's a free tier for AWS. Using these two approaches...
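A minimal sketch of the environment-variable approach, in case it helps. It assumes the standard AWS variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and the optional `AWS_SESSION_TOKEN`); `aws_credentials_from_env` is a hypothetical helper, and how the test suite actually uses the result is up to the project.

```python
import os
from typing import Dict, Mapping, Optional


def aws_credentials_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Read AWS credentials from environment variables.

    Returns None when the access key ID or secret is missing, so callers
    can skip AWS-dependent tests instead of failing them.
    """
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        return None
    creds = {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        # Only present for temporary credentials.
        creds["aws_session_token"] = token
    return creds


if __name__ == "__main__":
    found = aws_credentials_from_env()
    print("credentials available" if found else "no credentials; skip AWS tests")
```

Keeping the lookup in one place means nothing secret is committed to the repo, and contributors without an AWS account still get a clean test run with the AWS tests skipped.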