Advice on scalability

Open ziqizhang opened this issue 3 years ago • 1 comment

Hi, this is a great project, thanks for sharing it!

I would like to use this on a very large corpus containing probably tens of millions of short messages, perhaps more than 10 GB of data. Can this scale up to that size? If so, I suppose I need to load the documents incrementally. Can you please point me to some code that allows me to do this?

If not, can you please suggest what memory/CPU/GPU resources are required for x MB of data, so I can estimate how much data I can process with my computational resources?

Thanks

Feb 08 '22 10:02 ziqizhang

Version 1.0.27, which is about to be released, should scale to millions of short messages, but that ultimately depends on the RAM of your machine.
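For anyone sizing this up, here is a minimal, library-agnostic sketch of two things touched on above: reading messages in fixed-size batches rather than all at once, and a rough lower bound on the RAM needed to hold one dense vector per message. It is not this project's API. The names `iter_batches` and `embedding_bytes` are hypothetical, the input is assumed to be one message per line, and the 300-dimensional float32 vectors in the usage note are an illustrative assumption only.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size non-empty, stripped messages.

    Hypothetical helper: assumes one message per input line.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Strip whitespace lazily and skip blank lines without loading everything.
    cleaned = (line.strip() for line in lines)
    non_empty = (line for line in cleaned if line)
    while True:
        batch = list(islice(non_empty, batch_size))
        if not batch:
            return
        yield batch


def embedding_bytes(n_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense n_docs x dim matrix (float32 by default).

    This counts only the vectors themselves, not the raw text or any
    intermediate structures, so treat it as a lower bound.
    """
    return n_docs * dim * bytes_per_value
```

Passing an open file handle, for example `iter_batches(open(path, encoding="utf-8"), 100_000)`, streams the file line by line, so only one batch of text is in memory at a time. As a rough guide, `embedding_bytes(10_000_000, 300)` gives 12,000,000,000 bytes, about 12 GB for the vectors alone under the assumed dimension. How batches should be handed to the library is not covered here.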

Apr 03 '22 22:04 ddangelov