
Advice on scalability

ziqizhang opened this issue (open, 1 comment)

Hi, this is a great project, thanks for sharing it!

I would like to use this on a very large corpus: probably tens of millions of short messages, likely more than 10 GB of data. Can Top2Vec scale to that size? If so, I suppose I need to load the documents incrementally. Could you point me to some code that does this?
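One way to avoid reading a 10 GB file into a single Python list is to stream it and group the messages into fixed-size batches. The sketch below uses only the standard library and assumes one message per line in a UTF-8 text file. The helper names `iter_messages` and `batched` are hypothetical, not part of Top2Vec. How the batches are fed to a model is left open: the `Top2Vec` constructor takes a list of document strings. Some Top2Vec versions also offer an `add_documents` method that adds new documents to an already-trained model. Check the documentation for your installed version before relying on it.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_messages(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield one message per non-blank line, without loading the whole file."""
    with open(path, encoding=encoding) as fh:
        for line in fh:
            text = line.strip()
            if text:  # skip blank or whitespace-only lines
                yield text


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an iterable into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage (assumption: your Top2Vec version has add_documents):
# batches = batched(iter_messages("messages.txt"), 1_000_000)
# model = Top2Vec(next(batches))      # train on a first representative batch
# for batch in batches:
#     model.add_documents(batch)      # embed and assign the remaining documents
```

Streaming only limits the memory used to *read* the corpus. Every document vector the model keeps still has to fit in RAM.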

If not, could you suggest what memory/CPU/GPU resources are required per x MB of data, so I can estimate how much data I can process with my computational resources?
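A rough lower bound on RAM comes from the dense document-vector matrix: one vector per document, `dim` values each, 4 bytes per value for float32. The sketch below computes only that matrix. The tokenizer, the embedding model itself, dimensionality reduction, and clustering all need additional memory, which this sketch does not estimate. The dimension is an assumption you must set for your embedding model. For example, the Universal Sentence Encoder produces 512-dimensional vectors. The function names are hypothetical.

```python
def embedding_matrix_bytes(n_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes for an n_docs x dim dense matrix (float32 by default)."""
    return n_docs * dim * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


def max_docs_for_budget(budget_gib: float, dim: int, bytes_per_value: int = 4) -> int:
    """Largest document count whose vectors alone fit in budget_gib of RAM."""
    return int(budget_gib * 1024**3) // (dim * bytes_per_value)


# Example: 30 million messages with 512-dimensional float32 vectors.
print(f"{to_gib(embedding_matrix_bytes(30_000_000, 512)):.1f} GiB")
```

For 30 million messages at 512 dimensions, the vectors alone take about 57 GiB. Treat the real requirement as noticeably higher.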

Thanks

ziqizhang, Feb 08 '22 10:02

The upcoming release, 1.0.27, should scale to millions of short messages, but that ultimately depends on how much RAM your machine has.

ddangelov, Apr 03 '22 22:04