
Is there a formula to calculate the expected RAM usage?

Open stephenleo opened this issue 4 years ago • 2 comments

I have a large dataset of N records (currently 100 million). Each record has d dimensions (say, d = 300). Is there a quick formula I can use to calculate the expected RAM usage? Thank you.

stephenleo avatar Dec 08 '20 01:12 stephenleo

Do you want the peak RAM usage (harder to calculate) or the size of an index? The latter is going to be a small multiple of the dataset's RAM usage, measured in float32 entries. Regardless, 100M samples is going to tax most machines, and index construction would be quite expensive (i.e. it could take days even with multicore).
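
As a rough back-of-the-envelope sketch of the "small multiple" estimate above: the raw data term is just N × d × 4 bytes for float32. The function names and the `index_multiple` parameter below are hypothetical, and the default multiple of 2.0 is only a placeholder assumption, not a measured figure for pynndescent; the real factor depends on the index settings and should be calibrated on a sample.

```python
def dataset_bytes(n_samples, n_dims, bytes_per_value=4):
    """Raw size of a dense array: float32 uses 4 bytes per entry."""
    return n_samples * n_dims * bytes_per_value


def estimate_index_bytes(n_samples, n_dims, index_multiple=2.0):
    """Scale the raw data size by an assumed 'small multiple'.

    index_multiple is a hypothetical placeholder; measure it on a
    sample of your own data before trusting the result.
    """
    return int(dataset_bytes(n_samples, n_dims) * index_multiple)


raw = dataset_bytes(100_000_000, 300)
print(f"raw float32 data: {raw / 1e9:.0f} GB")  # prints "raw float32 data: 120 GB"
```

So for 100M records at 300 dimensions the float32 data alone is about 120 GB, before any multiple for the index is applied.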

lmcinnes avatar Dec 08 '20 03:12 lmcinnes

I see, just the index size in RAM, @lmcinnes. Unfortunately, my use case is a large dataset. Pynndescent provides the best queries/s on a sample of the dataset, hence my interest in knowing whether it can scale to this size. Thank you.

stephenleo avatar Dec 08 '20 03:12 stephenleo