pynndescent
Is there a formula to calculate the expected RAM usage?
I have a large dataset of N records (currently 100 million). Each record has d dimensions (say d = 300). Is there a quick formula I can use to calculate the expected RAM usage? Thank you.
Do you want the peak RAM usage (harder to calculate) or the size of an index? The latter is going to be a small multiple of the dataset's RAM usage, counted in float32 entries. Regardless, 100M samples is going to tax most machines, and index construction would be quite expensive (i.e. it could take days even with multicore).
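As a rough back-of-the-envelope sketch of that estimate: the raw dataset takes N × d × 4 bytes as float32, and the index size is that figure times a small multiple. The helper names and the `multiple` value below are assumptions for illustration only, not something pynndescent provides, and the actual multiple depends on the index parameters.

```python
def dataset_bytes(n_samples, n_dims, bytes_per_entry=4):
    """Raw size of a dense dataset; 4 bytes per entry corresponds to float32."""
    return n_samples * n_dims * bytes_per_entry


def index_bytes_estimate(n_samples, n_dims, multiple=2.0):
    """Rough index size as a multiple of the raw dataset size.

    `multiple` is a hypothetical placeholder, not a measured value.
    """
    return multiple * dataset_bytes(n_samples, n_dims)


raw = dataset_bytes(100_000_000, 300)
print(f"raw dataset: {raw / 1e9:.0f} GB")  # raw dataset: 120 GB
print(f"index (assumed 2x): {index_bytes_estimate(100_000_000, 300) / 1e9:.0f} GB")
```

So even before any index overhead, 100M records at 300 dimensions is about 120 GB of float32 data, and peak usage during construction will be higher than the final index size.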
I see, I see... just the index size in RAM, @lmcinnes. Unfortunately, my use case is a large dataset. Pynndescent provides the best queries/s on a sample of the dataset, hence my interest in knowing whether it can scale to this size. Thank you.