cleora
graph partitioning
There's an option named 'num_partitions' in pytorch-biggraph that can reduce peak memory usage. Can Cleora provide that option too? Is it possible in the future? My situation: 40M nodes, 180M edges, and more than 20GB of peak memory usage to train Cleora embeddings! I also set ( --in-memory-embedding-calculation 0 ).
Hi @sademakn !
It's planned in the future, but we can't promise any deadlines. You're more than welcome to contribute. As per our whitepaper, you can split the graph into multiple parts and average the resulting embeddings, without sacrificing too much quality. Also 20GB peak usage is not much ;) Look up spot instances on Azure/GCP/AWS, you can get 500GB RAM for $1.5/hr.
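To illustrate the split-and-average approach mentioned above: each partition run produces embeddings for the nodes it contains, and a node that appears in several partitions gets the mean of its vectors. This is a minimal sketch with made-up data and a hypothetical `average_embeddings` helper, not Cleora's actual output format:

```python
import numpy as np

# Hypothetical per-partition outputs: node id -> embedding vector.
part1 = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
part2 = {"a": np.array([0.0, 2.0]), "c": np.array([1.0, 1.0])}

def average_embeddings(partitions):
    """Average each node's vectors across all partitions that contain it."""
    sums, counts = {}, {}
    for part in partitions:
        for node, vec in part.items():
            sums[node] = sums.get(node, 0.0) + vec
            counts[node] = counts.get(node, 0) + 1
    return {node: sums[node] / counts[node] for node in sums}

merged = average_embeddings([part1, part2])
# "a" appears in both partitions, so its two vectors are averaged;
# "b" and "c" each appear once and pass through unchanged.
```

Nodes unique to one partition keep their original vector, so only the boundary nodes shared between partitions are affected by the averaging.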
Hi, thank you for your answer! I'll try to find some spare time to work on partitioning, but I'm a beginner in Rust!