
graph partitioning

Open sademakn opened this issue 4 years ago • 2 comments

There's an option named 'num_partitions' in pytorch-biggraph that can reduce peak memory usage. Can Cleora provide that option too, or is it possible in the future? My situation: 40M nodes, 180M edges, and more than 20GB of peak memory usage to train Cleora embeddings! I also set `--in-memory-embedding-calculation 0`.

sademakn · May 14 '21 03:05

Hi @sademakn !

It's planned for the future, but we can't promise any deadlines. You're more than welcome to contribute. As per our whitepaper, you can split the graph into multiple parts and average the resulting embeddings without sacrificing too much quality. Also, 20GB of peak usage is not much ;) Look up spot instances on Azure/GCP/AWS; you can get 500GB of RAM for $1.5/hr.
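For anyone looking for a starting point, here is a minimal sketch of that split-and-average workaround. The file names, the tab-separated edge format, and the assumed embedding output layout (`<node> <v1> ... <vN>` per line) are illustrative assumptions, not Cleora's actual interface; the idea is just to split the edge list, run Cleora on each part separately, and then average each node's vectors across the per-part outputs:

```python
# Minimal sketch (not Cleora code) of the split-and-average workaround above.
# File names, the tab-separated edge format, and the output-embedding format
# ("<node> <v1> ... <vN>" per line) are assumptions for illustration only.
import zlib
from collections import defaultdict

import numpy as np

NUM_PARTS = 4  # how many partitions to train separately


def split_edges(path="edges.tsv", num_parts=NUM_PARTS):
    """Split an edge list into num_parts smaller files by hashing the first column."""
    outs = [open(f"edges_part_{i}.tsv", "w") for i in range(num_parts)]
    with open(path) as f:
        for line in f:
            src = line.split("\t", 1)[0]
            # crc32 gives a stable assignment across runs (unlike Python's hash()).
            outs[zlib.crc32(src.encode()) % num_parts].write(line)
    for o in outs:
        o.close()


# ...run Cleora on each edges_part_<i>.tsv separately, producing emb_part_<i>.txt...


def average_embeddings(paths, out_path="embeddings_avg.txt"):
    """Average each node's vectors across the per-partition embedding files."""
    sums, counts = {}, defaultdict(int)
    for path in paths:
        with open(path) as f:
            for line in f:
                parts = line.split()
                node = parts[0]
                vec = np.asarray(parts[1:], dtype=np.float32)
                sums[node] = vec if node not in sums else sums[node] + vec
                counts[node] += 1
    with open(out_path, "w") as out:
        for node, total in sums.items():
            avg = total / counts[node]
            out.write(node + " " + " ".join(f"{x:.6f}" for x in avg) + "\n")


if __name__ == "__main__":
    split_edges()
    # After running Cleora on each part:
    # average_embeddings([f"emb_part_{i}.txt" for i in range(NUM_PARTS)])
```

In this sketch, a node that only appears in one partition keeps its single vector, while nodes spanning several partitions get the element-wise average of their per-partition vectors.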

piobab · May 14 '21 14:05

Hi, thank you for your answer! I'll try to find some spare time to work on partitioning, but I am a beginner in Rust!

sademakn · May 14 '21 21:05