Can a large model be successfully loaded across multiple EXO nodes with insufficient individual memory?
I have three nodes, each with 3GB of memory. When I try to load the 8B LLAMA model locally on each node, I encounter an OOM (Out of Memory) error on the first node.
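For context, a rough back-of-envelope estimate (assuming fp16 weights at 2 bytes per parameter; exact figures depend on quantization, KV cache, and runtime overhead) shows why 3 GB per node is not enough for an 8B model even when sharded:

```python
# Rough memory estimate for sharding an 8B model across 3 nodes.
# Assumes fp16 weights only (2 bytes/param); KV cache and runtime
# overhead are extra. Numbers are illustrative, not exact.
params = 8e9
bytes_per_param = 2  # fp16
total_gb = params * bytes_per_param / 1e9
nodes = 3
per_node_gb = total_gb / nodes
print(f"total ~{total_gb:.0f} GB, per node ~{per_node_gb:.1f} GB")
```

Even a perfectly even split asks for about 5.3 GB per node, which already exceeds the 3 GB available, before any overhead. A quantized (e.g. 4-bit) variant would be a better fit for this hardware.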
What device?
Mali GPU, arm64.
We are loading up Mac Studios and notice that on a subsequent inference, memory suddenly spikes on one of the nodes, as if it is trying to reload the model, roughly doubling memory usage. It seems to be doing additional layer downloads on top of what is already there.
This box normally uses about 54 GB, but then spikes to over 100 GB.
The layer downloads themselves shouldn't spike memory usage. Also, it will only run one download for a given model id at a time. If multiple were happening at once, that would cause other issues that we'd see in the error logs.
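For illustration, per-model-id download serialization is commonly implemented with a lock keyed by model id. This is a hypothetical sketch of that pattern, not EXO's actual code:

```python
import asyncio
from collections import defaultdict

# Sketch: serialize downloads per model id (illustrative only,
# not EXO's actual implementation). One lock per model id means
# concurrent requests for the same model wait for each other.
_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def download_model(model_id: str) -> str:
    async with _locks[model_id]:
        # Only one coroutine proceeds per model id at a time.
        await asyncio.sleep(0)  # stand-in for the real download
        return f"downloaded {model_id}"

async def main() -> list[str]:
    # Three concurrent requests for the same model id are serialized.
    return await asyncio.gather(*(download_model("llama-3-8b") for _ in range(3)))

results = asyncio.run(main())
print(results)
```

With this pattern, duplicate downloads of the same model cannot overlap, which is why overlapping downloads would show up as a distinct error rather than silent memory growth.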
In the logs you provided, it looks like multiple inferences were triggered, and each time a different set of layers was used. Are you running a cluster with many devices? If the devices are identical, then start_layer and end_layer should be the same each time. What setup are you running?
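To make the start_layer/end_layer point concrete: a memory-weighted partitioner assigns each node a contiguous slice of layers proportional to its memory, so identical devices should always receive the same stable split. A simplified sketch of that idea (assumed behavior, not EXO's exact algorithm):

```python
# Simplified sketch of memory-weighted layer partitioning (assumed
# behavior, not EXO's exact algorithm): each node gets a contiguous
# [start_layer, end_layer) slice proportional to its memory.
def partition_layers(num_layers: int, node_memory_gb: list[float]) -> list[tuple[int, int]]:
    total = sum(node_memory_gb)
    parts, start, cum = [], 0, 0.0
    for mem in node_memory_gb:
        cum += mem
        end = round(num_layers * cum / total)
        parts.append((start, end))
        start = end
    return parts

# Three identical 192 GB nodes always get the same stable split:
print(partition_layers(32, [192, 192, 192]))
```

If the same topology keeps producing different slices across inferences, that points at the partitioning input (reported memory, node ordering, topology) changing between runs rather than at the downloader.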
If I restart one of the nodes, one or more of the other nodes feels compelled to reload layers, thereby overloading memory. Attached are logs of two nodes that decided to load additional layers after a third one restarted.
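That behavior is consistent with repartitioning on a topology change: when a node drops out, the remaining nodes' layer ranges shift, so they fetch layers they did not previously hold. A hypothetical illustration with an even contiguous split across identical nodes:

```python
# Hypothetical illustration: when a node drops out of the topology,
# the remaining nodes' layer ranges shift, forcing extra downloads.
def split(num_layers: int, n_nodes: int) -> list[tuple[int, int]]:
    # Even contiguous split across identical nodes.
    bounds = [round(num_layers * i / n_nodes) for i in range(n_nodes + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_nodes)]

before = split(32, 3)
after = split(32, 2)
print(before, after)
# Node 0 held layers [0, 11) before; after the restart it must also
# download layers [11, 16), which would look like the observed spike
# until the old shard is released.
```

If the old shard isn't freed before the new one is loaded, the node briefly holds both, roughly doubling its memory footprint, which matches the 54 GB to 100+ GB spike described above.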