Can a large model be successfully loaded across multiple EXO nodes with insufficient individual memory?
I have three nodes, each with 3GB of memory. When I try to load the 8B LLAMA model locally on each node, I encounter an OOM (Out of Memory) error on the first node.
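For context, a rough back-of-envelope estimate (assuming fp16 weights at 2 bytes per parameter; exact figures depend on quantization, KV cache, and runtime overhead) shows why 3 GB per node is not enough for an 8B model even when sharded:

```python
# Rough memory estimate for sharding an 8B model across 3 nodes.
# Assumes fp16 weights only (2 bytes/param); KV cache and runtime
# overhead are extra. Numbers are illustrative, not exact.
params = 8e9
bytes_per_param = 2  # fp16
total_gb = params * bytes_per_param / 1e9
nodes = 3
per_node_gb = total_gb / nodes
print(f"total ~{total_gb:.0f} GB, per node ~{per_node_gb:.1f} GB")
```

Even a perfectly even split asks for about 5.3 GB per node, which already exceeds the 3 GB available, before any overhead. A quantized (e.g. 4-bit) variant would be a better fit for this hardware.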
What device?
Mali GPU, arm64.
We are loading up Mac Studios and notice that on a subsequent inference, memory suddenly spikes on one of the nodes, as if it is trying to reload the model, roughly doubling memory usage. It seems to be doing additional layer downloads on top of what is already there.
This box normally uses about 54 GB, but then spikes to over 100 GB.
The layer downloads themselves shouldn't spike memory usage. Also, it will only run one download for a given model id at a time. If multiple were happening at once, that would cause other issues that we'd see in the error logs.
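For illustration, per-model-id download serialization is commonly implemented with a lock keyed by model id. This is a hypothetical sketch of that pattern, not EXO's actual code:

```python
import asyncio
from collections import defaultdict

# Sketch: serialize downloads per model id (illustrative only,
# not EXO's actual implementation). One lock per model id means
# concurrent requests for the same model wait for each other.
_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def download_model(model_id: str) -> str:
    async with _locks[model_id]:
        # Only one coroutine proceeds per model id at a time.
        await asyncio.sleep(0)  # stand-in for the real download
        return f"downloaded {model_id}"

async def main() -> list[str]:
    # Three concurrent requests for the same model id are serialized.
    return await asyncio.gather(*(download_model("llama-3-8b") for _ in range(3)))

results = asyncio.run(main())
print(results)
```

With this pattern, duplicate downloads of the same model cannot overlap, which is why overlapping downloads would show up as a distinct error rather than silent memory growth.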
In the logs you provided, it looks like multiple inferences were triggered, and each time a different set of layers was used. Are you running a cluster with many devices? If the devices are identical, then start_layer and end_layer should be the same each time. What setup are you running?
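To make the start_layer/end_layer point concrete: a memory-weighted partitioner assigns each node a contiguous slice of layers proportional to its memory, so identical devices should always receive the same stable split. A simplified sketch of that idea (assumed behavior, not EXO's exact algorithm):

```python
# Simplified sketch of memory-weighted layer partitioning (assumed
# behavior, not EXO's exact algorithm): each node gets a contiguous
# [start_layer, end_layer) slice proportional to its memory.
def partition_layers(num_layers: int, node_memory_gb: list[float]) -> list[tuple[int, int]]:
    total = sum(node_memory_gb)
    parts, start, cum = [], 0, 0.0
    for mem in node_memory_gb:
        cum += mem
        end = round(num_layers * cum / total)
        parts.append((start, end))
        start = end
    return parts

# Three identical 192 GB nodes always get the same stable split:
print(partition_layers(32, [192, 192, 192]))
```

If the same topology keeps producing different slices across inferences, that points at the partitioning input (reported memory, node ordering, topology) changing between runs rather than at the downloader.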
If I restart one of the nodes, one or more of the other nodes feels compelled to reload layers, thereby overloading memory. Attached are logs of two nodes that decided to load additional layers after a third one restarted.
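That behavior is consistent with repartitioning on a topology change: when a node drops out, the remaining nodes' layer ranges shift, so they fetch layers they did not previously hold. A hypothetical illustration with an even contiguous split across identical nodes:

```python
# Hypothetical illustration: when a node drops out of the topology,
# the remaining nodes' layer ranges shift, forcing extra downloads.
def split(num_layers: int, n_nodes: int) -> list[tuple[int, int]]:
    # Even contiguous split across identical nodes.
    bounds = [round(num_layers * i / n_nodes) for i in range(n_nodes + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_nodes)]

before = split(32, 3)
after = split(32, 2)
print(before, after)
# Node 0 held layers [0, 11) before; after the restart it must also
# download layers [11, 16), which would look like the observed spike
# until the old shard is released.
```

If the old shard isn't freed before the new one is loaded, the node briefly holds both, roughly doubling its memory footprint, which matches the 54 GB to 100+ GB spike described above.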