[BOUNTY - $500] Support multiple models running concurrently
- Currently exo supports multiple requests to the same LLM concurrently (after: https://github.com/exo-explore/exo/pull/282)
- However, if you try to request 2 different LLMs concurrently it fails
Hi I would like to work on this
Assigned. Good luck - pls tag me here or on Discord if you have any questions or run into bugs!
Increased bounty to 500 USD as this appears to be harder than anticipated.
No activity for a month. Opening this back up.
@DESU-CLUB, any ideas? What did your research turn up?
Hey sorry was busy with college
While working on this I found a race condition within the inference engines. When I tried reproducing the bug in both the tinygrad and torch engines, I ran into issues such as the following:
I send a request to Model A and Model B, but the response for Model A appears in the chat for Model B and vice versa.
I believe adding semaphores should solve the issue, but I wasn't able to implement it completely bug-free because of an occasional deadlock I was still trying to fix.
Sounds like good progress - reassigned
Hey, update on the issue: with the recent changes, most of the code I wrote last month was invalidated, so I'm working on a new version right now. I'm not sure how much time I can commit to this, so feel free to unassign me if I'm taking too long.
The issue is still the same: a semaphore or lock is needed to manage state at the model level, so that multiple models can run concurrently without causing race conditions.
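The per-model locking idea could be sketched roughly like this (all names here are hypothetical, not exo's actual engine API): one `asyncio.Lock` per model ID, so requests to different models proceed in parallel while requests to the same model serialize, and each response stays paired with the model that produced it.

```python
import asyncio
from collections import defaultdict

# Hypothetical sketch: one lock per model ID, so concurrent requests to
# different models never share mutable engine state, while requests to
# the same model are serialized.
_model_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def infer(model_id: str, prompt: str) -> str:
    # "async with" releases the lock even if the engine call raises,
    # which avoids the deadlocks that manual acquire/release can cause.
    async with _model_locks[model_id]:
        await asyncio.sleep(0.01)  # stand-in for the real engine call
        return f"{model_id}: response to {prompt!r}"

async def main() -> None:
    # Requests to model-a and model-b run concurrently; each response
    # comes back tagged with the model that handled it.
    results = await asyncio.gather(
        infer("model-a", "hello"),
        infer("model-b", "world"),
    )
    print(results)

asyncio.run(main())
```

Whether a plain lock per model is enough, or a semaphore is needed to bound concurrent requests per model, depends on how much in-flight state each engine instance actually holds.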