exo [BOUNTY - $300] Support function/tool calling as laid out by OpenAI's API spec

I think it would be awesome if exo supported tool calling as laid out by OpenAI's API spec. It would allow people to get reliable structured generation via the Instructor library that is quite popular.

Something to note: What Instructor sends to OpenAI in its tools request array is an object that is a JSON schema object. This is NOT represented in the OpenAI docs, even though that's what it is.

If you don't allow using arbitrary JSON schemas as your tools in the request, it will not work with Instructor, and you will not be able to have more interesting schemas that go more than 1 top level object with only primitive fields. So in order to ensure that this works well, I think a requirement should be supporting JSON schemas as what tools takes in.

Oct 05 '24 23:10 vanakema

Assigned $300 bounty to this

Oct 06 '24 17:10 AlexCheema

@AlexCheema can I take on this task? I've got some ideas

Oct 06 '24 18:10 vanakema

If not since I'm the reporter, no worries

Oct 06 '24 18:10 vanakema

I can try to take this on!

Oct 17 '24 04:10 master-senses

I can try to take this on!

Assigned!

Please tag me here or on Discord if you have any questions or run into any bugs,

Oct 17 '24 04:10 AlexCheema

sg! I will start working on this on friday. I can't join the discord through the invite link on the repo, is there another link?

Oct 17 '24 04:10 master-senses

Hi, I’d love to take this on! I’ve worked extensively with function/tool calling for clients using OpenAI’s API, and I’ve hosted hackathons and written tutorials on it. Excited to work together and contribute to the project!

@AlexCheema

Oct 17 '24 05:10 Sanchay-T

Can the google sheet be updated to denote @master-senses has this one? Almost grabbed it when looking at the Sheet (cc @AlexCheema )

Nov 10 '24 01:11 moosh3

@master-senses not sure how you're planning on doing it, but I was gonna implement it using Outlines: https://github.com/dottxt-ai/outlines Their implementation of bound generation results in no performance penalty (and can actually result in faster generation if implemented right). Main difficulty is figuring out which of the lower level functions in outlines to use since Exo works at the tensor level in order to do the distributed computing.

Just figured I'd give you some guidance in case you're figuring out where to start still. Excited to see what you create!

Nov 12 '24 22:11 vanakema

Hey thanks for this! This helps. I'm still in school and it's been kicking my ass, so I've been a bit late with finishing this up

Nov 12 '24 22:11 Hrishikesh-Kalyanaraman

No progress made after a month so opening this back up.

Nov 15 '24 07:11 AlexCheema

I'm happy to claim this one and take it on, if allowed

Nov 16 '24 02:11 vanakema

I'm happy to claim this one and take it on, if allowed

Assigned - good luck!

Nov 18 '24 08:11 AlexCheema

Sweet! Thanks, will get started on this

Nov 23 '24 00:11 vanakema

FYSA: Started work on this last weekend. Made some good headway on this, getting more familiar with the inference part of Exo, and figured out where in the code I should be implementing this. I've also taken a look at vLLM and how they handle structured generation from an API perspective in order to mirror that as closely as possible with the goal of making this as transparently interchangeable as possible.

I plan on starting implementation this weekend if I have time, or doing so while off for the holidays. Should I be opening a draft PR as I make progress? Or should I just open the PR when I'm done.

Dec 12 '24 22:12 vanakema

I'm releasing this bounty to whomever would like to pick it up. Some big changes at work has made this a lower of a priority for me, and I'd rather see it get done than collect the bounty. Happy to help whomever picks it up! If no one picks it up, and I have enough free time, I'll just open a PR for it. But in the meantime please assume that will not happen.

If you're looking for where to start in terms of how to implement the behavior, please refer to vLLM's implementation of the OpenAI structured generation spec. By mirroring their behavior and logic, you ensure that what you're doing will help the highest number of GPU poor AI devs take their apps to real clusters in the future since's it's pretty much the canonical "high performance" inference backend that has structured generation implemented properly to OpenAI's spec.

Feb 04 '25 03:02 vanakema

Is this bounty now free to give a go @vanakema @AlexCheema? I'm happy to take a look after the current PRs I am working on are complete

Feb 20 '25 17:02 joshuacoles

fine with me

Feb 21 '25 02:02 vanakema