cortex Support websockets in realtime APIs

Motivation

Reduce latency when multiple requests are required
Stream output from the predictor as it's generated

Oct 29 '20 18:10 deliahu

When will this feature become available?

Nov 03 '20 17:11 da-source

@da-source we haven't scheduled this one yet; we usually plan about two weeks at a time.

Would it be possible to change your API implementation so that you can make a single HTTP request to the API (or multiple distinct requests if necessary), rather than relying on streaming the results?

Nov 03 '20 22:11 deliahu

I would like to deploy a large fine-tuned GPT-2 model. Because it is so large, generating the whole output takes a while, so I would like to stream partial outputs instead of waiting for the complete result, similar to AI Dungeon 2.
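A minimal sketch of the kind of partial-output streaming described here, using only Python's standard library. Everything in it is an assumption for illustration: `generate_tokens` is a hypothetical stand-in for a slow model, and `send` stands in for whatever a websocket connection would use to push a message to the client. It is not Cortex's API.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def generate_tokens(prompt: str, n_tokens: int = 5) -> AsyncIterator[str]:
    """Hypothetical stand-in for a slow model: yields one token at a time."""
    for i in range(n_tokens):
        await asyncio.sleep(0.01)  # simulate per-token inference time
        yield f"{prompt}-tok{i}"


async def stream_to_client(
    prompt: str, send: Callable[[str], Awaitable[None]]
) -> None:
    """Forward each token as soon as it is produced.

    `send` is an assumed placeholder for a websocket "send message" coroutine.
    """
    async for token in generate_tokens(prompt):
        await send(token)


async def demo() -> List[str]:
    """Collect what a client would receive, in arrival order."""
    received: List[str] = []

    async def fake_send(message: str) -> None:
        received.append(message)

    await stream_to_client("hello", fake_send)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a plain HTTP request, the client would only see the result after the last token has been generated; here each token is handed to `send` as soon as it is available.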

Nov 04 '20 13:11 da-source

I'm using a compressed large GPT-2 model: https://bellard.org/nncp/gpt2tc.html

Nov 04 '20 15:11 da-source

Hi! Are there any updates on when this will be coming out?

Nov 06 '20 19:11 AbbeKamalov

@mutal we haven't come up with a timeline for it yet. We'll keep this ticket updated as we go along. Is this urgent for you? To reiterate what @deliahu mentioned earlier, we usually plan about two weeks at a time.

Nov 07 '20 03:11 RobertLucian

I was hoping to implement a project with scalable infrastructure and websockets this month, so it would be nice if you could add this feature as soon as possible.

Nov 07 '20 06:11 da-source

It would be very helpful for me if this feature became available. When you say two weeks at a time, does that mean you plan to add it the week after next?

Nov 07 '20 15:11 AbbeKamalov

@mutal @da-source It appears that this feature is urgent for you.

Unfortunately, this feature is not a priority for Cortex for the next few weeks.

If I were in your position and wanted to ship something in the next month or so, I would try the workaround suggested here to use Cortex for your project.

Feel free to watch for notifications on this ticket. When the team decides to prioritize this ticket, it will be moved from the "to prioritize" column to "current sprint". If it remains in the "to prioritize" column, it means the team has decided that other features are a higher priority than this one.

Nov 07 '20 19:11 vishalbollu

The workaround you suggested doesn't work for me, because it means restarting the process on each call, which is something I'm trying to avoid. In the meantime, I'll try to find a way to create a websocket server on the Cortex instances myself. It shouldn't be too hard. Based on this, I'll have to replace localhost with the IP of the AWS instance that Cortex runs on. Any ideas on how to get the IPs of the instances that Cortex spins up?

Nov 11 '20 07:11 AbbeKamalov

+1

Nov 17 '20 18:11 imagine3D-ai
