Support websockets in realtime APIs

Open deliahu opened this issue 5 years ago • 11 comments

Motivation

  • Reduce latency when multiple requests are required
  • Stream output from the predictor as it's generated

deliahu avatar Oct 29 '20 18:10 deliahu

When will this feature become available?

da-source avatar Nov 03 '20 17:11 da-source

@da-source we haven't scheduled this one yet; we usually plan about two weeks at a time.

Would it be possible to change your API implementation so that you can make a single HTTP request to the API (or multiple distinct requests if necessary), rather than relying on streaming the results?

deliahu avatar Nov 03 '20 22:11 deliahu

I would like to deploy a large fine-tuned GPT-2 model. Because it is so large, generating the whole output takes a while, so I would like to stream partial outputs instead of waiting for the whole thing, similar to AI Dungeon 2.
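
To illustrate the difference between waiting for the whole output and streaming partial outputs, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: `generate_tokens`, `predict_blocking`, and `predict_streaming` are hypothetical names and not part of Cortex's API, the "model" is a stand-in that just numbers its tokens, and `send` stands in for whatever would write a message to an open websocket connection.

```python
from typing import Callable, Iterator, List


def generate_tokens(prompt: str, max_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a slow language model: yields one token at a time."""
    for i in range(max_tokens):
        # A real model would run one (expensive) decoding step here.
        yield f"{prompt}-{i}"


def predict_blocking(prompt: str, max_tokens: int) -> str:
    """Request/response style: nothing is returned until every token is ready."""
    return " ".join(generate_tokens(prompt, max_tokens))


def predict_streaming(prompt: str, max_tokens: int, send: Callable[[str], None]) -> None:
    """Streaming style: each token is handed to `send` as soon as it is produced."""
    for token in generate_tokens(prompt, max_tokens):
        send(token)


# Usage: collect streamed tokens in a list instead of sending them over a socket.
received: List[str] = []
predict_streaming("tok", 3, received.append)
print(received)
print(predict_blocking("tok", 3))
```

With a real model, the blocking version makes the client wait for all decoding steps, while the streaming version lets the client display the first token after a single step. That is the behaviour a websocket connection would enable here.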

da-source avatar Nov 04 '20 13:11 da-source

I’m using a compressed* large GPT-2 model: https://bellard.org/nncp/gpt2tc.html

da-source avatar Nov 04 '20 15:11 da-source

Hi! Are there any updates on when this will be coming out?

AbbeKamalov avatar Nov 06 '20 19:11 AbbeKamalov

@mutal we haven't come up with a timeline for it yet. We'll keep this ticket updated as we go along. Is this urgent for you? And to reiterate what @deliahu mentioned earlier, we usually plan about two weeks at a time.

RobertLucian avatar Nov 07 '20 03:11 RobertLucian

I was hoping to implement a project with scalable infrastructure and websockets this month, so it would be nice if you could add this feature as soon as possible.

da-source avatar Nov 07 '20 06:11 da-source

It would be very helpful for me if this feature became available. When you say two weeks at a time, does that mean you plan to add it the week after next?

AbbeKamalov avatar Nov 07 '20 15:11 AbbeKamalov

@mutal @da-source It appears that this feature is urgent for both of you.

Unfortunately, this feature is not a priority for Cortex for the next few weeks.

If I were in your position and wanted to ship something in the next month or so, I would try the workaround suggested here to use Cortex for your project.

Feel free to watch for notifications on this ticket. When the team decides to prioritize this ticket, it will be moved from the "to prioritize" column to "current sprint". If it remains in the "to prioritize" column, it means the team has decided that other features are a higher priority than this one.

vishalbollu avatar Nov 07 '20 19:11 vishalbollu

The workaround you suggested doesn't work for me, because it means restarting the process on each call, which is something I'm trying to avoid. In the meantime, I'll try to find a way to set up websockets on the Cortex instances myself. It shouldn't be too hard. Based on this, I'll have to replace localhost with the IP of Cortex's AWS instance. Any ideas on how to get the IPs of the instances that Cortex spins up?

AbbeKamalov avatar Nov 11 '20 07:11 AbbeKamalov

+1

imagine3D-ai avatar Nov 17 '20 18:11 imagine3D-ai