
Allow cancellation of prediction while running prompt

Open · daanrongen opened this issue 1 year ago · 13 comments

Open Assistant is great, but sometimes it will predict a long answer in which I can spot a misinterpretation right away, whether because my prompt was faulty and I realised it too late, or because the model is hallucinating. Either way, having to wait for the entire prediction to finish significantly degrades the UX (due to the waiting time). It would be useful to be able to abort the prediction.

  • Model: OA_SFT_Llama_30B_6
  • Top K: 50
  • Top P: 0.95
  • Temperature: 0.9
  • Repetition penalty: 1.2
  • Max new tokens: 1024
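For reference, a rough sketch of how these sampling parameters would map onto a text-generation-inference request via the text_generation Python client; the endpoint URL and prompt are placeholders, not the actual Open-Assistant setup:

```python
from text_generation import Client

# Placeholder endpoint for a text-generation-inference worker serving the model.
client = Client("http://127.0.0.1:8080")

response = client.generate(
    "Explain server-sent events in one paragraph.",  # placeholder prompt
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.9,
    repetition_penalty=1.2,
    max_new_tokens=1024,
)
print(response.generated_text)
```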

daanrongen avatar Apr 21 '23 19:04 daanrongen

Note: this requires fairly deep understanding of the inference system

yk avatar Apr 21 '23 20:04 yk

This issue is similar to #2647. Can this be closed?

alexn-s avatar Apr 23 '23 22:04 alexn-s

I would enjoy working on this, as it encompasses most of the stack and would give me a chance to quickly get to know the entire application. I could put together a proposal for how to implement this over the next week, then work on the implementation the week after that.

0xfacade avatar May 21 '23 21:05 0xfacade

Is this really up to OA to do? It looks like OA has a big dependency on https://github.com/huggingface/text-generation-inference for inference. If we want a proper way of handling cancellation, we should perhaps make (or wait for...) a contribution to that repo. I can add an issue on their side and try to contribute there. The OpenAI API docs don't mention anything about cancellation, but they do have cancellation on their website.

(screenshot omitted)

I wonder if this is just a UI thing (they let the prompt generation run on their server).

Forbu avatar May 23 '23 21:05 Forbu

I was thinking we could implement the cancellation as a stopping criterion that can be influenced from outside the model. The interface already supports stopping criteria. I haven't given that aspect too much thought yet, but it seems like a viable solution to me.
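A minimal sketch of such an externally settable stopping criterion, assuming the worker runs generation through Hugging Face transformers; the threading.Event wiring is illustrative only, not the actual Open-Assistant code:

```python
import threading

import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class CancellationCriteria(StoppingCriteria):
    """Stops generation as soon as an external cancellation flag is set."""

    def __init__(self, cancel_event: threading.Event):
        super().__init__()
        self.cancel_event = cancel_event

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Returning True tells model.generate() to stop producing further tokens.
        return self.cancel_event.is_set()


# Hypothetical wiring: the request handler sets cancel_event when the client
# disconnects, and generation stops at the next decoding step.
cancel_event = threading.Event()
stopping_criteria = StoppingCriteriaList([CancellationCriteria(cancel_event)])
# model.generate(input_ids, stopping_criteria=stopping_criteria, ...)
```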

I currently see more issues in turning the communication into a bidirectional channel. Currently, the communication happens over server-sent events, which are unidirectional. I'm trying to find out whether it could be an option to close the server-sent event stream as an indicator to stop generating.

0xfacade avatar May 23 '23 22:05 0xfacade

As a first step, I would implement only the backend part of this feature. Goal: stop generating tokens when the event stream is closed. We can use the exception that is raised in this case to indicate that generation should be stopped - the communication is uni-directional otherwise, so we can't really send any complex messages from the client to the server.

Necessary steps:

  • in the oasst_inference_server that communicates directly with the UI, catch the asyncio.CancelledError that indicates the event stream was closed by the client (see the example in the sse-starlette documentation, and the sketch after this list)
  • propagate the cancellation by closing the stream to the worker defined in basic_hf_server.py
  • also catch the CancelledError in basic_hf_server.py; when this happens, set a flag that indicates that the inference should be stopped
  • add a stopping criterion to the model that checks if the flag is set
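A rough sketch of the first step, assuming a FastAPI endpoint that streams tokens via sse-starlette; get_next_token, notify_worker_to_stop, and the route are hypothetical stand-ins for the real worker communication, not the actual server code:

```python
import asyncio

from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse

app = FastAPI()


async def get_next_token() -> str | None:
    """Hypothetical stand-in for reading the next token from the worker stream."""
    await asyncio.sleep(0.1)
    return None  # placeholder: the real implementation yields tokens until done


async def notify_worker_to_stop() -> None:
    """Hypothetical stand-in for propagating the cancellation to the worker."""


@app.get("/stream")  # hypothetical route
async def stream_tokens():
    async def event_generator():
        try:
            while True:
                token = await get_next_token()
                if token is None:
                    break
                yield {"data": token}
        except asyncio.CancelledError:
            # Raised when the client closes the event stream; use it to propagate
            # the cancellation (e.g. close the worker stream or set the flag that
            # the stopping criterion checks), then re-raise.
            await notify_worker_to_stop()
            raise

    return EventSourceResponse(event_generator())
```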

I've added comments to the necessary places in this MR in my fork: https://github.com/0xfacade/Open-Assistant/pull/1

Potential issue: if there's a proxy in between the UI and the server(s), that proxy would have to close the connection immediately when the client closes it. I don't think that is going to be an issue, though.

Let me know what you think. @andreaskoepf

0xfacade avatar May 25 '23 21:05 0xfacade

@0xfacade really nice analysis, thank you very much, I think it's a totally viable plan to move forward!

yk avatar May 25 '23 21:05 yk

Great, thanks. Then I'll implement this over the coming evenings. Shouldn't take long!

0xfacade avatar May 29 '23 18:05 0xfacade

What is the state of this issue? A PR was merged and then reverted?

axel7083 avatar Sep 24 '23 12:09 axel7083

Status update:

  • spent a lot of time analyzing how to implement this
  • merged a cleanup in preparation for the implementation, which was reverted due to an unidentified bug
  • have been working on a new implementation since, but it is going very slowly and I'm losing motivation a bit...

0xfacade avatar Oct 15 '23 12:10 0xfacade

If someone else would like to work on this, I'd be happy to share my insights and plan on how to do this.

0xfacade avatar Oct 15 '23 12:10 0xfacade

GO

DanyRay420 avatar May 25 '24 13:05 DanyRay420

Get it

DanyRay420 avatar May 25 '24 13:05 DanyRay420