
epic: Jan Context Length issues

Open hahuyhoang411 opened this issue 11 months ago • 5 comments

Goal

  • Jan needs an elegant way to deal with model context length issues

Possible Scope

  • e.g. Logic for handling threads that exceed the context length
  • e.g. Users can adjust the context length within the model's bounds
  • e.g. Support longer context lengths when both the model and the hardware allow it
  • e.g. Jan has an adaptive context length, given GGUF or model.yaml metadata and hardware detection
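One way to read the "adaptive context length" bullet, as a hypothetical sketch (the function name and parameters below are illustrative, not Jan's actual API): clamp the requested context length to both the model's advertised maximum and a hardware-derived budget.

```python
def choose_context_length(requested: int, model_max: int, hardware_max: int) -> int:
    """Clamp the user's requested context length to what both the model
    (e.g. a context-length field from GGUF metadata or model.yaml) and
    the detected hardware (e.g. memory available for the KV cache) can
    support. Never return less than 1."""
    return max(1, min(requested, model_max, hardware_max))

# Example: user asks for 32768, model supports 8192, hardware only 4096.
print(choose_context_length(32768, 8192, 4096))  # -> 4096
```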

Linked Issues

  • [ ] https://github.com/janhq/jan/issues/2193

Cortex Issue

  • [x] https://github.com/janhq/cortex.cpp/issues/1151

Original Post

Problem: In some cases, users can exceed the model's limit of 4096 tokens (~4000 words), but we haven't implemented any solution to handle this.

Success Criteria

  1. Show an alert notifying users when they have exceeded the context length
  2. Delete the very first user message (not the system prompt) when the context length is exceeded
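A minimal sketch of criterion 2, assuming messages are dicts with a role and content, and that `count_tokens` is a stand-in for a real tokenizer (here, `len` is used just for the demo):

```python
def trim_thread(messages, max_tokens, count_tokens):
    """Drop the earliest non-system messages until the thread fits
    within max_tokens. The system prompt is always preserved."""
    msgs = list(messages)
    while sum(count_tokens(m["content"]) for m in msgs) > max_tokens:
        # Find the first message that is not the system prompt.
        idx = next((i for i, m in enumerate(msgs) if m["role"] != "system"), None)
        if idx is None:
            break  # only the system prompt is left; nothing else to drop
        del msgs[idx]
    return msgs

thread = [
    {"role": "system", "content": "be helpful"},
    {"role": "user", "content": "a" * 10},
    {"role": "assistant", "content": "b" * 10},
    {"role": "user", "content": "c" * 5},
]
trimmed = trim_thread(thread, max_tokens=20, count_tokens=len)
print([m["role"] for m in trimmed])  # -> ['system', 'user']
```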

Additional context Bug:



hahuyhoang411 avatar Mar 12 '24 02:03 hahuyhoang411

Will this be improved? A 4000-token limit is too short for conversations.

lv333ming avatar Mar 15 '24 03:03 lv333ming

As discussed with @hahuyhoang411:

  • Error when thread exceeds the context length
  • Recommend users to delete message by themselves or create a new thread

Design:

https://www.figma.com/file/ytn1nRZ17FUmJHTlhmZB9f/Jan-App-(version-1)?type=design&node-id=6847-111809&mode=design&t=ErX19MBkMjVhBSjO-4

[Screenshot: 2024-03-27 at 3:59 PM]

(This is the MVP for now, in the future we will have a standardized error format that will direct users to Discourse forum & users can see the answer there, see specs: https://www.notion.so/jan-ai/Standardized-Error-Format-for-Jan-abea56d32d6648bb8c6835f9176f800c?pvs=4)

imtuyethan avatar Mar 27 '24 09:03 imtuyethan

How about a 'sliding window' that only uses the last X messages that fit within the context length? The number of evaluated (prompt) and generated tokens is reported after every call, so the data is there. If the last inference's evaluated + generated token count comes close to the max context, you need to start excluding the first turn.
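The sliding-window idea above could be sketched roughly like this, assuming the backend reports prompt and generated token counts after each call (the names and the safety margin below are illustrative):

```python
def sliding_window(messages, last_prompt_tokens, last_generated_tokens,
                   max_context, margin=256):
    """If the previous turn's evaluated + generated tokens approach the
    context limit, drop the oldest non-system turn before the next call."""
    used = last_prompt_tokens + last_generated_tokens
    if used + margin < max_context:
        return messages  # still comfortably within the window
    msgs = list(messages)
    idx = next((i for i, m in enumerate(msgs) if m["role"] != "system"), None)
    if idx is not None:
        del msgs[idx]  # exclude the first (oldest) turn
    return msgs

thread = [{"role": "system"}, {"role": "user"}, {"role": "assistant"}]
# Last call used 3500 prompt + 400 generated tokens against a 4096 window,
# so the oldest non-system turn gets dropped.
print(len(sliding_window(thread, 3500, 400, 4096)))  # -> 2
```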

Propheticus avatar Apr 18 '24 09:04 Propheticus

I do not know if there are best practices regarding this, but I'd suggest not excluding the very first message, since I believe most users set the stage with it. When excluding messages, I could imagine some sort of placeholder being put between the first message and the next query, like: 'There have been messages in between these ones that have been removed due to a moving context-length window. Pretend this bit makes sense but disregard it as context going forward.'
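The suggestion above (keep the first message, splice a placeholder where turns were removed) could look roughly like this sketch; the placeholder wording and the message structure are illustrative assumptions:

```python
PLACEHOLDER = {
    "role": "system",
    "content": ("[Some earlier messages were removed to fit the context "
                "window; continue as if the conversation flowed naturally.]"),
}

def trim_keep_first(messages, n_drop):
    """Keep the system prompt and the first user message (which often
    'sets the stage'), drop the next n_drop turns, and splice in a
    single placeholder marking the gap."""
    head = messages[:2]           # system prompt + first user message
    tail = messages[2 + n_drop:]  # everything after the dropped turns
    return head + [PLACEHOLDER] + tail

thread = [
    {"role": "system", "content": "sys"},
    {"role": "user", "content": "first"},
    {"role": "user", "content": "old 1"},
    {"role": "assistant", "content": "old 2"},
    {"role": "user", "content": "latest"},
]
out = trim_keep_first(thread, n_drop=2)
print([m["role"] for m in out])  # -> ['system', 'user', 'system', 'user']
```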

IngEyn avatar Apr 24 '24 19:04 IngEyn

Inspiration from the competition: [image]

Propheticus avatar Apr 24 '24 20:04 Propheticus