llm
[FR] Add `--max-input-tokens`, `--input-limit-mode`
To avoid unintentionally sending oversized requests to the new 128k-context models and incurring unnecessary costs, it's crucial to put some safeguards in place.
The `--max-input-tokens` option should establish a maximum limit on the number of input tokens accepted, and `--input-limit-mode` should specify what action is taken when that limit is exceeded. Possible values for `--input-limit-mode` include:

- `exit`: halt the process;
- `truncate`: reduce the input to the permissible size;
- `summarize`: shorten the input, potentially using a less expensive model;
- `ask-user`: prompt the user to decide the next step.
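A minimal sketch of how the mode dispatch could work, assuming the proposed flags (none of these names exist in `llm` today; `summarize_with_cheap_model` is a hypothetical helper):

```python
from enum import Enum


class InputLimitMode(Enum):
    EXIT = "exit"
    TRUNCATE = "truncate"
    SUMMARIZE = "summarize"
    ASK_USER = "ask-user"


def summarize_with_cheap_model(tokens, budget):
    # Placeholder: a real implementation would call a cheaper model
    # to condense the input under the token budget.
    return tokens[:budget]


def apply_input_limit(tokens, max_input_tokens, mode):
    """Enforce a token budget before sending a request.

    `tokens` is the tokenized prompt; returns the (possibly reduced)
    token list, or exits / asks the user depending on `mode`.
    """
    if len(tokens) <= max_input_tokens:
        return tokens  # under budget, send as-is
    if mode is InputLimitMode.EXIT:
        raise SystemExit(
            f"Input has {len(tokens)} tokens; limit is {max_input_tokens}"
        )
    if mode is InputLimitMode.TRUNCATE:
        return tokens[:max_input_tokens]
    if mode is InputLimitMode.SUMMARIZE:
        return summarize_with_cheap_model(tokens, max_input_tokens)
    if mode is InputLimitMode.ASK_USER:
        choice = input("Input exceeds limit; [t]runcate or [a]bort? ")
        if choice.lower().startswith("t"):
            return tokens[:max_input_tokens]
        raise SystemExit("Aborted by user")
```

Routing everything through one function like this keeps the limit check in a single place, regardless of which mode the user picked on the command line.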