pipelines
pipelines copied to clipboard
Backend error catching and truncate tokens
I am looking into truncating the input tokens such that it fits the model max tokens. Because I don't want to do unnecessary tokenisation on the OWUI side all the time, I am thinking to only doing it when the backend (Nvidia NIM, Triton in my case) has such specific error. Is there a way for pipelines to catch this error and do conditional tokenisation + truncation? Any recommendations are more than welcome!