API footgun: `infer_next_token` still works after end of text
In llamacord, I have some logic that calls `infer_next_token` in a loop. Unfortunately, I didn't check for EOT, so the code would keep generating tokens and producing (fascinatingly well-structured) garbage. I think we should probably check whether the last token is EOT and return an error? If you then feed the session a prompt, the EOT would no longer be the last token, and you should be able to infer without issues. (I wonder if A[EOT]B infers differently to AB...)
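To make the failure concrete, here's a minimal sketch of that loop. The `Session` type, the `EOT_TOKEN` id, and the sampling step are all stand-ins; only the `infer_next_token` name comes from this issue:

```rust
// Hypothetical end-of-text token id (real value depends on the vocabulary).
const EOT_TOKEN: u32 = 2;

// Stand-in for the real inference session.
struct Session;

impl Session {
    /// Stand-in for the real `infer_next_token`: it happily keeps
    /// producing tokens even after EOT has been emitted.
    fn infer_next_token(&mut self) -> u32 {
        42 // imagine an actual sampling step here
    }
}

fn main() {
    let mut session = Session;

    for _ in 0..16 {
        let token = session.infer_next_token();

        // The check llamacord forgot: without this, the loop runs
        // straight past end of text and generates garbage.
        if token == EOT_TOKEN {
            break;
        }
        println!("{token}");
    }
}
```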
I'd say being able to infer beyond EOT is a feature some might want, even if it's just to run an experiment to see what would happen. But I'm OK with making the API harder to misuse, as long as it's still possible to request inference for a new token after EOT.
Returning some sort of "EOT" error from `infer_next_token` sounds like it might be what we want?
> I wonder if A[EOT]B infers differently to AB...
Yes, for sure. EOT is an important token and the transformer will interpret both strings completely differently :+1:
Yeah, that's what I was thinking. It wouldn't allow your use case, though: if you returned a bare EOT error, you wouldn't be able to get the inferred token back.
Discussed this on the Discord - it makes much more sense to return the EOT token itself as an error, and then let users continue inferring if they want.
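As a hedged sketch of what that could look like; `InferenceError`, `TokenId`, and the EOT id here are assumptions for illustration, not the crate's actual API:

```rust
type TokenId = u32;

#[derive(Debug)]
enum InferenceError {
    /// End of text reached; carries the EOT token itself so callers
    /// still get the inferred token back (and can continue if they want).
    EndOfText(TokenId),
}

// Stand-in for the real inference session.
struct Session;

impl Session {
    fn infer_next_token(&mut self) -> Result<TokenId, InferenceError> {
        let token = self.sample();
        if token == 2 {
            // Hypothetical EOT id: surface it as an error instead of
            // silently letting generation run on past end of text.
            Err(InferenceError::EndOfText(token))
        } else {
            Ok(token)
        }
    }

    fn sample(&mut self) -> TokenId {
        2 // placeholder for the real sampling step
    }
}

fn main() {
    let mut session = Session;
    loop {
        match session.infer_next_token() {
            Ok(token) => println!("{token}"),
            Err(InferenceError::EndOfText(eot)) => {
                // Default callers stop here; experimenters can ignore the
                // error and call infer_next_token again to go past EOT.
                println!("hit EOT ({eot}), stopping");
                break;
            }
        }
    }
}
```

Since the error carries the token, nothing is lost: a caller who wants to infer past EOT can record it and simply call `infer_next_token` again.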