API footgun: `infer_next_token` still works after end of text

philpax opened this issue • 2 comments

In llamacord, I have some logic that calls `infer_next_token` in a loop. Unfortunately, I didn't check for EOT (end of text), so the code would keep generating tokens and producing (fascinatingly well-structured) garbage. I think we should probably check whether the last token is EOT and return an error? If you feed it a prompt, EOT would no longer be the last token, so you'd still be able to infer without issues. (I wonder if A[EOT]B infers differently to AB...)
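
For concreteness, here is a minimal self-contained sketch of the footgun; all names and token ids are invented for illustration, and this is not the actual llamacord or llm code:

```rust
// Mock of the inference loop, just to show the footgun.
const EOT_TOKEN: u32 = 2;

struct Session {
    step: u32,
}

impl Session {
    // Keeps handing out tokens even after it has produced EOT.
    fn infer_next_token(&mut self) -> u32 {
        self.step += 1;
        if self.step == 3 { EOT_TOKEN } else { self.step }
    }
}

fn main() {
    let mut session = Session { step: 0 };
    // The footgun: nothing here checks for EOT, so the loop keeps
    // "generating" well past the end of text.
    for _ in 0..6 {
        println!("token {}", session.infer_next_token());
    }
}
```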

philpax avatar Mar 19 '23 01:03 philpax

I'd say being able to infer beyond EOT is a feature some might want, even if it's just to run an experiment to see what would happen. But I'm OK with making the API harder to misuse, as long as it's still possible to request inference for a new token after EOT.

Returning some sort of "EOT" error from `infer_next_token` sounds like it might be what we want?
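
A rough sketch of what that could look like (hypothetical names, not the real API):

```rust
#[derive(Debug)]
enum InferenceError {
    EndOfText, // note: the sampled EOT token itself is discarded
}

const EOT_TOKEN: u32 = 2;

// `sampled` stands in for whatever the real sampling step produced;
// EOT is surfaced as an error instead of being returned as a token.
fn check_token(sampled: u32) -> Result<u32, InferenceError> {
    if sampled == EOT_TOKEN {
        Err(InferenceError::EndOfText)
    } else {
        Ok(sampled)
    }
}

fn main() {
    assert_eq!(check_token(7).unwrap(), 7);
    assert!(matches!(check_token(EOT_TOKEN), Err(InferenceError::EndOfText)));
}
```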

> I wonder if A[EOT]B infers differently to AB...

Yes, for sure. EOT is an important token and the transformer will interpret both strings completely differently :+1:

setzer22 avatar Mar 20 '23 20:03 setzer22

Yeah, that's what I was thinking. It wouldn't allow your use case, though: if you returned an EOT error, you wouldn't be able to get the inferred token back.

philpax avatar Mar 20 '23 23:03 philpax

Discussed this on the Discord: it makes much more sense to return the EOT token itself as an error, and then let users continue inferring if they want.
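
A minimal sketch of that shape (hypothetical names; the real llm API may differ): the error carries the sampled EOT token, so a caller using `?` stops at end of text, while a caller who wants to infer past EOT can match on the variant and keep going.

```rust
const EOT_TOKEN: u32 = 2;

#[derive(Debug)]
enum InferenceError {
    EndOfText(u32), // the sampled EOT token travels with the error
}

struct Session {
    step: u32,
}

impl Session {
    fn infer_next_token(&mut self) -> Result<u32, InferenceError> {
        self.step += 1;
        let token = if self.step == 3 { EOT_TOKEN } else { self.step };
        if token == EOT_TOKEN {
            // Signal end of text without swallowing the token.
            Err(InferenceError::EndOfText(token))
        } else {
            Ok(token)
        }
    }
}

fn main() {
    let mut session = Session { step: 0 };
    loop {
        match session.infer_next_token() {
            Ok(token) => println!("token {token}"),
            Err(InferenceError::EndOfText(_eot)) => {
                // Default: stop here. An experimenter could instead keep
                // calling infer_next_token to sample past end of text.
                break;
            }
        }
    }
}
```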

philpax avatar Mar 26 '23 18:03 philpax