llm
llm copied to clipboard
Mechanism for injecting "fake" responses from models into the conversation
https://twitter.com/simonw/status/1694089359514104094
A useful trick is sometimes to feed a model a prior conversation that includes things that the model didn't actually say - things like "Sure, I'd be happy to help you with that" for minor jailbreaks.
Not sure what the CLI options for this should look like, or how they should be recorded in the SQLite database logs.