llm icon indicating copy to clipboard operation
llm copied to clipboard

Mechanism for injecting "fake" responses from models into the conversation

Open simonw opened this issue 10 months ago • 1 comments

https://twitter.com/simonw/status/1694089359514104094

A useful trick is sometimes to feed a model a prior conversation that includes things that the model didn't actually say - things like "Sure, I'd be happy to help you with that" for minor jailbreaks.

Not sure what the CLI options for this should look like, or how they should be recorded in the SQLite database logs.

simonw avatar Aug 22 '23 20:08 simonw