llm Mechanism for injecting "fake" responses from models into the conversation

Mechanism for injecting "fake" responses from models into the conversation

Open simonw opened this issue 10 months ago • 1 comments

https://twitter.com/simonw/status/1694089359514104094

A useful trick is sometimes to feed a model a prior conversation that includes things that the model didn't actually say - things like "Sure, I'd be happy to help you with that" for minor jailbreaks.

Not sure what the CLI options for this should look like, or how they should be recorded in the SQLite database logs.

Aug 22 '23 20:08 simonw

llm llm copied to clipboard

Mechanism for injecting "fake" responses from models into the conversation

llm
llm copied to clipboard