
Chat completion support

Open NTaylorMullen opened this issue 2 years ago • 12 comments

I've been digging through the IKernel and function abstractions hoping to find a way to enable the gpt-3.5-turbo APIs (chat completion) and, more recently, the GPT-4 APIs, but given that ITextCompletion only takes a string as input, I haven't found a way to reasonably change the bits to enable the new behavior.

NTaylorMullen avatar Mar 14 '23 21:03 NTaylorMullen

Thanks for the note @NTaylorMullen! Since ChatGPT introduces a new API, we have to implement a ChatCompletion API in the Kernel. We have this on our backlog and have bumped up the priority!

@shawncal @dluc ^

alexchaomander avatar Mar 15 '23 00:03 alexchaomander

One approach to this that might work well would be to support defining prompts using OpenAI's new ChatML syntax and then have SK parse this before calling the Chat Completion APIs... The Chat Completion APIs currently just convert the JSON you pass them back into a ChatML-based prompt, so this would essentially let you send almost any ChatML-based prompt through the Chat Completion APIs. They've said that a way to send raw ChatML is coming, but it's not here yet...

To go along with this, you would need a {{$history}} variable that formats conversation history using ChatML. So maybe {{$historyML}}, or a {{historyML}} function that converts the pairs into ChatML format.

This is actually the ONLY technique I've thought of that would allow multi-shot prompts to work correctly with the new Chat Completion APIs. The issue with multi-shot prompts and Chat Completion is that each shot needs to be passed in as a user/assistant message pair to work, so you either need a way outside of the prompt to construct those pairs (it doesn't seem like SK is set up to do that) or you need to create a single prompt with all those pairs and use ChatML to separate them.
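The ChatML-parsing idea above can be sketched as follows. This is a minimal Python sketch, not SK code; the token names follow OpenAI's published ChatML draft, and `chatml_to_messages` is a hypothetical helper name:

```python
import re

# Hypothetical helper: parse a ChatML-formatted prompt into the message
# list the Chat Completion API expects. DOTALL lets a message body span
# multiple lines.
CHATML = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def chatml_to_messages(prompt: str) -> list[dict]:
    return [{"role": role, "content": body.strip()}
            for role, body in CHATML.findall(prompt)]
```

A single templated prompt containing a system block plus user/assistant shot pairs would then parse back into the pairwise message list the API requires.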

Stevenic avatar Mar 15 '23 00:03 Stevenic

Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them, and I've gotten far better results by just including an extra "user" message containing the core system prompt.
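The workaround described here amounts to moving the system prompt into the message list as a leading "user" message. A minimal sketch, with an illustrative function name rather than SK's actual API:

```python
# Build a Chat Completion message list without a "system" role: the core
# instructions go in as an ordinary leading "user" message instead.
def build_messages(core_prompt: str, history: list[dict],
                   user_input: str) -> list[dict]:
    messages = [{"role": "user", "content": core_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages
```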

Stevenic avatar Mar 15 '23 00:03 Stevenic

These are great tips! Thanks for sending them @Stevenic!

alexchaomander avatar Mar 15 '23 21:03 alexchaomander

Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.
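A minimal sketch of that round-trip, assuming the context variables can only hold strings (the helper names here are illustrative, not the actual connector code):

```python
import json

# Persist the chat history inside a string-based context variable by
# serializing it to JSON on each turn and parsing it back on the next.
def save_history(messages: list[dict]) -> str:
    return json.dumps(messages)

def load_history(serialized: str) -> list[dict]:
    return json.loads(serialized) if serialized else []
```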

dluc avatar Mar 17 '23 05:03 dluc

Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.

Mind elaborating on how to use a connector here? Or were you referring to internal to SK?

NTaylorMullen avatar Mar 17 '23 17:03 NTaylorMullen

@NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

evchaki avatar Mar 21 '23 21:03 evchaki

@NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

Sadly not, we're only using the C# APIs 😢

NTaylorMullen avatar Mar 21 '23 21:03 NTaylorMullen

As an FYI, in my JS implementation (SK-like but not exactly SK) I'm doing basically what @dluc suggests... I'm using a $history variable to hold the message pairs and then I parse this $history variable to reconstruct the user/assistant message pairs in my connector. Just keep in mind that your $history could contain new lines (\n), so you'll need to account for that if parsing text. My $history object is a string array of pairs so I don't have to deal with that, but I believe in C# everything is strings.
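To illustrate the newline caveat, here is one way to parse a role-prefixed $history string (a hypothetical format and parser, shown in Python rather than C# or JS): splitting on role markers at the start of a line, instead of splitting on lines, keeps multi-line message bodies intact.

```python
import re

# Split on "user: " / "assistant: " markers anchored to line starts, so a
# message body that itself contains \n is not broken apart.
ROLE = re.compile(r"^(user|assistant): ", re.MULTILINE)

def parse_history(history: str) -> list[dict]:
    parts = ROLE.split(history)
    # re.split with a capturing group yields [prefix, role, body, role, body, ...]
    return [{"role": role, "content": body.strip()}
            for role, body in zip(parts[1::2], parts[2::2])]
```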

Stevenic avatar Mar 22 '23 00:03 Stevenic

Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them, and I've gotten far better results by just including an extra "user" message containing the core system prompt.

The "system" message is useful to prevent prompt injection. It also enables prompting in the context of the system.

Moult-ux avatar Mar 24 '23 13:03 Moult-ux

I've got gpt-4 running via SK in C# (I'm building a Teams bot). However, it has no message memory or token handling yet, and I've also still got to add tests.

However, before I go too far with this, I thought I should check in here to get some feedback on the implementation.

Please see this PR for more detail.

SOE-YoungS avatar Mar 24 '23 22:03 SOE-YoungS

quick update: work is in progress, here's the pull request adding ChatGPT and DallE: https://github.com/microsoft/semantic-kernel/pull/161

dluc avatar Mar 25 '23 22:03 dluc

This got merged in! Closing this issue.

alexchaomander avatar Mar 28 '23 22:03 alexchaomander