
Chat completion support

Open NTaylorMullen opened this issue 2 years ago • 12 comments

I've been digging through the IKernel and function abstractions hoping to find a way to enable the gpt-3.5-turbo APIs (chat completion) and, more recently, the GPT-4 APIs, but given that ITextCompletion only takes a string as input, I haven't found a way to reasonably change the bits to enable the new behavior.

NTaylorMullen avatar Mar 14 '23 21:03 NTaylorMullen

Thanks for the note @NTaylorMullen! Since ChatGPT introduces a new API, we have to implement a ChatCompletion API in the Kernel. We have this on our backlog and have bumped up the priority!

@shawncal @dluc ^

alexchaomander avatar Mar 15 '23 00:03 alexchaomander

One approach to this that might work well would be to support defining prompts using OpenAI's new ChatML syntax and then have SK parse this before calling the Chat Completion APIs... The Chat Completion APIs currently just convert the JSON you pass them back into a ChatML-based prompt, so this would essentially let you send almost any ChatML-based prompt through the Chat Completion APIs. They've said that a way to send raw ChatML is coming, but it's not here yet...

To go along with this, you would need a {{$history}} variable that formats conversation history using ChatML. So maybe {{$historyML}}, or a {{historyML}} function that converts the pairs into ChatML format.

This is actually the ONLY technique I've thought of that would allow multi-shot prompts to work correctly with the new Chat Completion APIs. The issue with multi-shot prompts and Chat Completion is that each shot needs to be passed in as a user/assistant message pair to work, so you either need a way outside of the prompt to construct those pairs (it doesn't seem like SK is set up to do that) or you need to create a single prompt with all those pairs and use ChatML to separate them.
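The ChatML-parsing idea above can be sketched as follows. This is a minimal Python sketch, not SK code; the token names follow OpenAI's published ChatML draft, and `chatml_to_messages` is a hypothetical helper name:

```python
import re

# Hypothetical helper: parse a ChatML-formatted prompt into the message
# list the Chat Completion API expects. DOTALL lets a message body span
# multiple lines.
CHATML = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def chatml_to_messages(prompt: str) -> list[dict]:
    return [{"role": role, "content": body.strip()}
            for role, body in CHATML.findall(prompt)]
```

A single templated prompt containing a system block plus user/assistant shot pairs would then parse back into the pairwise message list the API requires.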

Stevenic avatar Mar 15 '23 00:03 Stevenic

Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them, and I've gotten far better results by just including an extra "user" message containing the core system prompt.
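The workaround described here amounts to moving the system prompt into the message list as a leading "user" message. A minimal sketch, with an illustrative function name rather than SK's actual API:

```python
# Build a Chat Completion message list without a "system" role: the core
# instructions go in as an ordinary leading "user" message instead.
def build_messages(core_prompt: str, history: list[dict],
                   user_input: str) -> list[dict]:
    messages = [{"role": "user", "content": core_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages
```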

Stevenic avatar Mar 15 '23 00:03 Stevenic

These are great tips! Thanks for sending them @Stevenic!

alexchaomander avatar Mar 15 '23 21:03 alexchaomander

Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.
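A minimal sketch of that round-trip, assuming the context variables can only hold strings (the helper names here are illustrative, not the actual connector code):

```python
import json

# Persist the chat history inside a string-based context variable by
# serializing it to JSON on each turn and parsing it back on the next.
def save_history(messages: list[dict]) -> str:
    return json.dumps(messages)

def load_history(serialized: str) -> list[dict]:
    return json.loads(serialized) if serialized else []
```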

dluc avatar Mar 17 '23 05:03 dluc

Using GPT turbo is reasonably simple using a connector. I think most of the friction is about persisting the chat history object inside the context, with a continuous serialization/deserialization, which is not ideal but should do the trick.

Mind elaborating on how to use a connector here? Or were you referring to internal to SK?

NTaylorMullen avatar Mar 17 '23 17:03 NTaylorMullen

@NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

evchaki avatar Mar 21 '23 21:03 evchaki

@NTaylorMullen , here is a PR in right now for Python with the Chat APIs. Would this work to unblock you for now?

Sadly not, we're only using the C# APIs 😢

NTaylorMullen avatar Mar 21 '23 21:03 NTaylorMullen

As an FYI, in my JS implementation (SK-like but not exactly SK) I'm doing basically what @dluc suggests... I'm using a $history variable to hold the message pairs and then I parse this $history variable to reconstruct the user/assistant message pairs in my connector. Just keep in mind that your $history could contain new lines (\n), so you'll need to account for that if parsing text. My $history object is a string array of pairs so I don't have to deal with that, but I believe in C# everything is strings.
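To illustrate the newline caveat, here is one way to parse a role-prefixed $history string (a hypothetical format and parser, shown in Python rather than C# or JS): splitting on role markers at the start of a line, instead of splitting on lines, keeps multi-line message bodies intact.

```python
import re

# Split on "user: " / "assistant: " markers anchored to line starts, so a
# message body that itself contains \n is not broken apart.
ROLE = re.compile(r"^(user|assistant): ", re.MULTILINE)

def parse_history(history: str) -> list[dict]:
    parts = ROLE.split(history)
    # re.split with a capturing group yields [prefix, role, body, role, body, ...]
    return [{"role": role, "content": body.strip()}
            for role, body in zip(parts[1::2], parts[2::2])]
```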

Stevenic avatar Mar 22 '23 00:03 Stevenic

Another tip I'll give you, for gpt-3.5-turbo at least, is that I would avoid sending "system" messages altogether. The model will very quickly abandon them, and I've gotten far better results by just including an extra "user" message containing the core system prompt.

The "system" message is useful to prevent prompt injection. It also enables prompting in the context of the system.

Moult-ux avatar Mar 24 '23 13:03 Moult-ux

I've got gpt-4 running via SK in C# (I'm building a Teams bot). However, it has no message memory or token handling yet, and I've also still got to add tests.

However, before I go too far with this, I thought I should check in here to get some feedback on the implementation.

Please see this PR for more detail.

SOE-YoungS avatar Mar 24 '23 22:03 SOE-YoungS

quick update: work is in progress, here's the pull request adding ChatGPT and DallE: https://github.com/microsoft/semantic-kernel/pull/161

dluc avatar Mar 25 '23 22:03 dluc

This got merged in! Closing this issue.

alexchaomander avatar Mar 28 '23 22:03 alexchaomander