franz101
Sure thing: https://colab.research.google.com/drive/1wFO7-A3iRMwjcLDi_W3cSniBu1lTid6T?usp=sharing You will see that only the normal model works here.
Thanks for reproducing it. Basically, the quantized version outputs the content, while the smarter model produces the actual function call :D
```
ChatCompletionMessage(content='', refusal=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_exfl6bms', function=Function(arguments='{"text":"world"}', name='type'), type='function')])
```
Yes, a hasty mistake: `print(response.choices[0].message.content.strip())`
System prompt: `You are an assistant that always performs a function call`
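For reference, a minimal sketch of how such a request payload could be assembled with that system prompt. The model name, user message, and tool schema are illustrative assumptions, not taken from the notebook:

```python
# Sketch of a chat-completions request that nudges the model toward a
# function call. Model name, user message, and tool schema are assumptions.
payload = {
    "model": "gpt-4o-mini",  # hypothetical model choice
    "messages": [
        {
            "role": "system",
            "content": "You are an assistant that always performs a function call",
        },
        {"role": "user", "content": "type the word world"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "type",
                "description": "Type the given text",
                "parameters": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                    "required": ["text"],
                },
            },
        }
    ],
}
```

A well-behaved model should then respond with a `tool_calls` entry naming `type`, as in the output above, rather than plain content.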
My personal workaround for similar data is to use the Wikipedia API: [see the Python Wikipedia API for page views](https://substack.com/@franz101/p-148522892)
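A minimal sketch of that workaround, building a Wikimedia REST pageviews URL; the article name and date range are placeholders, and the request itself is left commented out:

```python
# Build the Wikimedia REST API URL for per-article pageviews.
# Article and date range are illustrative placeholders.
article = "Python_(programming_language)"
start, end = "20240101", "20240131"
url = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/"
    f"per-article/en.wikipedia/all-access/all-agents/{article}/daily/{start}/{end}"
)
# Fetch and flatten daily view counts (requires the `requests` package):
# resp = requests.get(url, headers={"User-Agent": "example-script"}).json()
# views = [item["views"] for item in resp["items"]]
```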
Amazing PR! Is there a benchmark to compare the two versions?
How about streaming?
@xzkostyan mind approving this?
How do I set `allow_experimental_dynamic_type = 1` in the test?