Inconsistent dictionary structure in autogen.ChatCompletion.logged_history
#179 added much-needed support for tracking token count and cost (thanks @kevin666aa). However, the dictionary it returns has an unexpected and inconsistent structure.
Currently, autogen.ChatCompletion.start_logging(compact=True) starts a logging session, which is ended with autogen.ChatCompletion.stop_logging(). The logs can then be accessed via autogen.ChatCompletion.logged_history.
Unexpected Structure in autogen.ChatCompletion.logged_history when compact=True
When compact is set to True, logged_history is a dictionary. However, the key is the entire serialized chat history:
{
    """[
        {
            'role': 'system',
            'content': system_message,
        },
        ...
    ]""": {
        "created_at": [0, 1],
        "cost": [0.1, 0.2],
    }
}
This makes it very challenging to reuse this data structure in apps. It would be valuable to return output with explicit, structured keys instead, e.g.:
{
    "messages": [...],
    "created_at": [...],
    "cost": ...
}
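As a rough sketch of such a normalization, the existing compact logs could be converted into this proposed shape (normalize_compact_logs is a hypothetical helper; it assumes the current compact=True layout where each key is the JSON-serialized message list):

```python
import json


def normalize_compact_logs(logged_history):
    """Convert the compact logged_history (chat-history-as-key) into a list
    of dicts with explicit keys. Sketch only; assumes each key is the
    JSON-serialized list of chat messages."""
    entries = []
    for messages_key, stats in logged_history.items():
        entries.append({
            "messages": json.loads(messages_key),
            "created_at": stats.get("created_at", []),
            "cost": stats.get("cost", []),
        })
    return entries
```

This keeps the logged data intact while making each conversation addressable by named fields rather than by a giant string key.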
Furthermore, the structure of logged_history is significantly different when compact=False:
{
    0: {
        'request': {
            'messages': [
                {'content': 'Y.. ', 'role': 'system'},
            ],
            'model': 'gpt-4',
            'temperature': 0,
            'api_key': '..',
        },
        'response': {
            'id': 'chatcmpl-...Y',
            'object': 'chat.completion',
            'created': 1698203546,
            'model': 'gpt-4-0613',
            'choices': [
                {
                    'index': 0,
                    'message': {'role': 'assistant', 'content': 'Yes,..'},
                    'finish_reason': 'stop',
                }
            ],
            'usage': {
                'prompt_tokens': 689,
                'completion_tokens': 65,
                'total_tokens': 754,
            },
            'cost': 0.024569999999999998,
        },
    }
}
Potential action items
- Improve the key structure of logged_history when compact=True
- Unify the data structure across compact=True and compact=False.
Happy to get more thoughts here @gagb @afourney @pcdeadeasy
Related .
Documentation here may need an update.
These are great observations. I've mainly been using compact=False in the Testbed, since the intention is to log as much as possible.
Even in verbose logging, it's weird to have a dictionary instead of a list, unless we're expecting it to be sparse at some point? Basically it means we need to be a little careful when iterating through the items... there's no guarantee they will be sequential in this structure.
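To illustrate the iteration concern: with the verbose dict-of-integer-keys shape, safe code has to sort the keys rather than assume they run densely from 0 (a sketch with made-up log entries, assuming the compact=False layout shown above):

```python
# Hypothetical verbose logs where an intermediate key is missing,
# demonstrating why keys cannot be assumed dense or sequential.
verbose_logs = {
    0: {"request": {"model": "gpt-4"}, "response": {"cost": 0.02}},
    2: {"request": {"model": "gpt-4"}, "response": {"cost": 0.03}},
}

# Iterate in sorted key order instead of range(len(verbose_logs)).
total_cost = sum(verbose_logs[i]["response"]["cost"] for i in sorted(verbose_logs))
```

A plain list would make this footgun disappear entirely, since list order is the iteration order.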
@victordibia, all your observations are true. It will be good to define/generate a JSON Schema for logging messages and use it consistently for messages. Having the whole message as a key is indeed an unusual choice. So, I agree with your proposal to make this consistent irrespective of the flag.
Yes, the dictionary returned with compact=True or False is quite different, and I did spend some time making the two consistent for the print_usage_summary function. It would be great to have a consistent and easy-to-use structure.
Thanks @kevin666aa. It looks like the new changes driven by updates to the openai library will be relevant here (#203, #7). I will revisit this when #203 is complete.
I managed to find a workaround that seems to work for me, e.g.:
import json

conversations = dict()
autogen.ChatCompletion.start_logging(conversations)
# ... do AutoGen stuff ...

# The key of the conversations dict is the entire chat history,
# serialized as a JSON string, so extract and parse it:
content_list_json = list(conversations)[0]
content_list = json.loads(content_list_json)

# Get the content of all messages:
conversations_content = "\n".join(
    message.get("content") or "" for message in content_list
)
Quite crazy having to do this kind of stuff... I hope it will be fixed soon!
This seems super stale and a workaround is posted. Closing as won't-fix.