[BUG] Running Prompt flow locally produces errors

Open rlerch opened this issue 1 year ago • 17 comments

Describe the bug
Running a flow using pf test seems to work, but exceptions are reported locally.

How To Reproduce the bug
Run pf test on a flow locally. The flow executes successfully, but exceptions are generated when collecting token metrics for OpenAI.

Expected behavior
A clean run without exceptions.

Screenshots

WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.completion' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute 'llm.usage.completion_tokens_details' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.completion' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

Running Information (please complete the following information):

{
  "promptflow": "1.15.0",
  "promptflow-azure": "1.15.0",
  "promptflow-core": "1.15.0",
  "promptflow-devkit": "1.15.0",
  "promptflow-tracing": "1.15.0"
}

Executable: 'c:\git\azure-ai-prompt-flow.venv\Scripts\python.exe'
Python (Windows): 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]

Additional context
The exception seems to originate here:

    def collect_openai_tokens_for_parent_span(self, span):
        tokens = self.try_get_openai_tokens(span.get_span_context().span_id)
        if tokens:
            if not hasattr(span, "parent") or span.parent is None:
                return
            parent_span_id = span.parent.span_id
            with self._lock:
                if parent_span_id in self._span_id_to_tokens:
                    merged_tokens = {
                        key: self._span_id_to_tokens[parent_span_id].get(key, 0) + tokens.get(key, 0)
                        for key in set(self._span_id_to_tokens[parent_span_id]) | set(tokens)
                    }
                    self._span_id_to_tokens[parent_span_id] = merged_tokens
                else:
                    self._span_id_to_tokens[parent_span_id] = tokens

The failing line is key: self._span_id_to_tokens[parent_span_id].get(key, 0) + tokens.get(key, 0). Not all steps in my flow are LLM steps, so I wonder whether steps that don't produce any tokens could be causing the issue.
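
A minimal sketch of the failure mode, with illustrative payloads modeled on the usage shape newer openai versions return (the values here are made up):

    # Reproduces the merge in promptflow/tracing/_trace.py with a usage dict
    # that carries a None-valued field; payloads are illustrative only.
    parent_tokens = {"completion_tokens": 13, "completion_tokens_details": None}
    child_tokens = {"completion_tokens": 7, "completion_tokens_details": None}

    merged = {
        key: parent_tokens.get(key, 0) + child_tokens.get(key, 0)
        for key in set(parent_tokens) | set(child_tokens)
    }
    # TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'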

rlerch avatar Sep 13 '24 15:09 rlerch

We see something similar; in our case, however, it is an error that fails the execution:

File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/promptflow/tracing/_trace.py", line 144, in <dictcomp>
key: self._span_id_to_tokens[parent_span_id].get(key, 0) + tokens.get(key, 0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

berndku avatar Sep 13 '24 17:09 berndku

Seeing similar errors on my end, this is halting flow execution:

2024-09-13 14:04:35 -0400 46456 execution.flow WARNING Failed to calculate metrics due to exception: unsupported operand type(s) for +: 'int' and 'dict'.

jwieler avatar Sep 13 '24 18:09 jwieler

Same issue, resulting in the flow being terminated:

WARNING:opentelemetry.attributes:Invalid type dict for attribute '__computed__.cumulative_token_count.completion' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
2024-09-13 16:53:20 -0400 28812 execution.flow WARNING Failed to calculate metrics due to exception: unsupported operand type(s) for +: 'int' and 'dict'.
2024-09-13 16:53:20 -0400 28812 execution.flow ERROR Flow execution has failed. Cancelling all running nodes: extract_data.

mack-adknown avatar Sep 13 '24 21:09 mack-adknown

Some keys in the tokens dict have values that are not ints, which causes the issue.

azure_open_ai: tokens = {'completion_tokens': 13, 'prompt_tokens': 673, 'total_tokens': 686, 'completion_tokens_details': None}
open_ai: tokens = {'completion_tokens': 13, 'prompt_tokens': 669, 'total_tokens': 682, 'completion_tokens_details': {'reasoning_tokens': 0}}

As a temporary solution, tracing can be disabled by setting PF_DISABLE_TRACING=true.
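
For example, before running the flow (bash shown; on Windows PowerShell, use $env:PF_DISABLE_TRACING = "true"):

    export PF_DISABLE_TRACING=true
    pf flow test --flow <your-flow-directory>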

SULAPIS avatar Sep 14 '24 05:09 SULAPIS

Rolling back the version of openai also works; pinning openai<=1.44.1 resolves the error. It looks like promptflow-tracing is incompatible with openai 1.45.0: https://pypi.org/project/openai/#history
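
For example:

    pip install "openai<=1.44.1"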

asos-oliverfrost avatar Sep 15 '24 09:09 asos-oliverfrost

@asos-oliverfrost thank you for finding that! Rolling back worked for me. It looks like this is the specific line that breaks the behavior in prompt flow, because the new field completion_tokens_details is optional: https://github.com/openai/openai-python/compare/v1.44.1...v1.45.0#diff-d85f41ac9f419751206af46c34ef5c8c74258660be492aa703dcbebcfc96a41bR25
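
For reference, a defensive merge that tolerates the new optional field might look like the sketch below (an illustration, not the actual upstream fix):

    def merge_token_counts(a: dict, b: dict) -> dict:
        # Sum only int-valued counters; skip None and nested detail dicts
        # (e.g. completion_tokens_details) so they cannot raise TypeError.
        merged = {}
        for key in set(a) | set(b):
            x, y = a.get(key, 0), b.get(key, 0)
            if isinstance(x, int) and isinstance(y, int):
                merged[key] = x + y
        return merged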

jomalsan avatar Sep 16 '24 01:09 jomalsan

Rolling back OpenAI did not work for me; Pydantic is dynamic enough that I still get completion_tokens_details={'reasoning_tokens': 0}. In summary:

  • An OpenAI endpoint returns a dictionary nested inside another dictionary: {"completion_tokens_details":{"foo_bar":0}}.
  • It is then processed by promptflow.tracing, which does not support that.

olopezqubika avatar Sep 19 '24 15:09 olopezqubika

Also having this issue, with similar errors:

2024-09-19 18:05:38 -0700   89862 execution          ERROR    Node extract_result in line 0 failed. Exception: Execution failure in 'extract_result': (TypeError) unsupported operand type(s) for +: 'dict' and 'dict'.
Traceback (most recent call last):
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/_core/flow_execution_context.py", line 182, in _invoke_tool_inner
    return f(**kwargs)
           ^^^^^^^^^^^
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/tracing/_trace.py", line 561, in wrapped
    token_collector.collect_openai_tokens_for_parent_span(span)
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/tracing/_trace.py", line 143, in collect_openai_tokens_for_parent_span
    merged_tokens = {
                    ^
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/tracing/_trace.py", line 144, in <dictcomp>
    key: self._span_id_to_tokens[parent_span_id].get(key, 0) + tokens.get(key, 0)
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/_core/flow_execution_context.py", line 90, in invoke_tool
    result = self._invoke_tool_inner(node, f, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/[...]/pypoetry/virtualenvs/promptflow-test-gto235yr-py3.11/lib/python3.11/site-packages/promptflow/_core/flow_execution_context.py", line 206, in _invoke_tool_inner
    raise ToolExecutionError(node_name=node_name, module=module) from e
promptflow._core._errors.ToolExecutionError: Execution failure in 'extract_result': (TypeError) unsupported operand type(s) for +: 'dict' and 'dict'
2024-09-19 18:05:38 -0700   89862 execution.flow     WARNING  Failed to calculate metrics due to exception: unsupported operand type(s) for +: 'int' and 'dict'.
2024-09-19 18:05:38 -0700   89862 execution.flow     ERROR    Flow execution has failed. Cancelling all running nodes: extract_result.
pf.flow.test failed with UserErrorException: TypeError: Execution failure in 'extract_result': (TypeError) unsupported operand type(s) for +: 'dict' and 'dict'

This happens when running a version of the chat-math-variant example, edited so that extract_text.py calls the OpenAI ChatCompletions endpoint. One of my coworkers is also seeing this error, seemingly from an LLM tool call.

cfoster0 avatar Sep 20 '24 01:09 cfoster0

Rolling back the version of openai also works; pinning openai<=1.44.1 resolves the error. It looks like promptflow-tracing is incompatible with openai 1.45.0: https://pypi.org/project/openai/#history

This worked for me

mallapraveen avatar Sep 25 '24 05:09 mallapraveen

Oh, @mallapraveen @jomalsan, you saved me. Thanks a million.

HakjunMIN avatar Sep 26 '24 01:09 HakjunMIN

Getting this error:

Traceback (most recent call last):
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/_core/flow_execution_context.py", line 182, in _invoke_tool_inner
    return f(**kwargs)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/tracing/_trace.py", line 561, in wrapped
    token_collector.collect_openai_tokens_for_parent_span(span)
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/tracing/_trace.py", line 143, in collect_openai_tokens_for_parent_span
    merged_tokens = {
  File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/tracing/_trace.py", line 144, in <dictcomp>
    key: self._span_id_to_tokens[parent_span_id].get(key, 0) + tokens.get(key, 0)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

JatinGargQA avatar Sep 27 '24 07:09 JatinGargQA

I'm able to run locally, but I get TypeError: unsupported operand type(s) for +: 'int' and 'NoneType' after deploying using Azure AI Studio

matheuslazarin avatar Sep 27 '24 17:09 matheuslazarin

Rolling back the version of openai also works; pinning openai<=1.44.1 resolves the error. It looks like promptflow-tracing is incompatible with openai 1.45.0: https://pypi.org/project/openai/#history

This did the trick for me !! thank you so much <3

oussamachiboub avatar Sep 29 '24 16:09 oussamachiboub

I have found a PR that may be relevant to this issue:

https://github.com/microsoft/promptflow/pull/3793

hiroki0525 avatar Oct 01 '24 00:10 hiroki0525

I've also noticed this issue recently (it was working a month ago) when deploying a prompt flow from Azure AI Studio, created from the Chat playground.

teebu avatar Oct 02 '24 00:10 teebu

We encountered the problem as well. Adding the following lines to the promptflow YAML worked:

environment_variables:
  PF_DISABLE_TRACING: true

JanWerder avatar Oct 02 '24 07:10 JanWerder

As @JanWerder suggested, this did the trick for me:

environment_variables:
  PF_DISABLE_TRACING: true

espanto28 avatar Oct 02 '24 17:10 espanto28

This is fixed with promptflow-tracing 1.16.1

luigiw avatar Oct 09 '24 07:10 luigiw

@luigiw It does not seem that this fix works with the base Docker promptflow runtime image, as the image is still based on Python 3.9.

WilliamWatsonEAI avatar Oct 18 '24 01:10 WilliamWatsonEAI

@luigiw after updating to 1.16.1 the error goes away, which is good, but a lot of warnings still persist:

WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.completion' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.prompt' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute 'llm.usage.completion_tokens_details' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute 'llm.usage.prompt_tokens_details' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.completion' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
WARNING:opentelemetry.attributes:Invalid type NoneType for attribute '__computed__.cumulative_token_count.prompt' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

What I also notice in AppInsights is that the logs for LLM tools show token consumption (completion, prompt, and total), but the "same" metrics for the whole flow show consumption only for total tokens and "0" for the other two:

Leaf tool node: [image]

1 level up: [image]

root: [image]

scherici avatar Oct 23 '24 08:10 scherici

Following up on my comment above: I have found a PR that may be relevant to this issue: #3806

scherici avatar Oct 23 '24 09:10 scherici

Still getting this error:

("service failed to complete the prompt", TypeError("unsupported operand type(s) for +: 'dict' and 'dict'"))
2024-12-11 15:13:53 -0500 36540 execution.flow WARNING Failed to calculate metrics due to exception: unsupported operand type(s) for +: 'int' and 'dict'.

Versions:
promptflow 1.16.2
promptflow-core 1.16.2
promptflow-devkit 1.16.2
promptflow-tracing 1.16.2

Python 3.11.7

I found another similar open issue and added my report there, since this issue had been closed.

redhatpeter avatar Dec 11 '24 20:12 redhatpeter

@redhatpeter same issue here

m4tej241 avatar Dec 27 '24 08:12 m4tej241

Is this issue now fixed?

asos-sathyagangadharan avatar Jan 04 '25 18:01 asos-sathyagangadharan

It seems you still need to add the following to the YAML file:

environment_variables:
  PF_DISABLE_TRACING: true

espanto28 avatar Jan 06 '25 17:01 espanto28

I use flex flows and was able to fix this bug by overriding the collect_openai_tokens method on promptflow.tracing._trace.token_collector. This lets me filter down to only "completion_tokens", "prompt_tokens", and "total_tokens", whose values are always integers. With this change in place you can use the latest versions of the promptflow and openai packages.

Add the following code to the top of your flex flow.

import inspect
import types

from promptflow.tracing._trace import token_collector


def collect_openai_tokens(self, span, output):
    """Override method for the promptflow.tracing._trace.TokenCollector class

    Original implementation is here: https://github.com/microsoft/promptflow/blob/40c84b46f48cc7d02b6188e244e3dd8b0dde4743/src/promptflow-tracing/promptflow/tracing/_trace.py#L111
    """
    span_id = span.get_span_context().span_id
    if (
        not inspect.isgenerator(output)
        and hasattr(output, "usage")
        and output.usage is not None
    ):
        tokens = output.usage.dict()
        if tokens:
            # Filter to only specific tokens
            tokens = {
                k: v if v is not None and type(v) is int else 0
                for k, v in tokens.items()
                if k in ["completion_tokens", "prompt_tokens", "total_tokens"]
            }
            with self._lock:
                self._span_id_to_tokens[span_id] = tokens


token_collector.collect_openai_tokens = types.MethodType(
    collect_openai_tokens, token_collector
)
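
Note that this rebinds the method on the module-level token_collector singleton, so it has to run before the flow's first traced LLM call; otherwise the original implementation may already have recorded the malformed token dicts.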

jomalsan avatar Jan 06 '25 17:01 jomalsan

Also still having this issue. I don't want to give up tracing as it's extremely useful for my development. I'm using prompt flow via the yaml-based declarative approach. I have tried openai==1.44.0, promptflow-tracing==1.16.1

EDIT: Upgrading all promptflow packages to the Jan 6th release, promptflow 1.17.0, fixed the issue for me. The fix is in the release notes. https://pypi.org/project/promptflow/1.17.0/

jackdgolding avatar Jan 07 '25 11:01 jackdgolding

Also still having this issue. I don't want to give up tracing as it's extremely useful for my development. I'm using prompt flow via the yaml-based declarative approach. I have tried openai==1.44.0, promptflow-tracing==1.16.1

EDIT: Upgrading all promptflow packages to the Jan 6th release, promptflow 1.17.0, fixed the issue for me. The fix is in the release notes. https://pypi.org/project/promptflow/1.17.0/

Nice it worked for me as well, after upgrading the promptflow package

asos-sathyagangadharan avatar Jan 07 '25 14:01 asos-sathyagangadharan

Also still having this issue. I don't want to give up tracing as it's extremely useful for my development. I'm using prompt flow via the yaml-based declarative approach. I have tried openai==1.44.0, promptflow-tracing==1.16.1

EDIT: Upgrading all promptflow packages to the Jan 6th release, promptflow 1.17.0, fixed the issue for me. The fix is in the release notes. https://pypi.org/project/promptflow/1.17.0/

I can confirm it works for me as well with promptflow 1.17.0. :) Thanks, all!

SophieBoule99 avatar Jan 08 '25 08:01 SophieBoule99

In promptflow/tracing/_openai_utils.py:

def _get_openai_metrics_for_signal_api(self, api_call: dict):
    inputs = api_call.get("inputs")
    output = api_call.get("output")
    if isinstance(output, dict):
        usage = output.get("usage")
        if isinstance(usage, dict):
            # Drop the nested detail dicts so the int-only token
            # aggregation no longer raises TypeError.
            usage.pop('prompt_tokens_details', None)
            usage.pop('completion_tokens_details', None)
            return usage
        self._log_warning(
            "Cannot find openai metrics in output, will calculate metrics from response data directly."
        )

This removes the dictionary keys holding the composite metrics and eliminates the error.

It fixes the immediate problem, but it still leaves the detailed metrics (cached tokens, reasoning tokens, etc.) hidden, when they should be visible.

See https://cookbook.openai.com/examples/prompt_caching101 for why.
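
One possible refinement (a sketch, assuming the nested detail values are ints, e.g. {'reasoning_tokens': 0}) is to flatten the detail dicts into namespaced counters instead of deleting them, so those metrics stay visible:

    def flatten_usage(usage: dict) -> dict:
        # Flatten nested detail dicts into dotted int counters, e.g.
        # completion_tokens_details -> "completion_tokens_details.reasoning_tokens",
        # and drop None values, keeping the merged dict int-only.
        flat = {}
        for key, value in usage.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    if isinstance(sub_value, int):
                        flat[f"{key}.{sub_key}"] = sub_value
            elif isinstance(value, int):
                flat[key] = value
        return flat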

pback-ores avatar Mar 21 '25 13:03 pback-ores