
Python: New Feature: More granular spans for AutoFunctionInvocationLoop

Open HuskyDanny opened this issue 7 months ago • 6 comments

We are debugging a performance issue in an agent call that uses function calling via the AutoFunctionInvocationLoop.

In the trace, we see that the span at the AutoFunctionInvocationLoop level takes quite a lot of time, with some tool-call spans below it.

Image

Can we add more granular spans inside the AutoFunctionInvocationLoop? Specifically, I want to identify which part takes the majority of the time. I know there is a function-calling step that decides which tool to call; can we have a span for that as well?

Overall, I feel this is important for diagnosing agent performance issues.

The SDK I am using is semantic-kernel==1.28.0 in Python.

HuskyDanny avatar May 15 '25 09:05 HuskyDanny

@TaoChenOSU what are your thoughts on this enhancement request?

markwallace-microsoft avatar May 17 '25 13:05 markwallace-microsoft

@TaoChenOSU Any thoughts on this?

HuskyDanny avatar May 19 '25 09:05 HuskyDanny

Hi @HuskyDanny,

Thank you for raising the question!

It looks like you are not capturing the model invocation spans, which probably take up most of the time. If you capture the model invocation spans, you will see something like this:

Image

To capture the model invocation spans, please refer to this documentation.

TaoChenOSU avatar Jun 06 '25 16:06 TaoChenOSU

The kernel function completes in 0.5 seconds, but AutoFunctionInvocationLoop takes almost 21 seconds. Is there any way to know which span takes so much time? Or is there a way to reduce the time spent in AutoFunctionInvocationLoop?

```json
{
  "name": "AutoFunctionInvocationLoop",
  "context": {
    "trace_id": "0x2c1094153c3994302defbe4821436a7f",
    "span_id": "0x0221d7d52fb74bd5",
    "trace_state": "[]"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0x3cd982409b8f92a7",
  "start_time": "2025-10-24T17:18:12.830324Z",
  "end_time": "2025-10-24T17:18:33.365304Z",
  "status": { "status_code": "UNSET" },
  "attributes": {
    "sk.available_functions": "ShoppingMCP-get_other_buying_options,ShoppingMCP-get_price_history,ShoppingMCP-get_product_group_search_results,ShoppingMCP-query_based_entity_linker"
  },
  "events": [],
  "links": [],
  "resource": {
    "attributes": {
      "telemetry.sdk.language": "python",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.38.0",
      "service.name": "telemetry-console-quickstart"
    },
    "schema_url": ""
  }
}
```

```json
{
  "body": "Function completed. Duration: 0.511368s",
  "severity_number": 9,
  "severity_text": "INFO",
  "attributes": {
    "otelSpanID": "424de03315a34ad3",
    "otelTraceID": "2c1094153c3994302defbe4821436a7f",
    "otelTraceSampled": true,
    "otelServiceName": "telemetry-console-quickstart",
    "code.file.path": "c:\\agenticshopping\\.venv\\Lib\\site-packages\\semantic_kernel\\functions\\kernel_function_log_messages.py",
    "code.function.name": "log_function_completed",
    "code.line.number": 53
  },
  "dropped_attributes": 0,
  "timestamp": "2025-10-24T17:18:19.147725Z",
  "observed_timestamp": "2025-10-24T17:18:19.148096Z",
  "trace_id": "0x2c1094153c3994302defbe4821436a7f",
  "span_id": "0x424de03315a34ad3",
  "trace_flags": 1,
  "resource": {
    "attributes": {
      "telemetry.sdk.language": "python",
      "telemetry.sdk.name": "opentelemetry",
      "telemetry.sdk.version": "1.38.0",
      "service.name": "telemetry-console-quickstart"
    },
    "schema_url": ""
  },
  "event_name": ""
}
```
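As an aside, the gap is visible directly in the exported JSON: the span's start_time/end_time put the loop at roughly 20.5 s, while the function log reports 0.51 s. A quick stdlib snippet to compute a span's duration from such output (assuming the timestamp format shown by the console exporter above):

```python
from datetime import datetime

# Span fields copied from the AutoFunctionInvocationLoop JSON above.
span = {
    "name": "AutoFunctionInvocationLoop",
    "start_time": "2025-10-24T17:18:12.830324Z",
    "end_time": "2025-10-24T17:18:33.365304Z",
}

def span_duration_s(span: dict) -> float:
    """Duration in seconds between a span's start_time and end_time."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start = datetime.strptime(span["start_time"], fmt)
    end = datetime.strptime(span["end_time"], fmt)
    return (end - start).total_seconds()

print(span_duration_s(span))  # → 20.53498
```

Running this over every exported span makes it easy to see which child spans (if any) account for the ~20 s inside the loop.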

xiaoxuqi-ms avatar Oct 24 '25 17:10 xiaoxuqi-ms


Hi @xiaoxuqi-ms,

As I mentioned in my previous comment, your trace also doesn't seem to have captured the model invocation spans, which would take the most time because they involve round trips to the model. Could you double-check why your app isn't capturing those? Do you have SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE set? Please refer to: https://learn.microsoft.com/en-us/semantic-kernel/concepts/enterprise-readiness/observability/telemetry-with-console?tabs=Powershell-CreateFile%2CEnvironmentFile&pivots=programming-language-python#environment-variables

TaoChenOSU avatar Oct 24 '25 18:10 TaoChenOSU

Yes, I added this to my `.env`: `SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE=true`. And I call all of these functions before `main()`: `set_up_logging()`, `set_up_tracing()`, `set_up_metrics()`.
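One thing worth verifying in this situation is that the flag is actually visible to the process before `set_up_logging()`/`set_up_tracing()`/`set_up_metrics()` run, e.g. by loading the `.env` file (python-dotenv's `load_dotenv()`) or exporting the variable in the shell first. A minimal stdlib check is sketched below; `diagnostics_enabled` is a hypothetical helper, and the exact string parsing Semantic Kernel applies is an assumption here.

```python
import os

# Simulate what the .env entry should produce in the process environment;
# in a real app this would come from load_dotenv() or the shell, not code.
os.environ.setdefault(
    "SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE", "true"
)

def diagnostics_enabled() -> bool:
    # Hypothetical check mirroring a case-insensitive "true" comparison;
    # SK's own settings parsing may differ in detail.
    value = os.environ.get(
        "SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE", ""
    )
    return value.strip().lower() == "true"

print(diagnostics_enabled())  # → True when the flag is set
```

If this prints `False` at the point where the tracing setup functions run, the `.env` file was loaded too late (or not at all), which would explain the missing model invocation spans.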

xiaoxuqi-ms avatar Oct 24 '25 18:10 xiaoxuqi-ms