autogen-ext tests are too slow...!
What happened?
Describe the bug
I timed each test directory in autogen-ext:
autogen-ext/cache_store    :   0.15s
autogen-ext/code_executors : 203.89s
autogen-ext/memory         :  12s
autogen-ext/models         :  37s
autogen-ext/tools          :  14s
autogen_ext/*.py           : 146.02s
From these, I identified the slowest test files:
packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py : 161.66s
packages/autogen-ext/tests/test_openai_assistant_agent.py : 92.66s
=============================================================== slowest more than 1sec durations ================================================================
21.02s call tests/code_executors/test_docker_commandline_code_executor.py::test_delete_tmp_files
16.01s call tests/test_openai_assistant_agent.py::test_file_retrieval[openai]
13.08s call tests/test_openai_assistant_agent.py::test_on_reset_behavior[openai]
11.42s call tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.75s call tests/code_executors/test_docker_commandline_code_executor.py::test_deprecated_warning
10.69s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_serialization
10.50s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_extra_args
10.49s call tests/code_executors/test_docker_commandline_code_executor.py::test_error_wrong_path
10.38s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop
10.34s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop_context_manager
10.34s call tests/code_executors/test_docker_commandline_code_executor.py::test_directory_creation_cleanup
10.25s call tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
10.18s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_valid_relative_path[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_execute_code[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_invalid_relative_path[docker]
10.03s call tests/test_worker_runtime.py::test_register_receives_publish_cascade_single_worker
7.51s call tests/test_websurfer_agent.py::test_run_websurfer
6.44s call tests/test_openai_assistant_agent.py::test_code_interpreter[openai]
6.20s call tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming_structured_output
5.87s call tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming
3.96s call tests/models/test_openai_model_client.py::test_model_client_with_function_calling[gpt-4.1-nano]
3.71s call tests/models/test_openai_model_client.py::test_model_client_basic_completion[gpt-4.1-nano]
3.10s call tests/memory/test_chroma_memory.py::test_initialization
2.87s call tests/models/test_openai_model_client.py::test_structured_output_with_streaming_tool_calls
2.81s call tests/code_executors/test_docker_jupyter_code_executor.py::test_timeout[docker]
2.75s call tests/models/test_openai_model_client.py::test_structured_output_with_streaming
2.72s call tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_timeout
2.72s call tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_with_image_output
2.39s call tests/memory/test_chroma_memory.py::test_content_types
2.01s call tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
1.95s call tests/tools/test_mcp_tools.py::test_mcp_server_fetch
1.92s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code[docker]
1.87s call tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]
1.82s call tests/memory/test_chroma_memory.py::test_strict_matching
1.72s call tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_cancellation
1.71s call tests/test_playwright_controller.py::test_playwright_controller_click_id
1.68s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming_tool_calls[gpt-4.1-nano]
1.64s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_tool_calls[gpt-4.1-nano]
1.39s call tests/memory/test_chroma_memory.py::test_basic_workflow
1.34s call tests/code_executors/test_jupyter_code_executor.py::test_jupyter_code_executor_serialization
1.25s call tests/memory/test_chroma_memory.py::test_model_context_update
1.22s call tests/memory/test_chroma_memory.py::test_metadata_handling
1.21s call tests/code_executors/test_jupyter_code_executor.py::test_execute_code_after_restart
1.13s call tests/code_executors/test_docker_jupyter_code_executor.py::test_start_stop
1.09s call tests/tools/test_mcp_tools.py::test_mcp_server_filesystem
1.09s call tests/models/test_openai_model_client.py::test_openai_structured_output[gpt-4.1-nano]
1.04s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming[gpt-4.1-nano]
1.03s call tests/models/test_openai_model_client.py::test_openai_structured_output_using_response_format[gpt-4.1-nano]
1.01s call tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_timeout[local]
1.01s call tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_cancellation
1.00s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
1.00s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]
To Reproduce
poe test
pytest python/packages/autogen-ext/tests
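For reference, a per-test timing report like the one above can be produced with pytest's built-in durations options; the exact flags used here are not stated in the issue, so this invocation is an assumption:
pytest python/packages/autogen-ext/tests --durations=0 --durations-min=1.0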
Expected behavior
Faster tests...!
Which packages were the bug in?
Python Extensions (autogen-ext)
AutoGen library version.
Python dev (main branch)
Other library version.
No response
Model used
No response
Model provider
None
Other model provider
No response
Python version
None
.NET version
None
Operating system
None
Let's first figure out why the docker tests are so slow.
Then, for the Docker code executor tests (both DockerCommandLineCodeExecutor and DockerJupyterCodeExecutor), I think we should create separate poe tasks to run them, and have separate jobs in .github/workflows/checks.yml. See the test-grpc example, which is already separate:
https://github.com/microsoft/autogen/blob/b6935f913b7f92201519cec37f18b6de6f824144/.github/workflows/checks.yml#L152-L153
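One possible way to wire that up (a sketch only; the marker name and selection mechanism are my assumptions, not the repo's actual setup) is a dedicated pytest marker that the poe tasks can then select with -m:

# conftest.py (sketch): register a "docker" marker so Docker-dependent
# tests can be included or excluded independently of the main suite.
import pytest

def pytest_configure(config: pytest.Config) -> None:
    config.addinivalue_line("markers", "docker: tests that require a running Docker daemon")

# In each Docker executor test module:
#     pytestmark = pytest.mark.docker
# The default task would then run `pytest -m "not docker"`, and a separate
# CI job (like test-grpc) would run `pytest -m docker`.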
@ekzhu Cool. I found that in my environment, each Docker build takes about 10 seconds.
In packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py,
there are 13 calls to DockerCommandLineCodeExecutor and 5 calls to the executor_and_temp_dir fixture.
So now I see why the Docker tests are so slow: at roughly 10 seconds per container start-up, those instantiations alone account for most of the file's 161.66s.
I’m testing sharing Docker containers between tests instead of creating a new one for each test.
https://github.com/microsoft/autogen/blob/0c9fd64d6e029007dbfe5689bf076e549c78ef79/python/packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py#L35-L44
Because that fixture is function-scoped, a fresh Docker container is built for every test.
I changed the scope to "session" like below to reuse the Docker container:
import tempfile
from typing import AsyncGenerator

import pytest
import pytest_asyncio
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

@pytest_asyncio.fixture(scope="session")  # type: ignore
async def executor_and_temp_dir() -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    if not docker_tests_enabled():  # helper already defined in the test module
        pytest.skip("Docker tests are disabled")
    with tempfile.TemporaryDirectory() as temp_dir:
        # One container serves the whole session instead of one per test.
        async with DockerCommandLineCodeExecutor(work_dir=temp_dir) as executor:
            yield executor, temp_dir
This is in packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py.
As a result, the test duration improved from 161.66s to 110s.
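One caveat to verify with this change: pytest-asyncio runs each test on its own event loop by default, so a session-scoped async fixture and the tests that use it may need to share the same event loop scope. A sketch, assuming pytest-asyncio 0.24+; the loop_scope arguments are my addition, not part of the change above:

# Pin the event loop scope so the session-scoped executor and the
# tests that use it share one loop (pytest-asyncio 0.24+ syntax).
@pytest_asyncio.fixture(scope="session", loop_scope="session")  # type: ignore
async def executor_and_temp_dir() -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    ...

@pytest.mark.asyncio(loop_scope="session")
async def test_example(executor_and_temp_dir: tuple[DockerCommandLineCodeExecutor, str]) -> None:
    ...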
Yup, it still needs a clean-up routine for the shared work_dir between tests. Currently, this is just a suggestion, so I haven’t implemented it yet.
But I plan to implement it like this:
@pytest_asyncio.fixture(scope="function")
async def cleanup_after_test(executor_and_temp_dir, request):
    _, work_dir = executor_and_temp_dir

    def cleanup():
        reset_temp_dir(work_dir)  # empty the shared work_dir after the test

    request.addfinalizer(cleanup)
    yield
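reset_temp_dir does not exist yet; here is a minimal sketch of what it could do, assuming the goal is to empty the shared work_dir without invalidating the container's bind mount (the function name is the plan above, the body is my assumption):

import shutil
from pathlib import Path

def reset_temp_dir(work_dir: str) -> None:
    # Delete everything inside work_dir but keep the directory itself,
    # since the running container has it mounted as its working directory.
    for entry in Path(work_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()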
And change test usage like this:
@pytest.mark.asyncio
async def test_example(executor_and_temp_dir, cleanup_after_test):
    executor, tmp_dir = executor_and_temp_dir
    ...  # the test body itself stays the same as before
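A possible refinement (my suggestion, not part of the plan above): making the cleanup fixture autouse within the Docker test module would spare each test from listing it explicitly:

@pytest_asyncio.fixture(autouse=True)
async def cleanup_after_test(executor_and_temp_dir: tuple[DockerCommandLineCodeExecutor, str]) -> AsyncGenerator[None, None]:
    # Wraps every test in the module automatically.
    _, work_dir = executor_and_temp_dir
    yield
    reset_temp_dir(work_dir)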
Just sharing my ongoing experiment :)
I welcome any suggestions or help from others.