autogen-ext tests are too slow...!
What happened?
Describe the bug
I timed each test directory in autogen-ext:
autogen-ext/cache_store    :   0.15s
autogen-ext/code_executors : 203.89s
autogen-ext/memory         :  12s
autogen-ext/models         :  37s
autogen-ext/tools          :  14s
autogen_ext/*.py           : 146.02s
From these, I identified the slowest test files:
packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py : 161.66s
packages/autogen-ext/tests/test_openai_assistant_agent.py : 92.66s
=============================================================== slowest more than 1sec durations ================================================================
21.02s call tests/code_executors/test_docker_commandline_code_executor.py::test_delete_tmp_files
16.01s call tests/test_openai_assistant_agent.py::test_file_retrieval[openai]
13.08s call tests/test_openai_assistant_agent.py::test_on_reset_behavior[openai]
11.42s call tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.75s call tests/code_executors/test_docker_commandline_code_executor.py::test_deprecated_warning
10.69s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_serialization
10.50s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_extra_args
10.49s call tests/code_executors/test_docker_commandline_code_executor.py::test_error_wrong_path
10.38s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop
10.34s call tests/code_executors/test_docker_commandline_code_executor.py::test_docker_commandline_code_executor_start_stop_context_manager
10.34s call tests/code_executors/test_docker_commandline_code_executor.py::test_directory_creation_cleanup
10.25s call tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
10.18s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_timeout[docker]
10.12s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_valid_relative_path[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_execute_code[docker]
10.10s teardown tests/code_executors/test_docker_commandline_code_executor.py::test_invalid_relative_path[docker]
10.03s call tests/test_worker_runtime.py::test_register_receives_publish_cascade_single_worker
7.51s call tests/test_websurfer_agent.py::test_run_websurfer
6.44s call tests/test_openai_assistant_agent.py::test_code_interpreter[openai]
6.20s call tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming_structured_output
5.87s call tests/models/test_llama_cpp_model_client.py::test_llama_cpp_integration_non_streaming
3.96s call tests/models/test_openai_model_client.py::test_model_client_with_function_calling[gpt-4.1-nano]
3.71s call tests/models/test_openai_model_client.py::test_model_client_basic_completion[gpt-4.1-nano]
3.10s call tests/memory/test_chroma_memory.py::test_initialization
2.87s call tests/models/test_openai_model_client.py::test_structured_output_with_streaming_tool_calls
2.81s call tests/code_executors/test_docker_jupyter_code_executor.py::test_timeout[docker]
2.75s call tests/models/test_openai_model_client.py::test_structured_output_with_streaming
2.72s call tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_timeout
2.72s call tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_with_image_output
2.39s call tests/memory/test_chroma_memory.py::test_content_types
2.01s call tests/code_executors/test_docker_commandline_code_executor.py::test_commandline_code_executor_cancellation[docker]
1.95s call tests/tools/test_mcp_tools.py::test_mcp_server_fetch
1.92s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code[docker]
1.87s call tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]
1.82s call tests/memory/test_chroma_memory.py::test_strict_matching
1.72s call tests/code_executors/test_jupyter_code_executor.py::test_commandline_code_executor_cancellation
1.71s call tests/test_playwright_controller.py::test_playwright_controller_click_id
1.68s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming_tool_calls[gpt-4.1-nano]
1.64s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_tool_calls[gpt-4.1-nano]
1.39s call tests/memory/test_chroma_memory.py::test_basic_workflow
1.34s call tests/code_executors/test_jupyter_code_executor.py::test_jupyter_code_executor_serialization
1.25s call tests/memory/test_chroma_memory.py::test_model_context_update
1.22s call tests/memory/test_chroma_memory.py::test_metadata_handling
1.21s call tests/code_executors/test_jupyter_code_executor.py::test_execute_code_after_restart
1.13s call tests/code_executors/test_docker_jupyter_code_executor.py::test_start_stop
1.09s call tests/tools/test_mcp_tools.py::test_mcp_server_filesystem
1.09s call tests/models/test_openai_model_client.py::test_openai_structured_output[gpt-4.1-nano]
1.04s call tests/models/test_openai_model_client.py::test_openai_structured_output_with_streaming[gpt-4.1-nano]
1.03s call tests/models/test_openai_model_client.py::test_openai_structured_output_using_response_format[gpt-4.1-nano]
1.01s call tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_timeout[local]
1.01s call tests/code_executors/test_commandline_code_executor.py::test_commandline_code_executor_cancellation
1.00s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_canncellation[docker]
1.00s setup tests/code_executors/test_docker_jupyter_code_executor.py::test_execute_code_and_persist_variable[docker]
To Reproduce
poe test
pytest python/packages/autogen-ext/tests
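For reference, a per-test timing report like the one above can be produced with pytest's built-in durations options; the exact flags used here are not stated in the issue, so this invocation is an assumption:
pytest python/packages/autogen-ext/tests --durations=0 --durations-min=1.0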
Expected behavior
Faster tests...!
Which packages were the bug in?
Python Extensions (autogen-ext)
AutoGen library version.
Python dev (main branch)
Other library version.
No response
Model used
No response
Model provider
None
Other model provider
No response
Python version
None
.NET version
None
Operating system
None
Let's first figure out why the docker tests are so slow.
Then, for the Docker code executor tests (both DockerCommandLineCodeExecutor and DockerJupyterCodeExecutor), I think we should create separate poe tasks to run them, and have separate jobs in .github/workflows/checks.yml. See the test-grpc example, which is already separate:
https://github.com/microsoft/autogen/blob/b6935f913b7f92201519cec37f18b6de6f824144/.github/workflows/checks.yml#L152-L153
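One possible way to wire that up (a sketch only; the marker name and selection mechanism are my assumptions, not the repo's actual setup) is a dedicated pytest marker that the poe tasks can then select with -m:

# conftest.py (sketch): register a "docker" marker so Docker-dependent
# tests can be included or excluded independently of the main suite.
import pytest

def pytest_configure(config: pytest.Config) -> None:
    config.addinivalue_line("markers", "docker: tests that require a running Docker daemon")

# In each Docker executor test module:
#     pytestmark = pytest.mark.docker
# The default task would then run `pytest -m "not docker"`, and a separate
# CI job (like test-grpc) would run `pytest -m docker`.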
@ekzhu Cool. I found that in my environment, each Docker build takes about 10 seconds.
In packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py,
there are 13 calls to DockerCommandLineCodeExecutor and 5 calls to the executor_and_temp_dir fixture.
So now I see why the Docker tests are so slow: at roughly 10 seconds per container start-up, those instantiations alone account for most of the file's 161.66s.
I’m testing sharing Docker containers between tests instead of creating a new one for each test.
https://github.com/microsoft/autogen/blob/0c9fd64d6e029007dbfe5689bf076e549c78ef79/python/packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py#L35-L44
Because that fixture is function-scoped, a fresh Docker container is built for every test.
I changed the scope to "session" like below to reuse the Docker container:
import tempfile
from typing import AsyncGenerator

import pytest
import pytest_asyncio
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

@pytest_asyncio.fixture(scope="session")  # type: ignore
async def executor_and_temp_dir() -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    if not docker_tests_enabled():  # helper already defined in the test module
        pytest.skip("Docker tests are disabled")
    with tempfile.TemporaryDirectory() as temp_dir:
        # One container serves the whole session instead of one per test.
        async with DockerCommandLineCodeExecutor(work_dir=temp_dir) as executor:
            yield executor, temp_dir
This is in packages/autogen-ext/tests/code_executors/test_docker_commandline_code_executor.py.
As a result, the test duration improved from 161.66s to 110s.
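One caveat to verify with this change: pytest-asyncio runs each test on its own event loop by default, so a session-scoped async fixture and the tests that use it may need to share the same event loop scope. A sketch, assuming pytest-asyncio 0.24+; the loop_scope arguments are my addition, not part of the change above:

# Pin the event loop scope so the session-scoped executor and the
# tests that use it share one loop (pytest-asyncio 0.24+ syntax).
@pytest_asyncio.fixture(scope="session", loop_scope="session")  # type: ignore
async def executor_and_temp_dir() -> AsyncGenerator[tuple[DockerCommandLineCodeExecutor, str], None]:
    ...

@pytest.mark.asyncio(loop_scope="session")
async def test_example(executor_and_temp_dir: tuple[DockerCommandLineCodeExecutor, str]) -> None:
    ...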
Yup, it still needs a clean-up routine for the shared work_dir between tests. Currently, this is just a suggestion, so I haven’t implemented it yet.
But I plan to implement it like this:
@pytest_asyncio.fixture(scope="function")
async def cleanup_after_test(executor_and_temp_dir, request):
    _, work_dir = executor_and_temp_dir

    def cleanup():
        reset_temp_dir(work_dir)  # empty the shared work_dir after the test

    request.addfinalizer(cleanup)
    yield
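reset_temp_dir does not exist yet; here is a minimal sketch of what it could do, assuming the goal is to empty the shared work_dir without invalidating the container's bind mount (the function name is the plan above, the body is my assumption):

import shutil
from pathlib import Path

def reset_temp_dir(work_dir: str) -> None:
    # Delete everything inside work_dir but keep the directory itself,
    # since the running container has it mounted as its working directory.
    for entry in Path(work_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()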
And change test usage like this:
@pytest.mark.asyncio
async def test_example(executor_and_temp_dir, cleanup_after_test):
    executor, tmp_dir = executor_and_temp_dir
    ...  # the test body itself stays the same as before
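A possible refinement (my suggestion, not part of the plan above): making the cleanup fixture autouse within the Docker test module would spare each test from listing it explicitly:

@pytest_asyncio.fixture(autouse=True)
async def cleanup_after_test(executor_and_temp_dir: tuple[DockerCommandLineCodeExecutor, str]) -> AsyncGenerator[None, None]:
    # Wraps every test in the module automatically.
    _, work_dir = executor_and_temp_dir
    yield
    reset_temp_dir(work_dir)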
Just sharing my ongoing experiment :)
I welcome any suggestions or help from others.