`Pipeline.draw` timeouts
Currently, Pipeline.draw and Pipeline.show call the mermaid.ink server by default.
(Users can also configure a custom Mermaid server using Docker.)
Recent problems
Pipeline.draw has been experiencing frequent timeouts.
Over the past month, Mermaid servers have faced reliability issues, likely due to high traffic.
See the following issues: https://github.com/jihchi/mermaid.ink/issues/491, https://github.com/jihchi/mermaid.ink/issues/498.
We recently introduced changes to pipeline drawing (#8767, #8799), but these do not appear to be the cause of the timeouts.
These failures impact users and our CI pipeline, causing integration tests to fail and slowing down development.
Affected tests
- integration tests in haystack/test/core/pipeline/test_draw.py
- nightly e2e tests (these have not been failing in the last few days)
- tutorials tests
Action taken/in progress
- Configurable timeout in `Pipeline.draw` #8967
- Retry mechanism in `Pipeline.draw` #9045 (uncertain if this is effective for CI due to repeated calls in a short timeframe.)
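To illustrate the concern about repeated calls in a short timeframe, here is a minimal sketch of a retry wrapper with exponential backoff. It is not the code from #9045; `call_with_retry`, `fetch`, `max_attempts`, `base_delay`, and `sleep` are hypothetical names chosen for this example, and it assumes the rendering call raises `TimeoutError` when the server does not answer in time.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds or `max_attempts` attempts have failed.

    `fetch` is a hypothetical zero-argument callable standing in for the
    request to the Mermaid server. `sleep` is injectable so the wait can be
    skipped in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TimeoutError:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the timeout.
                raise
            # Wait base_delay, then 2x, 4x, ... so retries are spread out
            # instead of hitting the server again immediately.
            sleep(base_delay * 2 ** (attempt - 1))
    # Not reachable: the loop either returns or re-raises.
    raise AssertionError("unreachable")
```

Spacing out the attempts may help when a single call times out, but it would not reduce the total number of requests a CI run sends, so it is unlikely to be sufficient on its own if the server is rate limiting or overloaded.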
Possible next steps
- ~Skip non-critical integration tests that frequently fail~ done in #9108
- ~remove `Pipeline.draw` from e2e tests if they start to fail again~ done in #9121
- reflect on long-term solutions (hosting our own Mermaid server, finding a Python visualization library, ...)
For the nightly runs of our tutorial notebooks, it could be worth updating our conversion script to skip lines containing `pipeline.draw`, since those runs often fail due to Mermaid timeout errors.
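A rough sketch of what such a filter could look like, assuming the conversion script has the notebook code available as a plain string. `strip_draw_calls` and its `marker` parameter are hypothetical names for this example, not part of the existing script.

```python
def strip_draw_calls(source: str, marker: str = "pipeline.draw") -> str:
    """Return `source` with every line that contains `marker` removed.

    This is a plain substring match on each line. It does not parse the
    code, so a call spread over several lines, or a pipeline stored under a
    different variable name, would not be handled.
    """
    kept = [line for line in source.splitlines() if marker not in line]
    return "\n".join(kept)


# Example input resembling code converted from a tutorial notebook.
script = "\n".join(
    [
        "pipeline = build_pipeline()",
        'pipeline.draw(path="pipeline.png")',
        "result = pipeline.run({})",
    ]
)
print(strip_draw_calls(script))
```

One caveat: if the removed line is the only statement inside a block (for example under an `if`), the remaining code would no longer be valid Python, so replacing the line with `pass` at the same indentation might be a safer variant.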
I have also experienced this issue consistently when trying to illustrate a pipeline using the remote server. In my opinion, the two major downsides of Mermaid are:
- with the remote server, the pipeline image is stored and can easily be retrieved by copying the link
- local rendering requires Docker
As a user, I like the ability to switch between remote and local visualization mode with Mermaid, but I would like the option to choose whether to store the image on the remote server or not (sharing vs. data/information protection). Regarding local visualization, I think it would be really helpful to have a tool that allows visualizing pipelines directly, without requiring Docker, on any OS.
Many tools use Graphviz; would that be an option?