Mermaid Crashes If trying to draw a large pipeline

Open CarlosFerLo opened this issue 1 year ago • 6 comments

Thanks in advance for your help :)

Describe the bug I was building a huge pipeline (30 components and 35 connections), and for debugging purposes I wanted to display the diagram, but both the .draw() and .show() methods failed. They still work with small pipelines, by the way.

Error message

Failed to draw the pipeline: https://mermaid.ink/img/ returned status 400
No pipeline diagram will be saved.
Failed to draw the pipeline: could not connect to https://mermaid.ink/img/ (400 Client Error: Bad Request for url: https://mermaid.ink/img/{place holder for 2km long data}

No pipeline diagram will be saved.
Traceback (most recent call last):
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/draw.py", line 87, in _to_mermaid_image
    resp.raise_for_status()
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://mermaid.ink/img/{another placeholder}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/babyagi.py", line 188, in <module>
    pipe.draw(path=Path("pipe"))
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/base.py", line 649, in draw
    image_data = _to_mermaid_image(self.graph)
  File "/Users/carlosfernandezloran/Desktop/babyagi-classic-haystack/.venv/lib/python3.10/site-packages/haystack/core/pipeline/draw.py", line 95, in _to_mermaid_image
    raise PipelineDrawingError(
haystack.core.errors.PipelineDrawingError: There was an issue with https://mermaid.ink/, see the stacktrace for details.

Expected behavior I expect the .show() and .draw() methods to work for all pipelines, no matter the size. This might be a Mermaid problem and not strictly Haystack's, but we would need to implement a local diagram generator, as proposed in #7896.

To Reproduce I will not add all 200 lines of add_component and connect statements, but you can imagine how it goes.

System:

  • OS: macOS
  • GPU/CPU: M1
  • Haystack version (commit or version number): 2.3.0

CarlosFerLo avatar Jul 25 '24 22:07 CarlosFerLo

hey @CarlosFerLo what do you suspect is the issue here? The payload we send to Mermaid is roughly speaking too long, gets truncated somehow and graph generation fails? Or perhaps something else? I'd love to see what's up here but would love to hear your reasoning about the root cause as well.

vblagoje avatar Sep 04 '24 10:09 vblagoje

hey @vblagoje I believe it is a truncation issue. I am not really experienced with Mermaid, but I believe that is the case.

CarlosFerLo avatar Sep 04 '24 11:09 CarlosFerLo

Yes @CarlosFerLo, I investigated this a bit and apparently GET requests commonly have a maximum URL length of around 2,000 to 2,048 characters. Most likely smaller graphs fit within this limit, so the GET request works up to a certain graph size and then fails. All the code for this is in haystack/core/pipeline/draw.py. We need to see whether we can send a POST request to Mermaid instead and make this work. I'll self-assign this issue unless you want to take it, lmk! 🙏
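
To get a feel for how quickly the request URL grows with graph size, here is a minimal sketch. The graph generator and both function names are hypothetical; only the https://mermaid.ink/img/ prefix and the Base64 encoding are taken from this thread.

```python
import base64

MERMAID_IMG_URL = "https://mermaid.ink/img/"  # endpoint from the error message


def synthetic_graph(n_nodes: int) -> str:
    """Build a hypothetical linear Mermaid graph with n_nodes components."""
    lines = ["graph LR;"]
    for i in range(1, n_nodes):
        lines.append(
            f'comp{i}["<b>comp{i}</b>"]:::component -- "value -> value" --> '
            f'comp{i + 1}["<b>comp{i + 1}</b>"]:::component;'
        )
    lines.append("classDef component text-align:center;")
    return "\n".join(lines)


def request_url_length(graph_text: str) -> int:
    """Length of the GET URL: prefix plus Base64 payload (about 4/3 of the input)."""
    payload = base64.b64encode(graph_text.encode("utf-8")).decode("ascii")
    return len(MERMAID_IMG_URL) + len(payload)


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, "nodes ->", request_url_length(synthetic_graph(n)), "chars")
```

Real Haystack labels carry extra HTML for types and optional inputs, so actual pipelines will likely hit any limit with fewer nodes than this bare-bones graph suggests.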

vblagoje avatar Sep 04 '24 14:09 vblagoje

@vblagoje I am currently involved in a project that takes up a lot of my time, so if you do not mind, please assign yourself to this issue.

CarlosFerLo avatar Sep 04 '24 14:09 CarlosFerLo

Our intuition was right @CarlosFerLo. This is a known limitation that I have now confirmed. Here is the script:

import base64
import requests
import io
from PIL import Image
import matplotlib.pyplot as plt

graph = """
    graph LR;
        comp1["<b>comp1</b><br><small><i>AddFixedValue<br><br>Optional inputs:<ul style='text-align:left;'><li>add (Optional[int])</li></ul></i></small>"]:::component 
        -- "result -> value<br><small><i>int</i></small>" --> comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component;
        comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component 
        -- "value -> value<br><small><i>int</i></small>" --> comp3["<b>comp3</b><br><small><i>Square</i><br><br>Outputs:<ul style='text-align:left;'><li>result (int)</li></ul></small>"]:::component;
        comp3["<b>comp3</b><br><small><i>Square</i></small>"]:::component 
        -- "output -> next<br><small><i>int</i></small>" --> comp4["<b>comp4</b><br><small><i>MultiplyByTwo</i></small>"]:::component;
        comp4["<b>comp4</b><br><small><i>MultiplyByTwo</i></small>"]:::component 
        -- "next -> result<br><small><i>int</i></small>" --> comp5["<b>comp5</b><br><small><i>SubtractFixedValue<br><br>Optional inputs:<ul style='text-align:left;'><li>sub (Optional[int])</li></ul></i></small>"]:::component;
        comp5["<b>comp5</b><br><small><i>SubtractFixedValue</i></small>"]:::component 
        -- "result -> value<br><small><i>int</i></small>" --> comp6["<b>comp6</b><br><small><i>Divide</i><br><br>Outputs:<ul style='text-align:left;'><li>result (int)</li></ul></small>"]:::component;
        
        classDef component text-align:center;
    
        %% Repeat pattern with arbitrary connections up to 100 nodes
        comp6["<b>comp6</b><br><small><i>Divide</i></small>"]:::component 
        -- "output -> value<br><small><i>int</i></small>" --> comp7["<b>comp7</b><br><small><i>Modulo</i></small>"]:::component;
        comp7["<b>comp7</b><br><small><i>Modulo</i></small>"]:::component 
        -- "next -> result<br><small><i>int</i></small>" --> comp8["<b>comp8</b><br><small><i>Power</i></small>"]:::component;
        comp8["<b>comp8</b><br><small><i>Power</i></small>"]:::component 
        -- "result -> output<br><small><i>int</i></small>" --> comp9["<b>comp9</b><br><small><i>Absolute</i></small>"]:::component;
        comp9["<b>comp9</b><br><small><i>Absolute</i></small>"]:::component 
        -- "value -> next<br><small><i>int</i></small>" --> comp10["<b>comp10</b><br><small><i>Inverse</i></small>"]:::component;
    
            comp10["<b>comp10</b><br><small><i>Inverse</i></small>"]:::component 
        -- "inverse -> value<br><small><i>int</i></small>" --> comp11["<b>comp11</b><br><small><i>Logarithm</i></small>"]:::component;
        comp11["<b>comp11</b><br><small><i>Logarithm</i></small>"]:::component 
        -- "log -> result<br><small><i>float</i></small>" --> comp12["<b>comp12</b><br><small><i>Exponential</i></small>"]:::component;
        comp12["<b>comp12</b><br><small><i>Exponential</i></small>"]:::component 
        -- "exp -> value<br><small><i>float</i></small>" --> comp13["<b>comp13</b><br><small><i>Cosine</i></small>"]:::component;
        comp13["<b>comp13</b><br><small><i>Cosine</i></small>"]:::component 
        -- "cos -> result<br><small><i>float</i></small>" --> comp14["<b>comp14</b><br><small><i>Sine</i></small>"]:::component;
        comp14["<b>comp14</b><br><small><i>Sine</i></small>"]:::component 
        -- "sin -> next<br><small><i>float</i></small>" --> comp15["<b>comp15</b><br><small><i>Tangent</i></small>"]:::component;
        comp15["<b>comp15</b><br><small><i>Tangent</i></small>"]:::component 
        -- "tan -> result<br><small><i>float</i></small>" --> comp16["<b>comp16</b><br><small><i>ArcSine</i></small>"]:::component;
        comp16["<b>comp16</b><br><small><i>ArcSine</i></small>"]:::component 
        -- "asin -> value<br><small><i>float</i></small>" --> comp17["<b>comp17</b><br><small><i>ArcCosine</i></small>"]:::component;
        comp17["<b>comp17</b><br><small><i>ArcCosine</i></small>"]:::component 
        -- "acos -> result<br><small><i>float</i></small>" --> comp18["<b>comp18</b><br><small><i>ArcTangent</i></small>"]:::component;
        comp18["<b>comp18</b><br><small><i>ArcTangent</i></small>"]:::component 
        -- "atan -> next<br><small><i>float</i></small>" --> comp19["<b>comp19</b><br><small><i>SquareRoot</i></small>"]:::component;
        comp19["<b>comp19</b><br><small><i>SquareRoot</i></small>"]:::component 
        -- "sqrt -> result<br><small><i>float</i></small>" --> comp20["<b>comp20</b><br><small><i>CubeRoot</i></small>"]:::component;
         
            comp20["<b>comp20</b><br><small><i>CubeRoot</i></small>"]:::component 
        -- "cbrt -> value<br><small><i>float</i></small>" --> comp21["<b>comp21</b><br><small><i>Factorial</i></small>"]:::component;
        comp21["<b>comp21</b><br><small><i>Factorial</i></small>"]:::component 
        -- "fact -> result<br><small><i>int</i></small>" --> comp22["<b>comp22</b><br><small><i>Permutation</i></small>"]:::component;
        comp22["<b>comp22</b><br><small><i>Permutation</i></small>"]:::component 
        -- "perm -> value<br><small><i>int</i></small>" --> comp23["<b>comp23</b><br><small><i>Combination</i></small>"]:::component;
        comp23["<b>comp23</b><br><small><i>Combination</i></small>"]:::component 
        -- "comb -> result<br><small><i>int</i></small>" --> comp24["<b>comp24</b><br><small><i>GCD</i></small>"]:::component;
        comp24["<b>comp24</b><br><small><i>GCD</i></small>"]:::component 
        -- "gcd -> value<br><small><i>int</i></small>" --> comp25["<b>comp25</b><br><small><i>LCM</i></small>"]:::component;
        comp25["<b>comp25</b><br><small><i>LCM</i></small>"]:::component 
        -- "lcm -> result<br><small><i>int</i></small>" --> comp26["<b>comp26</b><br><small><i>PrimeCheck</i></small>"]:::component;
        comp26["<b>comp26</b><br><small><i>PrimeCheck</i></small>"]:::component 
        -- "prime -> value<br><small><i>boolean</i></small>" --> comp27["<b>comp27</b><br><small><i>Fibonacci</i></small>"]:::component;
        comp27["<b>comp27</b><br><small><i>Fibonacci</i></small>"]:::component 
        -- "fib -> result<br><small><i>int</i></small>" --> comp28["<b>comp28</b><br><small><i>Lucas</i></small>"]:::component;
        comp28["<b>comp28</b><br><small><i>Lucas</i></small>"]:::component 
        -- "lucas -> next<br><small><i>int</i></small>" --> comp29["<b>comp29</b><br><small><i>PascalTriangle</i></small>"]:::component;
        comp29["<b>comp29</b><br><small><i>PascalTriangle</i></small>"]:::component 
        -- "pascal -> result<br><small><i>array</i></small>" --> comp30["<b>comp30</b><br><small><i>BinomialCoefficient</i></small>"]:::component;
     
            comp30["<b>comp30</b><br><small><i>BinomialCoefficient</i></small>"]:::component 
        -- "binom -> value<br><small><i>int</i></small>" --> comp31["<b>comp31</b><br><small><i>QuadraticRoot</i></small>"]:::component;
        comp31["<b>comp31</b><br><small><i>QuadraticRoot</i></small>"]:::component 
        -- "root -> result<br><small><i>float</i></small>" --> comp32["<b>comp32</b><br><small><i>LinearEquation</i></small>"]:::component;
        comp32["<b>comp32</b><br><small><i>LinearEquation</i></small>"]:::component 
        -- "linear -> value<br><small><i>float</i></small>" --> comp33["<b>comp33</b><br><small><i>Polynomial</i></small>"]:::component;
        comp33["<b>comp33</b><br><small><i>Polynomial</i></small>"]:::component 
        -- "poly -> result<br><small><i>float</i></small>" --> comp34["<b>comp34</b><br><small><i>Differential</i></small>"]:::component;
        comp34["<b>comp34</b><br><small><i>Differential</i></small>"]:::component 
        -- "diff -> value<br><small><i>float</i></small>" --> comp35["<b>comp35</b><br><small><i>Integral</i></small>"]:::component;
        comp35["<b>comp35</b><br><small><i>Integral</i></small>"]:::component 
        -- "integral -> result<br><small><i>float</i></small>" --> comp36["<b>comp36</b><br><small><i>FourierTransform</i></small>"]:::component;
        comp36["<b>comp36</b><br><small><i>FourierTransform</i></small>"]:::component 
        -- "fourier -> value<br><small><i>complex</i></small>" --> comp37["<b>comp37</b><br><small><i>LaplaceTransform</i></small>"]:::component;
        comp37["<b>comp37</b><br><small><i>LaplaceTransform</i></small>"]:::component 
        -- "laplace -> result<br><small><i>complex</i></small>" --> comp38["<b>comp38</b><br><small><i>MatrixMultiplication</i></small>"]:::component;
        comp38["<b>comp38</b><br><small><i>MatrixMultiplication</i></small>"]:::component 
        -- "matrix -> value<br><small><i>array</i></small>" --> comp39["<b>comp39</b><br><small><i>VectorAddition</i></small>"]:::component;
        comp39["<b>comp39</b><br><small><i>VectorAddition</i></small>"]:::component 
        -- "vector -> result<br><small><i>array</i></small>" --> comp40["<b>comp40</b><br><small><i>DotProduct</i></small>"]:::component;
    """
breaking_chunk = """
          comp40["<b>comp40</b><br><small><i>DotProduct</i></small>"]:::component 
        -- "dot -> value<br><small><i>float</i></small>" --> comp41["<b>comp41</b><br><small><i>CrossProduct</i></small>"]:::component;
        comp41["<b>comp41</b><br><small><i>CrossProduct</i></small>"]:::component 
        -- "cross -> result<br><small><i>array</i></small>" --> comp42["<b>comp42</b><br><small><i>EigenValue</i></small>"]:::component;
        comp42["<b>comp42</b><br><small><i>EigenValue</i></small>"]:::component 
        -- "eigen -> value<br><small><i>float</i></small>" --> comp43["<b>comp43</b><br><small><i>EigenVector</i></small>"]:::component;
        comp43["<b>comp43</b><br><small><i>EigenVector</i></small>"]:::component 
        -- "vector -> result<br><small><i>array</i></small>" --> comp44["<b>comp44</b><br><small><i>SingularValueDecomposition</i></small>"]:::component;
        comp44["<b>comp44</b><br><small><i>SingularValueDecomposition</i></small>"]:::component 
        -- "svd -> value<br><small><i>matrix</i></small>" --> comp45["<b>comp45</b><br><small><i>CholeskyDecomposition</i></small>"]:::component;
        comp45["<b>comp45</b><br><small><i>CholeskyDecomposition</i></small>"]:::component 
        -- "cholesky -> result<br><small><i>matrix</i></small>" --> comp46["<b>comp46</b><br><small><i>LUDecomposition</i></small>"]:::component;
        comp46["<b>comp46</b><br><small><i>LUDecomposition</i></small>"]:::component 
        -- "lu -> value<br><small><i>matrix</i></small>" --> comp47["<b>comp47</b><br><small><i>QRDecomposition</i></small>"]:::component;
        comp47["<b>comp47</b><br><small><i>QRDecomposition</i></small>"]:::component 
        -- "qr -> result<br><small><i>matrix</i></small>" --> comp48["<b>comp48</b><br><small><i>GramSchmidtProcess</i></small>"]:::component;
        comp48["<b>comp48</b><br><small><i>GramSchmidtProcess</i></small>"]:::component 
        -- "gram -> value<br><small><i>matrix</i></small>" --> comp49["<b>comp49</b><br><small><i>MoorePenroseInverse</i></small>"]:::component;
        comp49["<b>comp49</b><br><small><i>MoorePenroseInverse</i></small>"]:::component 
        -- "inverse -> result<br><small><i>matrix</i></small>" --> comp50["<b>comp50</b><br><small><i>MatrixDeterminant</i></small>"]:::component;
    """

# Encode the graph to Base64
graphbytes = graph.encode("ascii")
base64_bytes = base64.b64encode(graphbytes)
base64_string = base64_bytes.decode("ascii")

print(f"Encoded string: {base64_string}")
print(f"Length chars: {len(base64_string)}")

# Fetch
response = requests.get('https://mermaid.ink/img/' + base64_string)
print(response.headers)

# Display
img = Image.open(io.BytesIO(response.content))
plt.imshow(img)
plt.show()

If you run this script you'll get an image for this arbitrary ChatGPT-generated graph. However, if you append breaking_chunk to the graph, the request fails with an exception. I've inspected the response headers and the server sits behind Cloudflare. I'm not sure what kind of server it is; Cloudflare most likely runs its own custom server software rather than a standard off-the-shelf web server like Apache or Nginx. Either way, there is a limit on the encoded URL length of around 12000 chars.
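
One way to pin down where the limit sits is to bisect on payload length. The sketch below is hypothetical: `probe` stands in for a function that sends a payload of a given length and reports whether the server accepted it, and the fake probe in the usage example simply assumes the roughly 12000-character cutoff mentioned above.

```python
from typing import Callable


def find_max_accepted(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest length in [low, high] that probe accepts.

    Assumes acceptance is monotonic: once a length is rejected,
    every longer length is rejected too.
    """
    if not probe(low):
        raise ValueError("probe rejects the lower bound")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Stand-in for a real request; 12000 is the approximate figure from this thread.
    def fake_probe(length: int) -> bool:
        return length <= 12000

    print(find_max_accepted(fake_probe, 1, 100_000))
```

A real probe would issue one request per step, so a search over 100,000 candidate lengths needs only about 17 requests.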

So how do we mitigate this?

  1. We (Haystack) run our own Mermaid server with a custom URL size limit. Possible, but not likely.
  2. Optionally limit the labels on graphs. These labels contribute quite a lot to the encoded graph size and thus cause issues when rendering large graphs. We had this before, but looking at the code now, this optional setting seems to be gone. I'll consult with @silvanocerza about this and we'll decide what to do next.
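
As a rough illustration of option 2, the sketch below strips the `<br><small>...</small>` detail from labels in Mermaid text shaped like the test graph above. The function name and the regular expression are hypothetical and are not how Haystack implemented this setting in the past.

```python
import re

# Matches the "<br><small>...</small>" detail that follows a name in a label.
_DETAIL = re.compile(r"<br><small>.*?</small>")


def strip_label_details(mermaid_text: str) -> str:
    """Drop type and input/output details from labels, keeping names and edges."""
    return _DETAIL.sub("", mermaid_text)


if __name__ == "__main__":
    line = (
        'comp1["<b>comp1</b><br><small><i>AddFixedValue</i></small>"]:::component '
        '-- "result -> value<br><small><i>int</i></small>" --> '
        'comp2["<b>comp2</b><br><small><i>Double</i></small>"]:::component;'
    )
    slim = strip_label_details(line)
    print(slim)
    print(len(line), "->", len(slim), "chars")
```

This only postpones the problem: a large enough pipeline would still exceed the limit, just later.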

Perhaps there are other options? I'll discuss this internally and we'll come up with a game plan. Thanks for raising this @CarlosFerLo 🙏
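
One further option worth checking is compressing the payload before encoding it. The sketch below assumes that mermaid.ink accepts the `pako:`-prefixed format used by the Mermaid Live Editor (a zlib-compressed JSON object with a `code` field, URL-safe Base64 encoded). Nothing in this thread confirms that, so treat the URL format and the function name as assumptions to verify.

```python
import base64
import json
import zlib


def pako_url(mermaid_text: str, base: str = "https://mermaid.ink/img/") -> str:
    """Build a URL carrying a zlib-compressed, URL-safe Base64 payload.

    Assumption: the server understands the "pako:" prefix.
    """
    state = json.dumps({"code": mermaid_text})
    compressed = zlib.compress(state.encode("utf-8"), 9)  # 9 = best compression
    payload = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{base}pako:{payload}"


if __name__ == "__main__":
    text = "graph LR;\n" + "".join(f"a{i} --> a{i + 1};\n" for i in range(200))
    plain = base64.b64encode(text.encode("utf-8")).decode("ascii")
    print("plain payload:", len(plain), "chars")
    print("compressed URL:", len(pako_url(text)), "chars")
```

Pipeline graphs repeat the same HTML fragments in every label, so they should compress well, but the URL limit itself would still be there.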

vblagoje avatar Sep 05 '24 09:09 vblagoje

cc @julian-risch moving this one to backlog again as a known limitation. Will consult on possible mitigation routes internally.

vblagoje avatar Sep 05 '24 09:09 vblagoje

Just ran across the issue as well. Should have probably Googled the error message sooner because I spent quite a while trying to draw the pipeline on a whiteboard to see if the issue was with how I designed my pipeline. Since there doesn't seem to be a workaround, I'll probably have to do some hack to make it work with mermaid-cli or something.

lbux avatar Nov 06 '24 23:11 lbux

It's ugly but if someone wants to mess with npm inside their project, here is how I did it:

add an offline function to draw.py

import subprocess

import networkx

def _to_mermaid_image_offline(graph: networkx.MultiDiGraph) -> bytes:
    # _to_mermaid_text is expected to be available in draw.py already
    graph_styled = _to_mermaid_text(graph.copy())

    # Save the Mermaid code to a file
    with open("graph.mmd", "w") as f:
        f.write(graph_styled)

    # Call mermaid-cli (mmdc) to generate the image
    subprocess.run(
        ["mmdc", "-i", "graph.mmd", "-s", "5", "-o", "graph.png"],
        check=True,
        timeout=10
    )
    # Read the rendered PNG back so the caller can save it anywhere
    with open("graph.png", "rb") as img_file:
        return img_file.read()

replace the _to_mermaid_image call in draw (base.py)

def draw(self, path: Path) -> None:
        """
        Save an image representing this `Pipeline` to `path`.

        :param path:
            The path to save the image to.
        """
        # Before drawing we edit a bit the graph, to avoid modifying the original that is
        # used for running the pipeline we copy it.
        image_data = _to_mermaid_image_offline(self.graph)
        Path(path).write_bytes(image_data)
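
A variant of the same idea that avoids leaving graph.mmd and graph.png in the working directory is sketched below. It uses the same mmdc flags as the snippet above; the function names, the longer default timeout, and the PATH check are additions made for illustration, and it assumes mermaid-cli is installed locally.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def mmdc_command(src: Path, dst: Path, scale: int = 5) -> list[str]:
    """Command line for mermaid-cli, using the same flags as the snippet above."""
    return ["mmdc", "-i", str(src), "-s", str(scale), "-o", str(dst)]


def render_mermaid_offline(mermaid_text: str, timeout: float = 60.0) -> bytes:
    """Render Mermaid text to PNG bytes with a locally installed mmdc."""
    if shutil.which("mmdc") is None:
        raise RuntimeError("mmdc not found on PATH; install @mermaid-js/mermaid-cli")
    # Work in a temporary directory that is removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "graph.mmd"
        dst = Path(tmp) / "graph.png"
        src.write_text(mermaid_text, encoding="utf-8")
        subprocess.run(mmdc_command(src, dst), check=True, timeout=timeout)
        return dst.read_bytes()
```

Calling `Path("pipe.png").write_bytes(render_mermaid_offline(text))` would then save the image, mirroring what draw does with the returned bytes.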

+1 for a true offline replacement to be integrated into Haystack somehow

lbux avatar Nov 07 '24 01:11 lbux

I just ran into the same issue - is there another way to solve this?

PaulBFB avatar Jan 03 '25 21:01 PaulBFB

I just ran into the same issue - is there another way to solve this?

Do you have code that I can run to replicate your issue? I am testing an alternative fix using mermaid-py and want to test more before making a PR.

lbux avatar Jan 14 '25 01:01 lbux