Fix pydot tests
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 84.78%. Comparing base (0418dab) to head (551d6b5).
:warning: Report is 31 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2019 +/- ##
==========================================
- Coverage 84.81% 84.78% -0.03%
==========================================
Files 46 46
Lines 8368 8366 -2
Branches 1962 1962
==========================================
- Hits 7097 7093 -4
- Misses 803 806 +3
+ Partials 468 467 -1
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
The test fails for the same reason it failed for me locally: https://github.com/common-workflow-language/cwltool/actions/runs/9964048376/job/27541027817
The reason is that cwltool returns these strings with a path attached to them, while it wasn't expected.
It doesn't seem to be pydot related, does it?
@lkk7 Interesting, I've triggered a re-run of the CI on the main branch to see if there has been a regression related to one of our dependencies since July 12th
https://github.com/common-workflow-language/cwltool/actions/runs/9905754128
@lkk7 When we pin to a pydot before version 3, all the CI tests still pass: https://github.com/common-workflow-language/cwltool/actions/runs/9976419287/job/27568552139
That's an old issue, but could we retry the CI now? Or has this been solved already?
@lkk7 Since you started this PR we capped the pydot version to be less than 3. So now that we've rebased the PR I had to add a revert of that pydot version cap commit. We can now compare the pending test results (with pydot 3.x) to the results without pydot 3.x.
@lkk7 Looks like there is a behavior change with pydot 3.x (3.0.1, specifically)
[
(
- '"command_line_tool"',
- '"file_output"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"',
),
(
- '"file_input"',
- '"nested_workflow"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_input"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
),
(
- '"nested_workflow"',
- '"operation"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#nested_workflow"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
),
(
- '"operation"',
- '"command_line_tool"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"',
),
(
- '"operation"',
- '"string_output"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#operation"',
+ 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#string_output"',
),
]
https://github.com/common-workflow-language/cwltool/actions/runs/10468140236/job/28988377402?pr=2019#step:8:82
I can't pinpoint the exact fix yet, but the bug appears in cwlrdf.py in this function:
def printdot(
wf: Process,
ctx: ContextType,
stdout: IO[str],
) -> None:
cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", "")) ### <<< HERE
It's all about those file paths.
wf.metadata['id'] equals 'file:///<your_path>/three_step_color.cwl', and in pydot 2.0.0 it's removed from the string as expected.
In 3.0.0+ the cwl_viewer.dot() contains these paths in a changed form: file:"///<your_path>/three_step_color.cwl#string_output". There is an additional quote that causes the replacement not to work.
@ferdnyc, Could you confirm my suspicion. I know you were not involved in this one, but you made the change that affect this (ea0131517db2a0dbf58ec0f8bb3bd7b40bbd279d). The context of this bug is not important, but – file:///<path>#<...> changed to file:"///<path>#<...>". Is it because now pydot thinks it's a port, like node1:port1?
ping @lkk7 @@ferdnyc , thank you
After a year, I've looked at this again, and I think again that this is because of the handling of port syntax, (the : sign which is used in the failing test).
The thing happens here in cwltool:
def printdot(
wf: Process,
ctx: ContextType,
stdout: IO[str],
) -> None:
cwl_viewer: CWLViewer = CWLViewer(printrdf(wf, ctx, "n3"))
stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))
As we see in the failed test log posted above:
( - '"command_line_tool"', - '"file_output"', + 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool"', + 'file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#file_output"', ), ...
The problem occurs specifically here:
stdout.write(cwl_viewer.dot().replace(f"{wf.metadata['id']}#", ""))`
The replacement isnt working anymore, because it wants to erase file:///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool, but it encounters file:"///home/runner/work/cwltool/cwltool/tests/wf/three_step_color.cwl#command_line_tool".
I'm not sure if we can just revert back to the previous behavior which ignored it, because theoretically it's unsafe. But it's possible that the quoting behavior is too tight and I'll mention it in the upcoming PR.
An ugly alternative is to adjust the .replace() call, here in cwltool, to just work around the problem.
Apologies, just seeing this for the first time. The port conjecture is likely; the way to overcome that would be to explicitly quote the ID string. (So, pydot.Node('"file:///<path>#command_line_tool"') rather than pydot.Node("file:///<path>#command_line_tool").) A quoted string shouldn't be parsed apart into an id:port value, because the correct quoting there would be '"id":"port"'.
That's consistent with graphviz's own parser, since:
graph G {
# Either of these will break (syntax error),
# since it can't parse slashes in an unquoted port value
file:///some/path#fragment [label="\N"];
file:///some/path [label="\N"];
}
graph G {
# These are all exactly equivalent,
# any will create a node named "file"
file:somepath [label="\N"];
"file":"somepath" [label="\N"];
file:"///some/path#fragment" [label="\N"];
}
graph G {
# This will create a node named "file:///some/path#fragment"
"file:///some/path#fragment" [label="\N"];
}
Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with :port components, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.
@ferdnyc Thanks for the suggestion! Of course we can quote it. Why didn't I think of that?
@mr-c
Could you approve the workflow? The latest commit should make the tests pass.
Disclaimer: This change only satisfies the tests, which aren't 100% exhaustive.
If you want the tested output to match the expected output exactly) (which it doesn't right now, as there are some small differences), then the code needs to be refactored. Maybe the expected graph is not up-to-date or something. But this is outside the scope of the current issue, as it has been this way for a long time.
And generally, it works. But I'm not that experienced in cwltool to check it in detail.
@lkk7
Why didn't I think of that?
Because then I wouldn't get the chance to sound Clever™!
Without knowing more about the context, I can't say where the behavior changed, but if Pydot 2.x was processing input differently it was probably doing it wrong in ways that would break on valid syntax. Like node IDs with
:portcomponents, which even though they're completely superfluous (ports are only meaningful for edges, not nodes) are still valid according to the grammar. A lot of that stuff was cleaned up for 3.x.
Actually, it looks like these are edges, since even Pydot 4.x won't actually output the port part of a Node definition (as I said, it's meaningless); but the difference is in how the Edge endpoints are output (not parsed).
Pydot does a lot of DWIMming. So if the user creates a technically-invalid edge endpoint like file:///some/path#fragment, instead of throwing a syntax error like Graphviz would, we'll just add the necessary quotes for you.
But whereas Pydot 2.x, 3.x, and 4.x would all output a simple set of endpoints like pydot.Edge("a:p1", "b:p2") the same way:
>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p1", "b:p2")
>>> e.to_string()
'a:p1 -- b:p2;'
The difference is in how they DWIM endpoints that do require quoting.
Pydot 2.x just threw quotes around the entire endpoint string, making it unnecessarily difficult to create endpoints with port values:
>>> import pydot
>>> pydot.__version__
'2.0.0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'"a:p2!" -- "b:p4!";'
>>> e = pydot.Edge("a?:p1", "b?:p2")
>>> e.to_string()
'"a?:p1" -- "b?:p2";'
Pydot 3.x+ is smarter, and knows how to properly quote an id:port-format endpoint:
>>> import pydot
>>> pydot.__version__
'4.0.1.dev0'
>>> e = pydot.Edge("a:p2!", "b:p4!")
>>> e.to_string()
'a:"p2!" -- b:"p4!";'
>>> e = pydot.Edge("a?:p2", "b?:p4")
>>> e.to_string()
'"a?":p2 -- "b?":p4;'
...which does mean that now, when you don't want that colon to represent the id:port split, you need to wrap the entire ID in quotes.
@mr-c\n\nCould you approve the workflow? The latest commit should make the tests pass
Done! But I see test errors
(Also, note that they will all — including 4.0.1.dev0 — screw up an id:port:compass-format endpoint unless it's properly quoted going in.) This is a bug:
>>> import pydot
>>> pydot.__version__
'4.0.1.dev0'
>>> e = pydot.Edge("a?:p2:nw", "b?:p4:se")
>>> e.to_string()
'"a?:p2":nw -- "b?:p4":se;'
>>> # The workaround, for now
>>> e = pydot.Edge('"a?":p2:nw', '"b?":p4:se')
>>> e.to_string()
'"a?":p2:nw -- "b?":p4:se;'
Fortunately, that's a relatively obscure feature, mostly used by other code that fully quotes all inputs by default.
@mr-c\n\nCould you approve the workflow? The latest commit should make the tests pass
Done! But I see test errors
cwltool/cwlviewer.py: note: In member "_set_inner_edges" of class "CWLViewer":
cwltool/cwlviewer.py:102:21: error: Module has no attribute
"quote_id_if_necessary" [attr-defined]
pydot.quote_id_if_necessary(str(inner_edge_row["so...
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Mmm, to appease Mypy this should probably use pydot.core.quote_id_if_necessary.
I think it was about the pydot.pyi missing the new quote_id_if_necessary. Added it.
If that doesn't help, then maybe on old pydot version is cached somewhere in the CI process. Or, ferdnyc is right as always.
@mr-c
@lkk7 can you run make cleanup?