alga icon indicating copy to clipboard operation
alga copied to clipboard

escape strings to satisfy Dot syntax rules

Open jmtd opened this issue 7 years ago • 10 comments

GraphViz's "dot" language has several options for encoding IDs, as described here: https://www.graphviz.org/doc/info/lang.html

Alga makes use of the quoted-string syntax. This permits any character within double quotes, except the double-quote character itself, which must be prefixed by a backslash character. (the backslash itself does not need to be escaped).

In order to implement this requirement, introduce an "Escape" typeclass and "escape" function.

Provide an instance definition for regular Haskell Strings. This requires the FlexibleInstances Syntax extension.

jmtd avatar Jan 27 '18 15:01 jmtd

Things I know need doing (at least)

  • [ ] tests
  • [x] GHC warns about redundant constraints for some functions, to investigate
  • [ ] docs
  • [x] do the non-show export methods still work as expected
  • [x] probably a Data.Text instance

jmtd avatar Jan 27 '18 15:01 jmtd

Simple test program

import Algebra.Graph
import Algebra.Graph.Export.Dot
main = do
    putStrLn test
    putStrLn $ show $ test == correct
    putStrLn (export defaultStyleViaShow (path [1, 2]) :: String)

test = (export defaultStyleViaShow (path ["a", "b"]) :: String)
correct = "digraph\n\
\{\n\
\  \"\\\"a\\\"\"\n\
\  \"\\\"b\\\"\"\n\
\  \"\\\"a\\\"\" -> \"\\\"b\\\"\"\n\
\}\n"

jmtd avatar Jan 27 '18 15:01 jmtd

@jmtd Thank you! I just realised that I might have overcomplicated things by suggesting to add the type class, since we can escape after we have the String, just like you do in your implementation.

As for generic export, I don't know if we want to give the user full control and escaping themselves, or shall we provide built-in escaping?

snowleopard avatar Jan 27 '18 15:01 snowleopard

As for generic export, I don't know if we want to give the user full control and escaping themselves, or shall we provide built-in escaping?

I suppose the question is, since Alga.Graph.Dot generates the output for graphviz, do you want to promise it will output something valid, no matter construction the user has thrown at you, and whether that is feasible.

jmtd avatar Jan 27 '18 16:01 jmtd

@jmtd What if the user has already provided escaped strings, e.g. we have \"x\" in the input? Do we want to mess this up by changing it to \\"x\\"? Or are we going to try to be even more clever and leave \" as is? I think this might become a maintenance hazard. It may be better to simply warn the users in the comments that export and exportAsIs do not perform escaping. (And test that they indeed do not.)

Note that in exportViaShow we know for sure that input strings are not escaped, because Show instances are expected to produce valid Haskell. Therefore, we can safely apply your escaping implementation.

snowleopard avatar Jan 27 '18 17:01 snowleopard

I've just pushed a simplified patch that only implements escape for defaultStyleViaShow. I've gone for a simpler escape implementation than earlier which I think performs much better and is debatably simpler to understand too. I still need to write tests and docs.

jmtd avatar Jan 28 '18 16:01 jmtd

@jmtd Thanks! I have a suspicion that this might be too simple and fail to handle this case correctly:

-- A string containing a single character "
test1 :: String
test1 = "\""

Will the current implementation turn it into \"\\"\" after show and escape?

Perhaps, we need to escape the character \ too? Since DOT uses it for escaping, presumably it should be handled with care, producing \"\\\"\", which I guess is the correct output for test1.

We might also add a test specifically for \:

-- A string containing a single character \
test2 :: String
test2 = "\\"

snowleopard avatar Jan 28 '18 19:01 snowleopard

Yes, the current implementation will generate that, and dot will not accept it:

> putStrLn $ exportViaShow $ vertex "\""
digraph
{
  "\"\\"\""
}

Relatedly, a strict reading of the Dot language description would lead one to believe "\" was a valid identifer, but it is not: there's a gap between the description and actual practice. For my graphviz, here, "\\" is accepted, and renders as a single \. I suppose this case could be handled in the existing pattern-match implementation with something inserted between the existing terms like (untested) escape x:[] = if x = '\\' then x:x:[] else x:[]) (I've started contorting myself to avoid writing any more backslashes...)

(all these backslashes! even GH's markdown parser is having trouble :))

My next aim is to write some tests to capture this, within your existing test framework (which I'm interested to learn more about; I've only used HTF for my own things before, but it looks to me that HTF is not in wide use any more)

jmtd avatar Jan 28 '18 21:01 jmtd

My 'test framework' is not something I'm proud of and could definitely be improved (#13). However, for this PR it should be fairly straightforward to add tests somewhere around here:

https://github.com/snowleopard/alga/blob/master/test/Algebra/Graph/Test/Export.hs#L151-L162

I like to keep docs and tests in sync, so that all invariants are documented and are not surprising to users.

snowleopard avatar Jan 28 '18 23:01 snowleopard

By the way, it may be useful to expose the escaping functionality for two reasons:

  • It would be much easier to document and test the escaping functionality in isolation.
  • Users can take advantage of it when creating their custom export Styles.

The cost is that we'll need to maintain it, but it doesn't seem to high.

snowleopard avatar Jan 28 '18 23:01 snowleopard