shelltestrunner
multiline matching of stdout/stderr doesn't work as expected (cannot achieve it)
When testing this, it matches:
# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*/
>= 0
but when testing this, it doesn't (returns failure):
# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*Line 2.*/
>= 0
The regex-tdfa matcher is supposed to default to multiline, so I'd expect that one to work. But even if I explicitly try to include newlines in the catch-all, it doesn't work either (it also returns failure):
# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1(.|\n)*Line 2.*/
>= 0
Is there a syntax I can use to achieve multiline matching as-is, or does it require a mod to the code?
Sorry I'm not sure - needs debugging. Perhaps we are not calling it in multiline mode.
It is called in multiline mode. But regex-tdfa has a non-standard multiline mode that combines what is usually known as "multiline" with inverse "dotall" and also disables matching newlines in inverted character classes (so you can't even use e.g. [^!]).
You can match a newline using "(.|\n)", but only with an actual newline in the pattern (since \n is just n to regex-tdfa). I don't think that shelltestrunner can currently do that for you.
Also: fyi echo -e doesn't usually work with /bin/sh.
regex-tdfa does recognise [[:space:]] though, so this works:
# test multiline matches
$ printf "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"
> /.*Line 1(.|[[:space:]])*Line 2.*/
>= 0
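For comparison, the same effect can be reproduced outside shelltestrunner. The sketch below uses Python's `re` module, not regex-tdfa, so it only illustrates the general idea: a `.` that stops at newlines versus an alternation that also accepts whitespace. `\s` stands in for `[[:space:]]`, which Python's `re` does not support; the variable names are made up for illustration.

```python
import re

output = "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"

# By default "." does not match a newline, so this pattern cannot span lines.
spans_lines_plain = re.search(r".*Line 1.*Line 2.*", output) is not None

# Allowing any whitespace character (including "\n") as an alternative to "."
# lets the pattern cross line boundaries.
spans_lines_ws = re.search(r".*Line 1(.|\s)*Line 2.*", output) is not None

print(spans_lines_plain, spans_lines_ws)
```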
Also: fyi echo -e doesn't usually work with /bin/sh.
# with bash
$ echo "foo\nbar"
foo\nbar
$ echo -e "foo\nbar"
foo
bar
$ printf "foo\nbar\n"
foo
bar
# with /bin/sh (on my system)
$ echo "foo\nbar"
foo
bar
$ echo -e "foo\nbar"
-e foo
bar
$ printf "foo\nbar\n"
foo
bar
The behaviour of echo regarding escapes and options differs greatly between systems.
I recommend using printf instead (though you need to manually add a \n at the end).
@obfusk Thanks for the info, very useful.
Indeed on my sh echo -e works as expected, but since in some cases I need compatibility with e.g. busybox etc, it's still a valuable comment which I will use.
As for the [[:space:]] workaround, very useful! I guess this greatly lessens at least the urgency of this issue.
What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?
You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).
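The trade-off can be illustrated with an implementation that keeps the two switches separate. This is a sketch in Python's `re`, not regex-tdfa: there, "dotall" and "multiline" are independent flags, whereas regex-tdfa (as described in this thread) ties them together. The variable names are only for illustration.

```python
import re

output = "Line 1 foo\nLine 2 bar\n"

# DOTALL only: "." crosses newlines, but "^" matches only at the start
# of the whole string.
dotall_spans = re.search(r"Line 1.*Line 2", output, re.DOTALL) is not None
dotall_line_anchor = re.search(r"^Line 2", output, re.DOTALL) is not None

# MULTILINE only: "^" matches at the start of every line, but "." stops
# at newlines.
multiline_spans = re.search(r"Line 1.*Line 2", output, re.MULTILINE) is not None
multiline_line_anchor = re.search(r"^Line 2", output, re.MULTILINE) is not None

print(dotall_spans, dotall_line_anchor, multiline_spans, multiline_line_anchor)
```

The first pair corresponds to what turning multiline mode off would give you; the second pair corresponds to the behaviour reported at the top of this issue.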
It's unfortunate that regex-tdfa has chosen such non-standard behaviour: merging "multiline" and "dotall" into one option + not matching newlines in complementing character classes (which AFAIK no other regex implementation does). Thus (optionally, if you want backwards compatibility) using a different regex implementation might be preferable.
Another option would be to (have an option to) "preprocess" the regex and replace . with (.|[[:space:]]) (though this is non-trivial); e.g. using a syntax like /.../s (similar to e.g. Perl and JavaScript).
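A rough sketch of what such preprocessing could look like, written in Python purely for illustration (shelltestrunner itself is Haskell). The function name `expand_dots` is hypothetical. It assumes POSIX-style bracket expressions in which a backslash is an ordinary character, and it only handles escapes and bracket expressions; a real implementation would need more care.

```python
def expand_dots(pattern: str, replacement: str = "(.|[[:space:]])") -> str:
    """Replace each unescaped "." outside bracket expressions with `replacement`."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character (e.g. "\."): copy both characters unchanged.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Bracket expression: copy it verbatim up to its closing "]".
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member
            while j < n and pattern[j] != "]":
                if pattern.startswith(("[:", "[=", "[."), j):
                    # Nested class such as [:space:]: skip past its terminator.
                    terminator = pattern[j + 1] + "]"
                    close = pattern.find(terminator, j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == ".":
            out.append(replacement)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_dots(".*Line 1.*Line 2.*"))
```

Dots inside `[...]` and escaped dots are left alone, since they already mean a literal dot.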
Worth raising in regex-tdfa's issue tracker maybe ?
https://github.com/haskell-hvr/regex-tdfa/issues/11
You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).
@simonmichael
This might actually be a nice option to have as a command line option to shelltest which is probably easy to implement?
Then one could simply choose the behaviour based on the use case.
Multi-line off would be perfectly suitable for cli-testing where e.g. a program feedback is checked (e.g. the contents of a help or error message), since in many cases the keywords/patterns will be more important than the lines they're on.
@ppenguin fwiw I recently quickly hacked together a Python implementation of something similar to shelltest. It's unfinished, not entirely compatible, only implements part of the functionality, ~~hasn't been documented yet~~, and probably has some bugs. But it does support proper multiline tests (and uses Python's more extensive regex capabilities):
# test multiline matches
$ printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
> /^line 1.*^line 2/ims
Note the /.../ims to enable case insensitive matching (i), multiline (m) & dotall (s).
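As a sketch of what those three flags mean, here is the same pattern applied directly with Python's `re` module. This assumes the `i`, `m` and `s` suffixes correspond to `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`; it is not taken from the tool's source.

```python
import re

output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"

# i: ignore case, m: "^" matches at each line start, s: "." matches newlines.
flags = re.IGNORECASE | re.MULTILINE | re.DOTALL
match = re.search(r"^line 1.*^line 2", output, flags)

print(match is not None)
```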
I would very much like this. I am currently porting my application from Python to Haskell and I dearly miss the integrated test generation ("transcript") of https://github.com/python-cmd2/cmd2 where you can save the output of the application in order to test it at a later date. I've just written one test with shelltestrunner, but due to the size of the output it would be too impractical to maintain those transcripts manually. I mention this because it can be an inspiration for line handling too.
NB: I also find the expected output/command/get output quite hard to notice.
Would anybody like to propose/work on some improvements?
Couldn't this problem (easy multiline matching) be solved by allowing multiple regexes per file descriptor? At least assuming that order of lines is not important.
I.e. I'm thinking of
printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
>>> /Line 1/
>>> /Line 2 bar$/
>>>= 0
And one would need to resort to proper multiline matching only if specific order is needed.
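The semantics proposed here (every pattern must match somewhere, in any order) can be sketched as follows. This is Python for illustration only and says nothing about how shelltestrunner would implement it; `all_patterns_match` is a hypothetical helper name.

```python
import re


def all_patterns_match(output: str, patterns: list[str]) -> bool:
    """Return True if every pattern matches somewhere in `output`, in any order."""
    # MULTILINE so that "^" and "$" anchor to individual lines.
    return all(re.search(p, output, re.MULTILINE) is not None for p in patterns)


output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
print(all_patterns_match(output, ["Line 1", "Line 2 bar$"]))
```

Because each pattern is checked independently, swapping the two patterns gives the same result, which is why an order-sensitive test would still need a single multiline regex.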
Some years ago, regex-tdfa was the best compromise of power and portability. Is there anything better (more standard, more robust) nowadays?