Literal and folded strings cannot handle "\n \n" at the end of a string
Adding the test case 'whitespace between line breaks at the end\n \n' to test/string_test.dart causes tests to fail.
diff --git a/test/string_test.dart b/test/string_test.dart
index b92c20a..a18f57a 100644
--- a/test/string_test.dart
+++ b/test/string_test.dart
@@ -10,6 +10,7 @@ final _testStrings = [
"this is a fairly' long string with\nline breaks",
'whitespace\n after line breaks',
'whitespace\n \nbetween line breaks',
+ 'whitespace between line breaks at the end\n \n',
'\n line break at the start',
'word',
'foo bar',
Tests failures:
$ dart test
00:01 +400 -1: loading test/golden_test.dart
Successfully tested 20 inputs against golden files, created 0 golden files
00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E]
Assertion failed: (package:yaml_edit) Modification did not result in expected result.
# YAML before edit:
>
# YAML after edit:
> >-
> whitespace between line breaks at the end
>
>
Please file an issue at:
https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
package:yaml_edit/src/editor.dart 579:7 YamlEditor._performEdit
package:yaml_edit/src/editor.dart 249:14 YamlEditor.update
test/string_test.dart 39:20 main.<fn>
To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)'
00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E]
Assertion failed: (package:yaml_edit) Modification did not result in expected result.
# YAML before edit:
>
# YAML after edit:
> |-
> whitespace between line breaks at the end
>
>
Please file an issue at:
https://github.com/dart-lang/yaml_edit/issues/new?labels=bug
package:yaml_edit/src/editor.dart 579:7 YamlEditor._performEdit
package:yaml_edit/src/editor.dart 249:14 YamlEditor.update
test/string_test.dart 39:20 main.<fn>
To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)'
00:10 +624 -2: Some tests failed.
Consider enabling the flag chain-stack-traces to receive more detailed exceptions.
For example, 'dart test --chain-stack-traces'.
@kekavc24 I'm not sure this is a new bug caused by https://github.com/dart-lang/yaml_edit/pull/87 but it's something we should fix.
At a minimum we should fallback to double quoted string encoded, if nothing else.
Indeed, I'd suggest the following steps:
- PR that makes
_tryYamlEncodeLiteraland_tryYamlEncodeFoldedfor strings that end in\n \nand similar.- Figure out what strings aren't allowed at the end.
- Maybe, just start with a safe thing, like
if (string.trimRight() != string) return null - Add additional test cases that exercise all sorts of ways whitespace and linebreaks can be added at the end.
- Explore in what scenarios we can encode strings ending with whitespace / line breaks as:
- Literal strings
- Folded strings
- PRs and tests for literal and folded strings respectively.
I think (1) is the most important. if (string.trimRight() != string) return null is a big hammer, and maybe we can do better, but it's a good start, and it's better to safe than wrong :rofl:
Btw, I can be found on dart_community discord, I'm jonasfj, feel free to ping me.
Adding the test case
'whitespace between line breaks at the end\n \n'totest/string_test.dartcauses tests to fail.diff --git a/test/string_test.dart b/test/string_test.dart index b92c20a..a18f57a 100644 --- a/test/string_test.dart +++ b/test/string_test.dart @@ -10,6 +10,7 @@ final _testStrings = [ "this is a fairly' long string with\nline breaks", 'whitespace\n after line breaks', 'whitespace\n \nbetween line breaks', + 'whitespace between line breaks at the end\n \n', '\n line break at the start', 'word', 'foo bar',Tests failures:
$ dart test 00:01 +400 -1: loading test/golden_test.dart Successfully tested 20 inputs against golden files, created 0 golden files 00:01 +400 -1: test/string_test.dart: Root FOLDED string (4) [E] Assertion failed: (package:yaml_edit) Modification did not result in expected result. # YAML before edit: > # YAML after edit: > >- > whitespace between line breaks at the end > > Please file an issue at: https://github.com/dart-lang/yaml_edit/issues/new?labels=bug package:yaml_edit/src/editor.dart 579:7 YamlEditor._performEdit package:yaml_edit/src/editor.dart 249:14 YamlEditor.update test/string_test.dart 39:20 main.<fn> To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root FOLDED string (4)' 00:01 +415 -2: test/string_test.dart: Root LITERAL string (4) [E] Assertion failed: (package:yaml_edit) Modification did not result in expected result. # YAML before edit: > # YAML after edit: > |- > whitespace between line breaks at the end > > Please file an issue at: https://github.com/dart-lang/yaml_edit/issues/new?labels=bug package:yaml_edit/src/editor.dart 579:7 YamlEditor._performEdit package:yaml_edit/src/editor.dart 249:14 YamlEditor.update test/string_test.dart 39:20 main.<fn> To run this test again: dart test test/string_test.dart -p vm --plain-name 'Root LITERAL string (4)' 00:10 +624 -2: Some tests failed. Consider enabling the flag chain-stack-traces to receive more detailed exceptions. For example, 'dart test --chain-stack-traces'.
Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.
I wanted to drop the chomping indicator changes I made in between for dart-lang/yaml_edit#87 so that the squashed commits make sense when merged.
I’ll work on it 🫡
I have a branch with the changes you suggested, I'll make a PR.
@jonasfj Additionally, I also managed to come up with a PoC that solves this issue permanently (hopefully) but comes with some breaking but solvable trade offs in another branch.
Issue One
After we wrongly tried using keep chomping indicator in dart-lang/yaml_edit#87, it got me wondering why it existed in YamlEditor in the first place. What issue was it trying to solve before my fix in dart-lang/yaml_edit#87?
The investigation led me back here 😅 , to this very issue we are trying to solve. The chomping indicator was meant to preserve the trailing \n in the initially buggy _tryYamlEncodeFolded and _tryYamlEncodeLiteral with hits and misses before my fix. And now it won't work. How come and why?
I investigated yamlEncodeBlockScalar which calls _tryYamlEncodeFolded and any other function that encodes block scalars. It looked okay. You helped me refactor it. The culprit ended up being yamlEncodeBlock which calls yamlEncodeBlockScalar.
If a YamlNode is a :
-
YamlScalar, it callsyamlEncodeBlockScalarand returns the encoded string. -
YamlListorYamlMap, it recursively callsyamlEncodeBlockuntil allYamlScalars are encoded and joins then with a\nand returns the string.
No issue, right? Well, that's where the issue is.
It means our YamlScalar will never be given the chance to have an additional \n we need for _tryYamlEncodeFolded and _tryYamlEncodeLiteral to prevent issues.
Okay, issue found. Just apply the additional \n we need. I did this and quickly ended up having duplicate \n or even \nx where x is the number of YamlList or YamlMap are encountered before the next YamlScalar is found.
So I refactored that function to apply line-breaks correctly. After this 2 issues bubbled up:
- Dangling multiline comments messing with modified yaml!
- Dangling line breaks for
YamlScalars that aren't folded/stripped. (I expected this but not 1 above 😅)
Issue Two: Comments
Once this occurred, I backed off to see if this was a known issue that needed fixing. That's when I found dart-lang/tools#1935. In the issue, YOU outline ways we can/should handle this which ended up being what I had in mind. Thanks! (for the incredible foresight)
However, a few issues you overlooked that ended up being the trade-offs I mentioned:
- What happens to white-space after the comment(s)?
- Do we always look for comments greedily or lazily?
-
lazily- Should we just retain the white space we skipped after scanning ahead for comments and not finding any? -
greedily- Discard any white space we find ahead irrespective of whether we found comments or not.
-
I ended up looking for and skipping comments greedily. At least to have a PoC that works. Armed with this, I continued my investigation started in issue one above.
I noticed that both _replaceInBlockMap and updateInList only look ahead up-to the first comment. While refactoring code here, I realised:
-
yamlEncodeBlockintentionally leaves out the trailing line-break we needed in issue one above and applies it later but only inupdateInListand never for top-level scalars or_replaceInBlockMap. A bug which we can't solve here. Only at theyamlEncodeBlocklevel - I needed to compensate for the indent (belonging to the next node) I greedily consume. I did this using the current
YamlListorYamlMapbeing mutated.
Dangling line-breaks
At this point, the issues I had were:
- How do I prune the additional line breaks without messing up folded and literal strings.
- How do I make sure folded/literal strings don't mess up the next element's indent. I ignored this here and instead solved it in issue_two.
I implemented the function that does that and also added some tricks when adding folded/literal strings.
It would be nice if you are able to review it and make suggestions for improvements. I can make a draft PR.
@jonasfj sorry for the extremely long write up. I didn't want to create a PR without outlining the PoC itself since I make so many changes to the existing code without redefining its intention that may need your review.
Sorry for this. It seems I overwrote it as I was resolving some merge conflicts I encountered while rebasing.
Yeah, the PR was getting pretty complicated, we should probably have done fewer things. No worries, we can still fix bugs now :D
Also it's not like there weren't any bugs before this PR :see_no_evil: :rofl:
Okay, this might take a while to review.
Thinking about it, I do wonder, is it really a good idea to use folded or literal strings without chomping -?
Certainly the + modify is a but dangerous, in scenarios like:
Example 1
A: |+
foo
B: true
- Removing
Bcould change the value ofAif we're not careful.
Example 2
A: |+
foo
- Adding top-level key
B: true, could change the value ofAif we're not careful.
I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use "foo\n\n".
In practice I suspect it's rare that yaml_edit will be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.
If we have to fallback to double quoted encoding, that's not nearly as bad as throwing a runtime exception because of an internal error, which is what will happen if we modify the YAML document incorrectly (_performEdit makes sure of that).
I'm just asking the question here:
- How much do we want
yaml_editto do? - Do we think
yaml_editwill be asked to add literal or folded strings often? - Won't most literal or folded strings be trimmed for whitespace in practice?
(And if they are not, would it perhaps not be better that the tools that use
yaml_editfigure out that trimming the strings makeyaml_editproduce nicer looking output)
I guess what I'm asking is? Instead of trying to handle cases like these correctly, would it be better if we just fallback to use
"foo\n\n".
True. That's why I still fielded dart-lang/yaml_edit#91 that should precede dart-lang/yaml_edit#90. dart-lang/yaml_edit#90 is just a PoC that I believe can get better once you take a look and help me nitpick the rough edges.
Certainly the
+modify is a but dangerous, in scenarios like:
That's why I never attempt to use it at any point within the PoC. I discovered a nifty trick that I strongly highlighted and documented. I was surprised it passes most existing tests including random_test.dart since the PoC ensures no dangling comments are left behind.
In practice I suspect it's rare that
yaml_editwill be asked to deal with these things. So it might be better if we have a package that can do less, but do it correctly every time.
Yeah, that makes sense