pyyaml
pyyaml copied to clipboard
Why does pyyaml disallow trailing spaces in block scalars?
Block style (|, <) is overruled with quoted style (") if the string contains trailing spaces. I don't see any reason for this, and the fact that I can't force a certain style (even it means losing information) has caused me a lot of grief.
I agree that trailing spaces shouldn't be a reason to disallow block scalars. Can't say what changes are needed to allow this, though. If we could integrate https://github.com/yaml/yaml-test-suite and check that parsing the output again returns the same parsing events we could make sure that we don't break anything.
@oulenz, can you provide a small code sample of where this is causing you grief?
The emitter should never lose information. It has to check that the data can be expressed in the style requested (or make the best (yes, opinionated) choice, when no style is requested).
Trailing whitespace would be collapsed in folded < style, and is not visible in literal |. I can see why this choice was made.
I'd like to see a bit of what you are doing to see if it is worth the trouble to allow trailing spaces in (explicitly requested) literal.
@ingydotnet I agree loss of information is not an option.
For literal | block style it's easy to imagine use cases IMHO.
If you have text data in a specific format that requires trailing spaces, you still might want to dump it as a block scalar for readability.
some text format: |
line 1___
line 2
line 3
Folded style is used when you have long lines. Imagine you have a long input line, with words seperated by different amount of spaces.
# _ == trailing spaces
foo: >
xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx____
xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx____
xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx____
xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx____
The emitter now tries to break this up into several lines to get below a limit of characters per line.
I think there are only two possibilities to to that, double quoted and folded block. libyaml, pyyaml and ruamel all emit this as a double quoted scalar, overruling the requested block scalar style:
foo: "xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
\ xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
\ xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx \n"
Which I think is not very readable. The only advantage is that there are no trailing spaces,.
btw, trailing spaces in folded block are not always folded, for example if an empty line follows:
foo: >
some words___
more words
@perlpunk I'm aware of how YAML works in these cases. I was asking @oulenz to show his actual use cases and code so that other solutions might be found.
imho, this is a pretty rare edge case, and it would require patching pyyaml and libyaml, and so it is likely a low priority item.
@ingydotnet , I was trying to give examples, to every reader of this issue, where I can understand the wish for block style to be preserved. I was not writing this to imply that you don't know how it works.
Thank you @perlpunk for these examples, they perfectly illustrate what I mean, and thank you both for looking into this.
My use case was that I had a large number of dictionaries that I wanted to store as yaml records, in | block style. Some of these happened to have trailing spaces and were output wrongly, as a double quoted scalar. This was frustrating because
- I had to reverse-engineer why this was happening, especially since I couldn't find any reason in the yaml spec why my strings shouldn't be representable in
|block style. This really boggles the mind: pyyaml is needlessly deviating from the yaml spec. So this is a bug, that should be fixed. - I was explicitly telling the emitter that I wanted
|block style. The way every other Python function that I know of works is that the option I set is obeyed, possibly leading to warnings and/or errors, and that the consequences (like losing information in edge cases) are for me to deal with. This is programming, I should be able to control things. For pyyaml to override my option due to style considerations really is not acceptable.
My 'solution' was to strip trailing spaces before emitting so I would get | block style across the board, so ironically, this led me to lose information I wouldn't have lost otherwise.
Why there is no traction on this. This seems to be what i was also facing, as @oulenz mentioned, spend hell lot of time identifying why its causing, while the yaml was correctly indented.
Identifying that this was the issue I was facing just killed a few hours for me.
At the very least, can pyyaml please throw a warning that says “cannot use style | on content with trailing spaces”
I wasted a WHOLE DAY because of this bug.. why is this issue still open? It's been years...
Another several hours for the bug..
Any update on this? I moved to ruamel.yaml for writing to avoid this issue but fall back to pyyaml for reading since CLoader is way (!!!) faster
Explicit is better than implicit and the behaviour is implicit. 😭
I'm another person who spent a day trying to work out why my long strings weren't being styled as block literals -.-'
Its happening the same to me, I am trying to modify and Azure pipeline automatically and this behaivor screws the format of:
script: | echo "this is a test"
It works for me:
import yaml
def yaml_multiline_string_pipe(dumper, data):
text_list = [line.rstrip() for line in data.splitlines()]
fixed_data = "\n".join(text_list)
if len(text_list) > 1:
return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|")
return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data)
yaml.add_representer(str, yaml_multiline_string_pipe)
print(yaml.dump({"multiline":"First line \r\nSecond line\r\nThird line"}, allow_unicode=True))
multiline: |-
First line
Second line
Third line
@rfsantanna nice.
I'd like to soon introduce a yaml.Config class for both loading and dumping. In this case you'd simply be able to provide a mapping of regex to preferred scalar dumping style.
It works for me:
import yaml def yaml_multiline_string_pipe(dumper, data): text_list = [line.rstrip() for line in data.splitlines()] fixed_data = "\n".join(text_list) if len(text_list) > 1: return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|") return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data) yaml.add_representer(str, yaml_multiline_string_pipe) print(yaml.dump({"multiline":"First line \r\nSecond line\r\nThird line"}, allow_unicode=True))multiline: |- First line Second line Third line
This is not a solution as it does not preserve the spaces. You cannot round-trip this!
I also just spent several hours trying to figure this out. The behavior of the current pyyaml is thus broken for my use case, because I need to preserve trailing spaces on line endings in a literal.
If I am specifying to use a block scalar, then I want to use a block scalar, trailing spaces and all.
Is there any update to this? Using markdown inside of YAML is broken when you can't have trailing spaces (trailing spaces in markdown before a newline means <br> as opposed to <p>)
Same issue here. Lost many hours. Didn't find any solution
I think we're all agreed this is an old implementation issue, not a spec issue- I'm willing to spend some time trying to make it work in the next release, but as @perlpunk mentioned earlier, I'd want to integrate the more comprehensive YAML test suite first to have some confidence we didn't break anything in the process.
I've added this issue to the PyYAML 6.1 planning project.