Why does pyyaml disallow trailing spaces in block scalars?

Open oulenz opened this issue 7 years ago • 26 comments

We do not permit trailing spaces for block scalars.

Block style (|, >) is overruled by quoted style (") if the string contains trailing spaces. I don't see any reason for this, and the fact that I can't force a certain style (even if it means losing information) has caused me a lot of grief.
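A minimal sketch of the behaviour described here, assuming PyYAML's pure-Python emitter. `Literal`, `LiteralDumper` and `represent_literal` are hypothetical names used only for illustration; they are not part of PyYAML.

```python
import yaml


class Literal(str):
    """Hypothetical marker type: a string we would like dumped in | style."""


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass, so the global dumpers stay untouched."""


def represent_literal(dumper, data):
    # Request literal block style; the emitter may still overrule the request.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="|")


LiteralDumper.add_representer(Literal, represent_literal)

clean = yaml.dump({"text": Literal("line 1\nline 2\n")}, Dumper=LiteralDumper)
trailing = yaml.dump({"text": Literal("line 1   \nline 2\n")}, Dumper=LiteralDumper)

print(clean)     # the requested block style is kept
print(trailing)  # the emitter falls back to a double-quoted scalar
```

The only difference between the two inputs is the three spaces after `line 1`; no information is lost in the second case, but the requested style is ignored.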

oulenz avatar Jan 16 '18 10:01 oulenz

I agree that trailing spaces shouldn't be a reason to disallow block scalars. Can't say what changes are needed to allow this, though. If we could integrate https://github.com/yaml/yaml-test-suite and check that parsing the output again returns the same parsing events we could make sure that we don't break anything.
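As a rough illustration of that kind of check (not the yaml-test-suite integration itself), the sketch below re-emits a document with `yaml.emit` and compares the parsing events before and after. `event_summary` is a hypothetical helper; it deliberately ignores scalar style and source marks and compares only event types and scalar values.

```python
import yaml


def event_summary(text):
    # Reduce each parsing event to (event type name, scalar value or None).
    return [(type(event).__name__, getattr(event, "value", None))
            for event in yaml.parse(text)]


source = "text: |\n  line 1   \n  line 2\n"

# Feed the parser's events straight back into the emitter.
reemitted = yaml.emit(yaml.parse(source))

same_content = event_summary(source) == event_summary(reemitted)
print(reemitted)
print(same_content)
```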

perlpunk avatar Apr 01 '18 18:04 perlpunk

@oulenz, can you provide a small code sample of where this is causing you grief?

The emitter should never lose information. It has to check that the data can be expressed in the style requested (or make the best (yes, opinionated) choice, when no style is requested).

Trailing whitespace would be collapsed in folded > style, and is not visible in literal |. I can see why this choice was made.

I'd like to see a bit of what you are doing to see if it is worth the trouble to allow trailing spaces in (explicitly requested) literal.

ingydotnet avatar Apr 01 '18 21:04 ingydotnet

@ingydotnet I agree loss of information is not an option.

For literal | block style it's easy to imagine use cases IMHO. If you have text data in a specific format that requires trailing spaces, you still might want to dump it as a block scalar for readability.

some text format: |
    line 1___
    line 2
    line 3

Folded style is used when you have long lines. Imagine you have a long input line, with words separated by different amounts of spaces.

# _ == trailing spaces
foo: >
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____

The emitter now tries to break this up into several lines to get below a limit of characters per line.

I think there are only two possibilities to do that: double quoted and folded block. libyaml, pyyaml and ruamel all emit this as a double quoted scalar, overruling the requested block scalar style:

foo: "xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx     xxxxxxxxxx    xxxxxxxxxx
  \   xxxxxxxxxx    xxxxxxxxxx     xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx
  \    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    \n"

Which I think is not very readable. The only advantage is that there are no trailing spaces.

perlpunk avatar Apr 02 '18 13:04 perlpunk

btw, trailing spaces in folded block are not always folded, for example if an empty line follows:

foo: >
  some words___

  more words
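A small sketch of how PyYAML's pure-Python loader reads such folded scalars, with and without the empty line. The variable names are illustrative only, and the trailing spaces are written out explicitly instead of as `_`.

```python
import yaml

# Three trailing spaces after "some words", followed by an empty line.
with_empty_line = "foo: >\n  some words   \n\n  more words\n"

# The same content without the empty line.
without_empty_line = "foo: >\n  some words   \n  more words\n"

kept = yaml.safe_load(with_empty_line)["foo"]
folded = yaml.safe_load(without_empty_line)["foo"]

print(repr(kept))    # the trailing spaces sit directly before a line break
print(repr(folded))  # the line break became a space after the trailing spaces
```

In both cases the spaces themselves survive loading; only the line break is treated differently.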

perlpunk avatar Apr 02 '18 13:04 perlpunk

@perlpunk I'm aware of how YAML works in these cases. I was asking @oulenz to show his actual use cases and code so that other solutions might be found.

imho, this is a pretty rare edge case, and it would require patching pyyaml and libyaml, and so it is likely a low priority item.

ingydotnet avatar Apr 02 '18 17:04 ingydotnet

@ingydotnet , I was trying to give examples, to every reader of this issue, where I can understand the wish for block style to be preserved. I was not writing this to imply that you don't know how it works.

perlpunk avatar Apr 02 '18 17:04 perlpunk

Thank you @perlpunk for these examples, they perfectly illustrate what I mean, and thank you both for looking into this.

My use case was that I had a large number of dictionaries that I wanted to store as yaml records, in | block style. Some of these happened to have trailing spaces and were output wrongly, as a double quoted scalar. This was frustrating because

  1. I had to reverse-engineer why this was happening, especially since I couldn't find any reason in the yaml spec why my strings shouldn't be representable in | block style. This really boggles the mind: pyyaml is needlessly deviating from the yaml spec. So this is a bug, that should be fixed.
  2. I was explicitly telling the emitter that I wanted | block style. The way every other Python function that I know of works is that the option I set is obeyed, possibly leading to warnings and/or errors, and that the consequences (like losing information in edge cases) are for me to deal with. This is programming, I should be able to control things. For pyyaml to override my option due to style considerations really is not acceptable.

My 'solution' was to strip trailing spaces before emitting so I would get | block style across the board, so ironically, this led me to lose information I wouldn't have lost otherwise.
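A minimal sketch showing that the restriction is on the emitting side only: PyYAML's loader reads a literal block scalar with trailing spaces without complaint, but dumping the loaded data does not produce a block scalar again.

```python
import yaml

# A literal block scalar whose first line ends in three spaces.
document = "text: |\n  line 1   \n  line 2\n"

loaded = yaml.safe_load(document)
print(repr(loaded["text"]))  # the trailing spaces are part of the value

# Dumping the same data back out yields a double-quoted scalar.
dumped = yaml.safe_dump(loaded)
print(dumped)
```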

oulenz avatar Apr 03 '18 08:04 oulenz

Why is there no traction on this? This seems to be what I was also facing. As @oulenz mentioned, I spent a hell of a lot of time identifying the cause, even though the YAML was correctly indented.

ysaakpr avatar Oct 29 '19 08:10 ysaakpr

Identifying that this was the issue I was facing just killed a few hours for me.

At the very least, can pyyaml please throw a warning that says “cannot use style | on content with trailing spaces”
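PyYAML does not emit such a warning itself, but something similar can be approximated in user code. The sketch below is an assumption-laden illustration: `WarnLiteral`, `WarnDumper`, `has_trailing_spaces` and `represent_with_warning` are hypothetical names, and the check covers only trailing spaces, not the other conditions under which the emitter refuses block style.

```python
import warnings

import yaml


class WarnLiteral(str):
    """Hypothetical marker type for strings that should use | style."""


class WarnDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass carrying the custom representer."""


def has_trailing_spaces(text):
    # True if any line, including the last one, ends in a space.
    return any(line.endswith(" ") for line in text.split("\n"))


def represent_with_warning(dumper, data):
    text = str(data)
    if has_trailing_spaces(text):
        warnings.warn("cannot use style | on content with trailing spaces")
    # Still request | style; the emitter decides whether to honour it.
    return dumper.represent_scalar("tag:yaml.org,2002:str", text, style="|")


WarnDumper.add_representer(WarnLiteral, represent_with_warning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    output = yaml.dump({"text": WarnLiteral("a  \nb\n")}, Dumper=WarnDumper)

print(output)
print([str(w.message) for w in caught])
```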

andrewasheridan avatar Feb 20 '20 11:02 andrewasheridan

I wasted a WHOLE DAY because of this bug.. why is this issue still open? It's been years...

iameugenejo avatar Aug 29 '20 04:08 iameugenejo

Another several hours for the bug..

libreliu avatar Sep 03 '20 18:09 libreliu

Any update on this? I moved to ruamel.yaml for writing to avoid this issue but fall back to pyyaml for reading since CLoader is way (!!!) faster
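For the reading half of that setup, a common pattern is to use the libyaml-backed loader when it is available and fall back to the pure-Python one otherwise. A minimal sketch; `FastSafeLoader` is just a local alias chosen for this example.

```python
import yaml

try:
    # Only available when PyYAML was built with libyaml support.
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader

document = "text: |\n  line 1   \n  line 2\n"
data = yaml.load(document, Loader=FastSafeLoader)
print(repr(data["text"]))
```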

Jwink3101 avatar Oct 02 '20 22:10 Jwink3101

Explicit is better than implicit and the behaviour is implicit. 😭

cosven avatar Nov 23 '20 07:11 cosven

I'm another person who spent a day trying to work out why my long strings weren't being styled as block literals -.-'

johnhaire89 avatar Jun 18 '21 05:06 johnhaire89

The same is happening to me. I am trying to modify an Azure pipeline automatically and this behavior breaks the formatting of:

script: |
  echo "this is a test"

maragunde93 avatar Sep 24 '21 17:09 maragunde93

It works for me:

import yaml

def yaml_multiline_string_pipe(dumper, data):
    text_list = [line.rstrip() for line in data.splitlines()]
    fixed_data = "\n".join(text_list)
    if len(text_list) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|")
    return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data)

yaml.add_representer(str, yaml_multiline_string_pipe)
print(yaml.dump({"multiline":"First line         \r\nSecond line\r\nThird line"}, allow_unicode=True))

Output:

multiline: |-
  First line
  Second line
  Third line

rfsantanna avatar Jan 21 '22 02:01 rfsantanna

@rfsantanna nice.

I'd like to soon introduce a yaml.Config class for both loading and dumping. In this case you'd simply be able to provide a mapping of regex to preferred scalar dumping style.
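No `yaml.Config` class exists in PyYAML at the time of this comment, so the following is only a guess at what a regex-to-style mapping could look like when built from today's representer API. `STYLE_RULES`, `RuleDumper` and `represent_str_by_rule` are hypothetical names, not a proposed interface.

```python
import re

import yaml

# Hypothetical rule table: first matching pattern decides the scalar style.
STYLE_RULES = [
    (re.compile(r"\n"), "|"),  # multi-line strings: literal block style
]


class RuleDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass carrying the rule-based representer."""


def represent_str_by_rule(dumper, data):
    style = None  # None lets the emitter pick its default style
    for pattern, rule_style in STYLE_RULES:
        if pattern.search(data):
            style = rule_style
            break
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


RuleDumper.add_representer(str, represent_str_by_rule)

output = yaml.dump(
    {"short": "one line", "long": "line 1\nline 2\n"},
    Dumper=RuleDumper,
)
print(output)
```

As with any style request, the emitter can still overrule the chosen style, for example when the string contains trailing spaces.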

ingydotnet avatar Jan 25 '22 11:01 ingydotnet

It works for me:

import yaml

def yaml_multiline_string_pipe(dumper, data):
    text_list = [line.rstrip() for line in data.splitlines()]
    fixed_data = "\n".join(text_list)
    if len(text_list) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|")
    return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data)

yaml.add_representer(str, yaml_multiline_string_pipe)
print(yaml.dump({"multiline":"First line         \r\nSecond line\r\nThird line"}, allow_unicode=True))

Output:

multiline: |-
  First line
  Second line
  Third line

This is not a solution as it does not preserve the spaces. You cannot round-trip this!
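A short sketch of the round-trip difference, using only `yaml.safe_dump` and `yaml.safe_load`. The stripping step mirrors the representer quoted above; the sample string is simplified to plain `\n` line endings for this illustration.

```python
import yaml

original = "First line   \nSecond line\nThird line"

# What the quoted representer does to the data before dumping it.
stripped = "\n".join(line.rstrip() for line in original.splitlines())

round_trip_original = yaml.safe_load(yaml.safe_dump({"multiline": original}))
round_trip_stripped = yaml.safe_load(yaml.safe_dump({"multiline": stripped}))

print(round_trip_original["multiline"] == original)  # data preserved
print(round_trip_stripped["multiline"] == original)  # data changed
```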

Jwink3101 avatar Feb 17 '22 18:02 Jwink3101

I also just spent several hours trying to figure this out. The behavior of the current pyyaml is thus broken for my use case, because I need to preserve trailing spaces on line endings in a literal.

If I am specifying to use a block scalar, then I want to use a block scalar, trailing spaces and all.

nsheff avatar Aug 09 '22 00:08 nsheff

Is there any update to this? Using markdown inside of YAML is broken when you can't have trailing spaces (trailing spaces in markdown before a newline means <br> as opposed to <p>)

Jwink3101 avatar May 09 '23 16:05 Jwink3101

Same issue here. Lost many hours. Didn't find any solution

jonesbusy avatar Jul 18 '23 08:07 jonesbusy

I think we're all agreed this is an old implementation issue, not a spec issue. I'm willing to spend some time trying to make it work in the next release, but as @perlpunk mentioned earlier, I'd want to integrate the more comprehensive YAML test suite first to have some confidence we didn't break anything in the process.

I've added this issue to the PyYAML 6.1 planning project.

nitzmahone avatar Aug 23 '23 16:08 nitzmahone