
Issue with default_style and lines that include \n

Open · DeeeeLAN opened this issue 3 years ago · 1 comment

Hi, I am having trouble with yaml.dump formatting strings that contain newline characters poorly. Here is an example that hopefully illustrates what I am looking for:

import yaml

test = {}
test['test'] = 'This is a really long line that will get split into multiple lines when using folded style. With literal style, it will stay a single line'
t = yaml.dump(test, default_style='|')
print(t)
t = yaml.dump(test, default_style='>')
print(t)

test['test'] = 'This is a really long line that will get split \ninto multiple lines when using folded style. With literal style, it will stay a single line'
t = yaml.dump(test, default_style='|')
print(t)
t = yaml.dump(test, default_style='>')
print(t)

This produces:

"test": |-
  This is a really long line that will get split into multiple lines when using folded style. With literal style, it will stay a single line

"test": >-
  This is a really long line that will get split into multiple lines when using folded
  style. With literal style, it will stay a single line

"test": "This is a really long line that will get split \ninto multiple lines when\
  \ using folded style. With literal style, it will stay a single line"

"test": "This is a really long line that will get split \ninto multiple lines when\
  \ using folded style. With literal style, it will stay a single line"

What I want for the second string (the one containing \n), dumped with default_style='|', is:

"test": |-
  This is a really long line that will get split 
  into multiple lines when using folded style. With literal style, it will stay a single line
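The requested output is itself valid YAML: a literal block scalar keeps the trailing space after "split" as well as the line break, so loading it back should give the original string. A quick check, assuming PyYAML is installed (the shortened strings are placeholders for the long line above):

```python
import yaml

# The literal block scalar requested above, shortened. Note the trailing
# space after "split" on the first content line.
desired = 'test: |-\n  line that will get split \n  into multiple lines\n'

loaded = yaml.safe_load(desired)
# Literal style keeps trailing spaces and line breaks; "-" strips the final newline.
print(repr(loaded['test']))
```

This suggests the limitation lies in how PyYAML chooses a style when dumping, not in what YAML can express.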

Sorry if I mixed up the terminology in my example!

DeeeeLAN, Jun 03 '21 04:06

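If I double-escape the newline (writing \\n instead of \n), the requested style is at least preserved, but the line break is lost entirely, because the string now contains a literal backslash followed by "n" rather than a newline: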

  test['test'] = 'This is a really long line that will get split \\ninto multiple lines when using folded style. With literal style, it will stay a single line'
  t = yaml.dump(test, default_style='|')
  print(t)
  t = yaml.dump(test, default_style='>')
  print(t)

Result:

"test": |-
  This is a really long line that will get split \ninto multiple lines when using folded style. With literal style, it will stay a single line

"test": >-
  This is a really long line that will get split \ninto multiple lines when using
  folded style. With literal style, it will stay a single line
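This is standard Python escaping rather than PyYAML behaviour: in a normal (non-raw) string literal, '\\n' is not a newline at all. The dumped string therefore has no line break to preserve, and since it no longer contains a space followed by a line break, the requested block style is accepted. A minimal illustration in plain Python:

```python
# In a normal (non-raw) Python string literal, '\n' is one newline character,
# while '\\n' is two characters: a backslash followed by the letter n.
single = 'split \ninto'
double = 'split \\ninto'

print(len('\n'), len('\\n'))            # 1 2
print('\n' in single, '\n' in double)   # True False
print(double == r'split \ninto')        # True: same as the raw-string form
```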

DeeeeLAN, Jun 03 '21 20:06

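You need to strip the space before the newline. I believe the responsible code is in the Emitter's scalar analysis: a space immediately followed by a line break rules out the block styles (| and >), so the emitter falls back to a double-quoted scalar.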
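A minimal way to confirm that the space before the newline, and not the newline itself, is what triggers the fallback to double quotes. This assumes PyYAML is installed; the shortened strings are placeholders for the ones in the original report:

```python
import yaml

with_space = {'test': 'line that will get split \ninto multiple lines'}
without_space = {'test': 'line that will get split\ninto multiple lines'}

# A space immediately before "\n": the requested literal style is not used.
out_with_space = yaml.dump(with_space, default_style='|')
# No space before "\n": the literal block style is honoured for the value.
out_without_space = yaml.dump(without_space, default_style='|')

print(out_with_space)
print(out_without_space)
```

Both documents load back to the same data they were dumped from; only the chosen scalar style differs.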

Stripping the space can be done by adding a custom representer for str:

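import yaml

def yaml_multiline_string_presenter(dumper, data):
    # Only rewrite strings that span more than one line.
    if len(data.splitlines()) > 1:
        # Strip whitespace around the whole string and trailing spaces on every
        # line, so no space is left immediately before a newline.
        data = '\n'.join([line.rstrip() for line in data.strip().splitlines()])
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

# Registers the presenter on the default Dumper used by yaml.dump().
yaml.add_representer(str, yaml_multiline_string_presenter)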


test = {}
test['test'] = 'This is a really long line that will get split into multiple lines when using folded style. With literal style, it will stay a single line'
t = yaml.dump(test, default_style='|')
print(t)
t = yaml.dump(test, default_style='>')
print(t)

test['test'] = 'This is a really long line that will get split \ninto multiple lines when using folded style. With literal style, it will stay a single line'
t = yaml.dump(test, default_style='|')
print(t)
t = yaml.dump(test, default_style='>')
print(t)

Results:

"test": |-
  This is a really long line that will get split into multiple lines when using folded style. With literal style, it will stay a single line

"test": >-
  This is a really long line that will get split into multiple lines when using folded
  style. With literal style, it will stay a single line

"test": |-
  This is a really long line that will get split
  into multiple lines when using folded style. With literal style, it will stay a single line

"test": |-
  This is a really long line that will get split
  into multiple lines when using folded style. With literal style, it will stay a single line
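A narrower variant is sketched below, under the assumption that you only want to drop spaces sitting directly before a line break. The names LiteralDumper and literal_str_representer are hypothetical. Unlike the presenter above, it does not strip the whole string (a trailing newline is kept, giving | instead of |-), and it registers on a Dumper subclass so the global yaml.dump() behaviour is left unchanged:

```python
import re
import yaml


class LiteralDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass; representers added here stay local to it."""


def literal_str_representer(dumper, data):
    # Hypothetical helper: request literal style only for multi-line strings.
    if '\n' in data:
        # Remove spaces right before a line break; the emitter will not use a
        # block style when a space is followed by a line break.
        cleaned = re.sub(r' +\n', '\n', data)
        return dumper.represent_scalar('tag:yaml.org,2002:str', cleaned, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


LiteralDumper.add_representer(str, literal_str_representer)

doc = {'test': 'line that will get split \ninto multiple lines', 'short': 'one line'}
out = yaml.dump(doc, Dumper=LiteralDumper)
print(out)
```

This still changes the data (the removed spaces do not come back on load), and other cases, such as a trailing space at the very end of the string or a tab character, will still make the emitter fall back to double quotes.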

jasontrost-databricks, Oct 24 '22 16:10