
Incorrect indentation with lists

Open bcoca opened this issue 5 years ago • 25 comments

When an indent is specified, it appears to be applied to the value of each list item rather than to the list itself. As you can see below, indent=4 takes effect after the leading - and does not indent the list.

>>> print(yaml.dump(data['vars']['yaml'], indent=4, allow_unicode=True, default_flow_style=False))
list_of_dict_attr:
-   attr1: value1
    attr2: value2
    attr3:
    - item1
    - item2
single_attr: value1
>>> print(yaml.dump(data['vars']['yaml'], indent=2, allow_unicode=True, default_flow_style=False))
list_of_dict_attr:
- attr1: value1
  attr2: value2
  attr3:
  - item1
  - item2
single_attr: value1
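
For anyone who wants to reproduce this without the Ansible inventory, here is a minimal sketch. The `data` dict is an assumption reconstructed from the output above (the original `data['vars']['yaml']` is not shown), and `dump` is just a helper name used here.

```python
import yaml

# Assumed data, reconstructed from the output in the report.
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}


def dump(indent):
    # Same keyword arguments as in the report.
    return yaml.dump(data, indent=indent, allow_unicode=True, default_flow_style=False)


out4 = dump(4)
out2 = dump(2)
print(out4)
print(out2)
```

In both cases the dashes stay in column 0; only the content after the dash moves with the indent value.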

Original issue: https://github.com/ansible/ansible/issues/48865

bcoca, Nov 20 '18 21:11

@bcoca can you please check whether this is in the Python emitter code, in libyaml (i.e. CDumper), or in both?

ingydotnet, Nov 20 '18 21:11

@ingydotnet Testing with Dumper=yaml.Dumper and Dumper=yaml.CDumper seems to produce the same result.
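
A minimal sketch of that comparison, for reference. The sample `data` is made up for illustration, and `yaml.CDumper` only exists when PyYAML was built with the libyaml bindings, so the sketch checks for it first.

```python
import yaml

# Illustrative data (an assumption, not the data from the report).
data = {"key": [{"a": 1, "b": [1, 2]}]}

# Pure-Python emitter.
py_out = yaml.dump(data, Dumper=yaml.Dumper, indent=4, default_flow_style=False)
print(py_out)

# libyaml-backed emitter, only if the C extension is available.
CDumper = getattr(yaml, "CDumper", None)
c_out = None
if CDumper is not None:
    c_out = yaml.dump(data, Dumper=CDumper, indent=4, default_flow_style=False)
    print(c_out)
    print("identical:", c_out == py_out)
else:
    print("CDumper is not available in this installation")
```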

sivel, Nov 20 '18 22:11

Thanks @sivel. This will need to be fixed in both. Patches welcome :)

ingydotnet, Nov 21 '18 01:11

It seems like the problem is occurring because self.indention is set to False in Emitter when expect_block_sequence is run, which makes the self.increase_indent code do nothing.
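
A quick way to look at the code in question is to print the two emitter methods from the installed PyYAML. This is only a sketch for inspection; the exact source may differ between versions, and it assumes the pure-Python sources are installed alongside the package.

```python
import inspect

from yaml.emitter import Emitter

# Source of the method that starts a block sequence.
seq_src = inspect.getsource(Emitter.expect_block_sequence)
# Source of the method that adjusts the current indentation level.
inc_src = inspect.getsource(Emitter.increase_indent)

print(seq_src)
print(inc_src)
```

In the versions I have looked at, `expect_block_sequence` computes an `indentless` flag from `mapping_context` and `indention` and hands it to `increase_indent`, which skips adding `best_indent` when that flag is set.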

znd4, Apr 02 '19 23:04

@zdog234 I tried the two variants below myself, and neither made any difference:

    def expect_block_sequence(self):                  
        indentless = (self.mapping_context and not self.indention)
        self.increase_indent(flow=True, indentless=indentless)                  
        self.state = self.expect_first_block_sequence_item

or

    def expect_block_sequence(self):                  
        indentless = (self.mapping_context and not self.indention)
        self.increase_indent(flow=True)                  
        self.state = self.expect_first_block_sequence_item

DanyC97, May 15 '19 14:05

@ingydotnet @perlpunk any chance you could shed some light on this? I may be able to open a PR.

DanyC97, May 15 '19 14:05

Well, I wouldn't call that behaviour incorrect. I guess it's a matter of taste, and I can find arguments proving that it's consistent. What I'm also missing in this issue is the expected correct behaviour.

Let's look at both examples:

--- # spaces = 4
list_of_dict_attr:
-   attr1: value1
    attr2: value2
    attr3:
    - item1
    - item2
single_attr: value1
--- # spaces = 2
list_of_dict_attr:
- attr1: value1
  attr2: value2
  attr3:
  - item1
  - item2
single_attr: value1

The top-level mapping has an indentation of zero (0 * spaces). The value for list_of_dict_attr, the sequence, also has an indentation of zero, because PyYAML always chooses zero-indented sequences when they are the value of a block mapping. That's why the dashes have no indentation in either case: a zero-indented sequence simply does not depend on the number of spaces you configured.

The value of the first sequence item, the mapping attr1: ..., has an indentation of 1 * spaces (4 or 2, respectively). The sequence under attr3 is zero-indented again relative to its parent mapping, so it stays at 1 * spaces. The items of this sequence sit on the same line as their dashes, so they get no indentation of their own.
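
The accounting above can be checked by counting leading spaces per output line. This is only a sketch: the `data` dict is an assumption reconstructed from the examples, and `widths` is a helper name introduced here.

```python
import yaml

# Assumed data, reconstructed from the examples in this thread.
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}


def widths(indent):
    """Return the number of leading spaces of every emitted line."""
    text = yaml.dump(data, indent=indent, default_flow_style=False)
    return [len(line) - len(line.lstrip(" ")) for line in text.splitlines()]


for indent in (4, 2):
    print(indent, widths(indent))
```

Every line is at either 0 or 1 * spaces, including the nested dashes under attr3.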

I assume you would expect this instead?

--- # spaces = 4
list_of_dict_attr:
  - attr1: value1
    attr2: value2
    attr3:
      - item1
      - item2
single_attr: value1

perlpunk, Jun 02 '19 12:06

I can't speak for @DanyC97 but in my opinion yes, that last snippet is what is expected when passing spaces=4.

kislyuk, Jun 24 '19 22:06

From my point of view, the most widely-accepted indentation style for sequences is the one used multiple times in the official YAML specification. For instance, in section 2.1, example 2.3 looks like this:

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

The question is whether tools like pyyaml should render sequences in such a way for indentation of size 4 or for indentation of size 2.

I would argue that it seems incorrect to render sequences in such a way for indentation of size 4, because other items would visually appear to be indented more:

mapping:
    one: 1
    two: 2
list:
  - 1
  - 2

Therefore, I think that it is more appropriate to render sequences in such a way for indentation of size 2:

mapping:
  one: 1
  two: 2
list:
  - 1
  - 2

That being said, someone may prefer to not indent sequence items to a level that is visually similar to the indentation level of the other items. That is a fair requirement, but in order to fully support it, there would have to be a separate configuration option for indentation size of sequences.

pbasista, Jun 28 '19 15:06

I agree with @pbasista about what the output should look like; that is also the behavior yamllint expects. A separate configuration option might be a solution that works both for people who want the indentation and for those who don't.

I'm currently dealing with YAML files generated by pyyaml that yamllint rejects because of the list indentation.

PeterDaveHello, Jul 01 '19 13:07

"there would have to be a separate configuration option for indentation size of sequences."

From my point of view, that would be best, because IMHO block sequences are currently simply not indented at all (at least when they are the value of a block mapping), independent of the indent option.

perlpunk, Jul 01 '19 13:07

Is there any progress on this? For now, the workaround here is working for me: https://stackoverflow.com/questions/25108581/python-yaml-dump-bad-indentation

cortex3, Oct 25 '19 10:10

The spec itself does not seem to be consistent in its examples. The Preview section shows sequences indented relative to the key.

Example 2.3. Mapping Scalars to Sequences (ball clubs in each league)

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

However, the example in the Failsafe Schema section matches what pyyaml does:

10.1. Failsafe Schema

The failsafe schema is guaranteed to work with any YAML document. It is therefore the recommended schema for generic YAML tools. A YAML processor should therefore support this schema, at least as an option.

...

10.1.1.2. Generic Sequence

URI: tag:yaml.org,2002:seq

Kind: Sequence.

Definition: Represents a collection indexed by sequential integers starting with zero. Example bindings to native types include Perl’s array, Python’s list or tuple, and Java’s array or Vector.

Example 10.2. !!seq Examples

Block style: !!seq
- Clark Evans
- Ingy döt Net
- Oren Ben-Kiki

Flow style: !!seq [ Clark Evans, Ingy döt Net, Oren Ben-Kiki ]

Personally, I prefer the indented format, and it would be nice if pyyaml supported it as an option. But the code isn't doing anything wrong by omitting the indents, even if yamllint disagrees.

jagibson, Feb 27 '20 06:02

Just ran into this myself in some example config generation I am doing, and it makes some of my exports rather weird.

The indented list structure does seem more common, and it would be nice if pyyaml supported it.

Edit: I've now also temporarily solved it with https://stackoverflow.com/questions/25108581/python-yaml-dump-bad-indentation.

unexceptable, Aug 07 '20 04:08

Also running into this via ansible. It would be nice if the indentation were consistent.

kaysond, Oct 19 '20 21:10

The workaround mentioned above:

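import yaml

data = {"Gateways": [14, 4, 18]}  # example data matching the output below

class Dumper(yaml.Dumper):
    # Always pass indentless=False so block sequences get indented.
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)

print(yaml.dump(data, Dumper=Dumper))

Output:

Gateways:
  - 14
  - 4
  - 18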

andreif, Jan 23 '21 09:01

I just use prettier as a pre-commit hook and it takes care of making yaml look good.

yajo, Jan 26 '21 08:01

The workaround mentioned above:

class Dumper(yaml.Dumper):
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)

print(yaml.dump(data, Dumper=Dumper))

Unfortunately, it does not work with CDumper. And when working with a lot of YAML, or with large YAML documents, I'd rather have no indentation than a "good-looking" one produced by a much slower generator.
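
A small sketch of why the override has no effect there, assuming the libyaml bindings are installed. `IndentedDumper` and `IndentedCDumper` are names made up for this example, and the `called` flag only exists to show whether the hook is ever invoked.

```python
import yaml

data = {"Gateways": [14, 4, 18]}


class IndentedDumper(yaml.Dumper):
    # Pure-Python emitter: this hook is called for every nesting level.
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)


py_out = yaml.dump(data, Dumper=IndentedDumper, default_flow_style=False)
print(py_out)

c_out = None
if hasattr(yaml, "CDumper"):

    class IndentedCDumper(yaml.CDumper):
        called = False

        # The C emitter does its indentation inside libyaml, so this
        # Python-level hook is not expected to run at all.
        def increase_indent(self, *args, **kwargs):
            IndentedCDumper.called = True

    c_out = yaml.dump(data, Dumper=IndentedCDumper, default_flow_style=False)
    print(c_out)
    print("hook called:", IndentedCDumper.called)
```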

Jean-Daniel, Feb 25 '21 16:02

"I just use prettier as a pre-commit hook and it takes care of making yaml look good."

I tried using prettier, but it made changes that both I and YamlLint disagreed with.

pillarsdotnet, Feb 25 '21 23:02

Another year has arrived (2022). Is there an easy way to resolve this issue?

xkungfu, Jan 13 '22 03:01

"Another year has arrived (2022). Is there an easy way to resolve this issue?"

https://github.com/yaml/pyyaml/issues/234#issuecomment-765894586

This worked for me. It is also quite easy to implement.

Acidherr, May 20 '22 06:05

This still does not work. Will it be fixed?

My code:

def test_demo(self):
    data = {
        "name": "John",
        "age": 30,
        "city": "New York",
        "haha": ["aaaa", "bbbb"]
    }

    # Set the indent and sort_keys parameters
    output = yaml.dump(data, Dumper=Dumper, sort_keys=False, indent=2)
    print(output)

Result:

age: 30
city: New York
haha:
- aaaa
- bbbb

What I expected:

age: 30
city: New York
haha:
  - aaaa
  - bbbb
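
For this particular case, here is a sketch of the subclass workaround from earlier in the thread applied to the same data. It assumes that `Dumper` in the snippet above was the stock `yaml.Dumper` rather than the subclass; `IndentedDumper` is a name chosen here for illustration.

```python
import yaml


class IndentedDumper(yaml.Dumper):
    # Never let a block sequence be "indentless".
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)


data = {
    "name": "John",
    "age": 30,
    "city": "New York",
    "haha": ["aaaa", "bbbb"],
}

output = yaml.dump(data, Dumper=IndentedDumper, sort_keys=False, indent=2)
print(output)
```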

terwer, Apr 04 '23 07:04

Found a temporary solution:

https://stackoverflow.com/a/39681672/4037224

terwer, Apr 04 '23 07:04

@Acidherr

#234 (comment)

"This worked for me. It is quite easy to implement also."

What in the words indent=4 is so hard to understand? No it doesn't work with indent=4, in 2023.
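
To make the indent=4 case concrete, here is a sketch that applies the subclass workaround with indent=4. The `data` dict is an assumption reconstructed from the original report, and `IndentedDumper` is a name used only in this example.

```python
import yaml


class IndentedDumper(yaml.Dumper):
    # Workaround from earlier in the thread: force indentless=False.
    def increase_indent(self, flow=False, *args, **kwargs):
        return super().increase_indent(flow=flow, indentless=False)


# Assumed data, reconstructed from the original report.
data = {
    "list_of_dict_attr": [
        {"attr1": "value1", "attr2": "value2", "attr3": ["item1", "item2"]},
    ],
    "single_attr": "value1",
}

out = yaml.dump(data, Dumper=IndentedDumper, indent=4, default_flow_style=False)
print(out)
```

The dashes move to column 4 and the mapping inside each item to column 8, which is not the 2-space dash offset shown in the "expected" snippet earlier in the thread.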

pkit, Jul 17 '23 01:07

@pkit I know you're frustrated about the slow progress on this issue - many of us are - but please do not take your frustration out on your fellow commenters and contributors. By doing so you reduce trust and diminish the quality of all open source projects related to this one.

kislyuk, Jul 17 '23 16:07