tablib library re-orders columns when exporting content in YAML

Hello,

I've been using tablib indirectly through records to produce some Excel reports from a database, and I've noticed that although I add my content in a specific order, the exported columns do not come out in the order in which I added them.

The tablib documentation shows the same thing, assuming its examples are not merely illustrative: when exporting to JSON, YAML, and CSV, the output "columns" are ordered differently for each format:

>>> print data.json
[
  {
    "last_name": "Adams",
    "age": 90,
    "first_name": "John"
  },
  {
    "last_name": "Ford",
    "age": 83,
    "first_name": "Henry"
  }
]

>>> print data.yaml
- {age: 90, first_name: John, last_name: Adams}
- {age: 83, first_name: Henry, last_name: Ford}

>>> print data.csv
first_name,last_name,age
John,Adams,90
Henry,Ford,83

I'd appreciate it if this could be fixed so that the output always has the same structure as the input.

Thanks

Apr 22 '17 17:04 PauloPhagula

I'm also facing this issue. I use a list to explicitly set the headers, then import a list of dicts. After exporting to xlsx, the column ordering is broken and the header names are lost.

May 31 '17 12:05 Cediddi

this should not be the case!

Jun 04 '17 15:06 kennethreitz

I guess that when I import dicts, the headers get overridden and thus randomized. As a workaround I changed the dicts to flat tuples. This way there are no keys in the data and it's perfectly ordered. I can provide a minimal example for both.
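
The workaround described above could look roughly like the sketch below. It is only an illustration: the sample `records` and `headers` are made up, and `dicts_to_rows` is a hypothetical helper, not part of tablib.

```python
# Hypothetical helper (not part of tablib): flatten dicts into tuples
# whose positions follow an explicit header list.
def dicts_to_rows(records, headers):
    return [tuple(record[h] for h in headers) for record in records]


headers = ["first_name", "last_name", "age"]
records = [
    {"age": 90, "last_name": "Adams", "first_name": "John"},
    {"age": 83, "last_name": "Ford", "first_name": "Henry"},
]

# Each tuple now follows the header order, whatever the dict key order was.
rows = dicts_to_rows(records, headers)
print(rows)
```

The resulting rows could then be passed to tablib positionally, for example with `tablib.Dataset(*rows, headers=headers)`, so that the column order is fixed by position rather than by dict keys.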

Jun 04 '17 16:06 Cediddi

great!

Jun 05 '17 16:06 kennethreitz

I couldn't reproduce using these examples from the README:

Python 3.8.1 (v3.8.1:1b293b6006, Dec 18 2019, 14:08:53)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tablib
>>> tablib.__version__
'1.0.0'
>>> data = tablib.Dataset()
>>> names = ['Kenneth Reitz', 'Bessie Monke']
>>>
>>> for name in names:
...     fname, lname = name.split()
...     data.append([fname, lname])
...
>>> data.dict
[['Kenneth', 'Reitz'], ['Bessie', 'Monke']]
>>> data.headers = ['First Name', 'Last Name']
>>> data.dict
[OrderedDict([('First Name', 'Kenneth'), ('Last Name', 'Reitz')]), OrderedDict([('First Name', 'Bessie'), ('Last Name', 'Monke')])]
>>> data.append_col([22, 20], header='Age')
>>> data.dict
[OrderedDict([('First Name', 'Kenneth'), ('Last Name', 'Reitz'), ('Age', 22)]), OrderedDict([('First Name', 'Bessie'), ('Last Name', 'Monke'), ('Age', 20)])]
>>> data.export('csv')
'First Name,Last Name,Age\r\nKenneth,Reitz,22\r\nBessie,Monke,20\r\n'
>>> data.export('json')
'[{"First Name": "Kenneth", "Last Name": "Reitz", "Age": 22}, {"First Name": "Bessie", "Last Name": "Monke", "Age": 20}]'
>>> data.export('yaml')
'- {Age: 22, First Name: Kenneth, Last Name: Reitz}\n- {Age: 20, First Name: Bessie, Last Name: Monke}\n'

It might have been a Python 2-only problem.
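
One possible explanation, offered as an assumption rather than something checked against older tablib releases: plain dicts in Python 2 did not preserve insertion order, whereas dicts in Python 3.7+ do, and `json.dumps` writes keys in dict order unless told to sort them. A minimal sketch with made-up sample data:

```python
import json

row = {"first_name": "John", "last_name": "Adams", "age": 90}

# On Python 3.7+, dicts keep insertion order and json.dumps follows it.
as_inserted = json.dumps(row)

# sort_keys=True forces alphabetical key order instead.
as_sorted = json.dumps(row, sort_keys=True)

print(as_inserted)
print(as_sorted)
```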

If it's still a problem with Python 3, please include a reproducible snippet of code, along with the Python and tablib versions.

Feb 12 '20 13:02 hugovk

Please have a look at the image below, which shows a snippet from the example in the docs.

Note how the order for JSON and YAML differs.

Also note that this is while using Python 3.

[Image: Screenshot from 2020-02-12 17-34-01]

Feb 12 '20 15:02 PauloPhagula

Thanks, now I see: the age column is in a different place with YAML (the keys are in alphabetical order).

Feb 12 '20 16:02 hugovk

This is specific to YAML. Historically, PyYAML always sorted keys alphabetically. Only recently (https://github.com/yaml/pyyaml/pull/254, committed March 2019) did PyYAML offer the ability to opt out of this key sorting. So we should now be able to add the sort_keys=False parameter to our call to yaml.safe_dump, on the condition that we also require PyYAML >= 5.1.
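
A minimal sketch of the difference, calling PyYAML directly rather than going through tablib. The sample rows are made up, and PyYAML >= 5.1 is assumed to be installed:

```python
import yaml  # PyYAML; the sort_keys argument needs version 5.1 or later

rows = [
    {"first_name": "John", "last_name": "Adams", "age": 90},
    {"first_name": "Henry", "last_name": "Ford", "age": 83},
]

# Default behaviour: mapping keys are sorted alphabetically.
sorted_yaml = yaml.safe_dump(rows)

# Opting out keeps the keys in dict (insertion) order.
ordered_yaml = yaml.safe_dump(rows, sort_keys=False)

print(sorted_yaml)
print(ordered_yaml)
```

With `sort_keys=False`, the YAML keys should come out in the same order as the CSV header row, which is what the original report asks for.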

Feb 12 '20 18:02 claudep
