Initialize a Dataset from plain python dicts

Open JWCook opened this issue 3 years ago • 3 comments

I'd like to be able to create a Dataset from plain python dicts. I read through just about everything in the docs and some of the code, and can't seem to find a way to do this directly.

What I'd like to be able to do is something like:

source_data = [
     {'column_1': 'value_1A',  'column_2':  'value_2A'},
     {'column_1': 'value_1B',  'column_2':  'value_2B'},
]
data = Dataset().load(*source_data)

Or maybe:

data = import_set(source_data, format='dict')

Current workarounds:

  • json.dumps the data first before loading
  • Register a custom format class
  • Separately add headers and row values:
    data = Dataset()
    data.headers = source_data[0].keys()
    data.extend([item.values() for item in source_data])
    

So I guess my questions are:

  • Is there currently a better way to do this?
  • If not, would you be interested in a PR for this?

JWCook avatar May 18 '21 00:05 JWCook

data = Dataset()
data.headers = source_data[0].keys()
data.extend([item.values() for item in source_data])

I'll note that this only works if every dict has all of the keys, which a lot of data doesn't. Serializing every blank value as None is wasteful, so many applications omit keys that have no value.

I'd like to see movement on this as well and am willing to start developing on it. My current solution is to pass over the data twice: once to collect all the keys and gather the rows, then again to build rows from the full set of headers and append them to the Dataset.
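
For reference, a minimal sketch of that two-pass approach in plain Python. The helper name `dicts_to_table` and the `fill` parameter are hypothetical and not part of tablib; the second record uses a made-up `column_3` key to show the sparse case.

```python
def dicts_to_table(records, fill=None):
    """Return (headers, rows) for a list of dicts that may have different keys.

    Hypothetical helper for illustration; not part of tablib.
    """
    # Pass 1: collect every key, preserving first-seen order.
    headers = []
    seen = set()
    for record in records:
        for key in record:
            if key not in seen:
                seen.add(key)
                headers.append(key)

    # Pass 2: build one row per record, aligned to the full header list.
    # Keys missing from a record are filled with `fill`.
    rows = [[record.get(key, fill) for key in headers] for record in records]
    return headers, rows


source_data = [
    {'column_1': 'value_1A', 'column_2': 'value_2A'},
    {'column_1': 'value_1B', 'column_3': 'value_3B'},
]
headers, rows = dicts_to_table(source_data)

# With tablib installed, the aligned rows can then be loaded with:
#   data = tablib.Dataset(*rows, headers=headers)
```

Because every row is padded to the same width before it reaches the Dataset, the dimension check should no longer be triggered by missing keys.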

It would be really nice if append could operate on a dict, adding a new column automatically, setting all previous values to blank, when encountered.
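
To make the idea concrete, here is a rough sketch of the behaviour I have in mind, written as a standalone class rather than against tablib internals. `GrowingTable` and its attributes are hypothetical names used only for illustration; this is not how Dataset.append works today.

```python
class GrowingTable:
    """Hypothetical stand-in for a Dataset whose append() accepts dicts."""

    def __init__(self, fill=''):
        self.headers = []   # column names, in first-seen order
        self.rows = []      # list of lists, each as wide as self.headers
        self.fill = fill    # value used for cells that have no data

    def append(self, record):
        # Add a column for each unseen key and backfill the existing rows.
        for key in record:
            if key not in self.headers:
                self.headers.append(key)
                for row in self.rows:
                    row.append(self.fill)
        # Build the new row against the (possibly widened) header list.
        self.rows.append([record.get(key, self.fill) for key in self.headers])


table = GrowingTable()
table.append({'column_1': 'value_1A'})
table.append({'column_1': 'value_1B', 'column_2': 'value_2B'})
```

After the second append, the first row has been padded with a blank for `column_2`, so all rows stay the same width.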

brandonrobertz avatar Oct 07 '21 01:10 brandonrobertz

An update: I've been working through this, and the code contains references to the necessity of rows aligning, etc., but I'm not sure whether this is strictly required by the validation checks or whether something deeper in the row-appending functionality requires it. Some guidance from a dev would be helpful here!

brandonrobertz avatar Nov 07 '21 22:11 brandonrobertz

Same problem/question here. My dicts have all of the keys, just like in @JWCook's example. That's why I've been able to use:

source_data = [
     {'column_1': 'value_1A',  'column_2':  'value_2A'},
     {'column_1': 'value_1B',  'column_2':  'value_2B'},
]
data = Dataset()
data.dict = source_data

I may also override the column names with:

data.headers = ["Column 1","Column 2"]

This works only if every dict in the list has all of the keys. If not, a tablib.exceptions.InvalidDimensions exception is raised.

daghemo avatar Jan 10 '22 10:01 daghemo