Allow label based indexing in Rows (incl. test updates)

Open ms8r opened this issue 9 years ago • 3 comments

Enables access to the items in a Dataset Row by index or by column header. For example, data[0]['first_name'] == data[0][0] if 'first_name' is the label of the first column as specified in the Dataset's headers. (Ref. issues #22, #158, #265.)

Implemented by adding a Row attribute _dset that stores a reference to the Dataset that "owns" the Row, giving each Row access to the parent Dataset's headers. Constructors, insert methods, and item getters/setters have been updated accordingly. In addition, Dataset has a new attribute _lblidx that indicates whether label-based indexing is possible (i.e. a header with unique labels exists). _lblidx is maintained via the updated headers property.
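The mechanism described above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the PR's actual code: the attribute names _dset and _lblidx come from the description, while the toy Row and Dataset classes, the internal _row list, and the choice of KeyError for unusable labels are assumptions made for the example.

```python
class Row:
    """Toy stand-in for a Dataset row (hypothetical, not tablib's Row)."""

    def __init__(self, values, dset=None):
        self._row = list(values)
        self._dset = dset  # back-reference to the Dataset that owns this row

    def __getitem__(self, key):
        if isinstance(key, str):
            # Labels only work if the owning Dataset has unique headers.
            if self._dset is None or not self._dset._lblidx:
                raise KeyError(key)
            try:
                key = self._dset.headers.index(key)
            except ValueError:
                raise KeyError(key) from None
        return self._row[key]


class Dataset:
    """Toy stand-in for a Dataset (hypothetical, not tablib's Dataset)."""

    def __init__(self, *rows, headers=None):
        self._data = [Row(r, dset=self) for r in rows]
        self.headers = headers  # goes through the property setter below

    @property
    def headers(self):
        return self._headers

    @headers.setter
    def headers(self, value):
        self._headers = list(value) if value else None
        # Label-based indexing needs headers that exist and are unique.
        self._lblidx = bool(self._headers) and (
            len(set(self._headers)) == len(self._headers)
        )

    def __getitem__(self, index):
        return self._data[index]
```

With this sketch, data[0]['first_name'] and data[0][0] return the same item as long as the headers are unique; assigning headers with duplicate labels switches label lookup off again.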

To allow label-based access within a Row, the Dataset's __getitem__ now returns a Row rather than a tuple, with the Row behaving like a list externally. This could cause backwards-compatibility issues if client code relies on Dataset items being returned as plain tuples. To minimize the impact, the PR adds __add__, __eq__, and __ne__ methods for Rows. Tests have been updated to use the Row.tuple property for comparisons with tuple literals (the PR would fail existing tests otherwise). Independent of label-based indexing, I'd suggest that returning Dataset items as Rows instead of plain tuples may be preferable in any case, since it makes it possible to add functionality in the future.
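To illustrate how such methods can soften the tuple-to-Row change, here is a minimal sketch of a row type that compares and concatenates with plain tuples and lists. It is a hypothetical stand-in, not the PR's implementation: only the method names and the tuple property are taken from the description, and the exact comparison rules are assumptions.

```python
class Row:
    """Toy row that interoperates with tuples/lists (hypothetical sketch)."""

    def __init__(self, values):
        self._row = list(values)

    @property
    def tuple(self):
        # Plain-tuple view of the row, handy for comparisons in tests.
        return tuple(self._row)

    def __getitem__(self, index):
        return self._row[index]

    def __len__(self):
        return len(self._row)

    def __eq__(self, other):
        if isinstance(other, Row):
            return self._row == other._row
        if isinstance(other, (list, tuple)):
            return self._row == list(other)
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __add__(self, other):
        if isinstance(other, Row):
            other = other._row
        if isinstance(other, (list, tuple)):
            return Row(self._row + list(other))
        return NotImplemented
```

Client code that compares a returned item against a tuple literal keeps working under these assumptions, while code that needs a real tuple (for hashing, say) can go through the tuple property.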

Other changes/additions:

  • Add a copy method for Datasets that updates the _dset references in the new object's Rows and uses copy.deepcopy instead of copy.copy. This should also fix a bug in the current version where copies (in filter and stack) are shallow and the new object's _data attribute points to the same list as the original object's (filter and stack updated accordingly).
  • Add assertions to existing tests for methods that return new Dataset objects to verify that Row's _dset points to the new object and that the new object is not a shallow copy (filter, stack, stack_col, subset, sorted, and transpose)
  • Add tests for new functionality (plus one for existing filter)
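The shallow-copy problem mentioned in the first bullet can be reproduced with the standard library alone. The sketch below uses toy classes (hypothetical, not tablib's) that mimic the _data list and the _dset back-reference; it shows that copy.copy shares the underlying list, while copy.deepcopy produces an independent list whose rows point back at the new object.

```python
import copy


class Row:
    """Toy row holding a back-reference to its owner (hypothetical)."""

    def __init__(self, values, dset):
        self.values = list(values)
        self._dset = dset


class Dataset:
    """Toy dataset storing its rows in _data (hypothetical)."""

    def __init__(self, *rows):
        self._data = [Row(r, self) for r in rows]


original = Dataset(("John", "Adams"), ("George", "Washington"))

shallow = copy.copy(original)    # new object, but same _data list
deep = copy.deepcopy(original)   # new object with its own rows

# The shallow copy shares the row list, so changes leak into the original.
shares_list = shallow._data is original._data

# The deep copy owns its list, and the copied rows refer to the copy
# because deepcopy preserves reference cycles within the copied graph.
independent = deep._data is not original._data
repointed = all(row._dset is deep for row in deep._data)
```

Assertions along these lines (the new object's _data is a different list, and every Row's _dset is the new object) are the kind of check the second bullet describes.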

ms8r, Jan 01 '17 23:01

Can you please resolve the conflicts. Thanks 🎉

timofurrer, Mar 02 '19 11:03

Done ;-) This also surfaced a leftover bug in the has_tag method (incorrect unicode handling under Python 2.7)... time to move to Python 3 only...

ms8r, Mar 17 '19 10:03