csv-diff

Composite key

Open DavidUnderdown opened this issue 6 years ago • 5 comments

This looks like it will be really useful, thanks.

When we were working on CSV Schema Language we found it necessary to allow uniqueness to be defined over a composite set of columns (the unique column rule in the schema). I can see from the code structure that this wouldn't necessarily be entirely straightforward here, but I think it would be useful.

DavidUnderdown, Mar 14 '19 09:03

This doesn't seem impossibly difficult to add... it could work by allowing users to specify the --key option multiple times:

$ csv-diff one.csv two.csv --key=id --key=secondary
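
As a rough illustration of a repeatable option, here is a minimal sketch using the standard library's argparse. It is not csv-diff's actual command-line code; `parse_args` is a hypothetical helper, and only the `--key` flag and the two file arguments come from the command above.

```python
import argparse


def parse_args(argv):
    # Hypothetical parser, not the real csv-diff CLI definition.
    parser = argparse.ArgumentParser(prog="csv-diff")
    parser.add_argument("previous")
    parser.add_argument("current")
    # action="append" collects every --key occurrence into a list, in order.
    parser.add_argument(
        "--key",
        action="append",
        default=None,
        help="Column to use as a key; repeat for a compound key",
    )
    return parser.parse_args(argv)


args = parse_args(["one.csv", "two.csv", "--key=id", "--key=secondary"])
print(args.key)  # ['id', 'secondary']
```

When `--key` is omitted, `args.key` stays `None`, so existing single-key or no-key behaviour could be kept as a separate code path.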

Part of the work would be teaching the CSV loading function to work with compound keys and create the internal ID as a tuple of values:

https://github.com/simonw/csv-diff/blob/825a28ccfdc20d011373b57b264970113df64872/csv_diff/__init__.py#L10-L15
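
The tuple-of-values idea might look something like the sketch below. This is an assumption about one possible shape, not the code at the link above: `load_csv_compound` is a hypothetical name, and the sample data is made up for illustration.

```python
import csv
import io


def load_csv_compound(fp, keys):
    """Hypothetical loader: index each row by a tuple of its key-column values."""
    rows = {}
    for row in csv.DictReader(fp):
        # The internal ID is a tuple, one element per --key column.
        row_id = tuple(row[k] for k in keys)
        if row_id in rows:
            raise ValueError(f"Duplicate compound key: {row_id!r}")
        rows[row_id] = row
    return rows


# Made-up sample: "id" alone is not unique, but (id, secondary) is.
sample = io.StringIO("id,secondary,name\n1,a,Cleo\n1,b,Pancakes\n")
loaded = load_csv_compound(sample, ["id", "secondary"])
print(sorted(loaded))  # [('1', 'a'), ('1', 'b')]
```

Because tuples are hashable, they can serve as dictionary keys directly, so a diff step that compares two such dictionaries by key would not need to know whether the key is simple or compound.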

The human_text() function would then need to learn how to display a compound ID.

simonw, Apr 07 '19 20:04

Thanks Simon,

I must admit that I was forgetting that our CSVs do typically have a URI per row too which is unique, so we could use that for purposes of getting a diff. May still be useful for others though.

For human_text(), perhaps some way of passing in a formatting string? In Python terms we'd want something like f"{r['lettercode']} {r['series']}/{r['piece']}/{r['item']} image {r['ordinal']}"
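
Since an f-string cannot be passed in from outside the program, one way to sketch this is with a plain `str.format` template. `format_row_id` is a hypothetical helper, not part of csv-diff; the column names are the ones from the example above and the row values are invented.

```python
def format_row_id(row, template):
    # str.format_map looks up each {column} placeholder in the row dict.
    return template.format_map(row)


# A user-supplied template; unlike an f-string it is evaluated at call time.
TEMPLATE = "{lettercode} {series}/{piece}/{item} image {ordinal}"

# Invented row values, for illustration only.
row = {"lettercode": "WO", "series": "95", "piece": "1", "item": "2", "ordinal": "3"}
print(format_row_id(row, TEMPLATE))  # WO 95/1/2 image 3
```

A template that names a column missing from the row raises `KeyError`, which a real implementation would probably want to report as a usage error.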

DavidUnderdown, Apr 08 '19 08:04

Seconding this, I'd find this very useful. As best as I can find, there are no other similar libraries that allow for composite keys, but I already use and am very happy with this package.

maxwelllc, Oct 06 '20 17:10

Hi @simonw I've made progress on this feature and would like to share. Would you grant me write access?

puddleoasis, Aug 16 '23 19:08

@puddleoasis why not submit a PR?

patric-r, Aug 16 '23 20:08