Add support for multi-column --key values

Open jsvine opened this issue 4 years ago • 2 comments

These modifications allow users to pass multiple (comma-separated) columns as the --key, for scenarios in which rows are uniquely identified by a combination of columns, such as the county and the state. For example:

csv-diff --key=state,county a.csv b.csv

An arbitrary number of columns can be used. These scenarios are fairly common, in my experience.
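To illustrate the idea, here is a minimal sketch of indexing rows by a composite key. It is not the code in this PR: `load_rows_by_key` is a hypothetical helper, and the sample data is made up for the example.

```python
import csv
import io


def load_rows_by_key(fp, key_columns):
    """Index CSV rows by a tuple of the values in key_columns.

    Hypothetical helper for illustration only; not csv-diff's own loader.
    """
    rows = {}
    for row in csv.DictReader(fp):
        # The composite key is the tuple of values from each key column.
        composite = tuple(row[col] for col in key_columns)
        if composite in rows:
            raise ValueError(f"Duplicate key: {composite}")
        rows[composite] = row
    return rows


# Made-up data: "Orange" alone is ambiguous, (state, county) is not.
sample = io.StringIO("state,county,value\nCA,Orange,100\nFL,Orange,200\n")
indexed = load_rows_by_key(sample, "state,county".split(","))
```

Using a tuple rather than a joined string means two different value combinations can never collapse into the same key.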

I aimed to make this implementation as simple as possible. As such, it doesn't handle one particular edge case: columns whose names contain a comma. My instinct is that this could be handled by adding a --key-sep option, through which the user could pass any arbitrary string to serve as a separator. For example:

csv-diff --key="Column Name, With A Comma::Column 2" --key-sep="::" a.csv b.csv

... and then passing that argument to load_csv/load_json. But I figured I'd raise the possibility here first before mucking around too much in the code.
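A rough sketch of how the proposed separator could be applied, assuming the option arrives as a plain string. `split_key` and its `key_sep` parameter are hypothetical names for illustration; nothing like this exists in the code yet.

```python
def split_key(key, key_sep=","):
    """Split a --key value into column names using key_sep.

    Hypothetical helper; the comma default mirrors the behavior in this PR.
    """
    if not key_sep:
        raise ValueError("key_sep must be a non-empty string")
    return key.split(key_sep)


# With a custom separator, commas inside column names are left alone.
columns = split_key("Column Name, With A Comma::Column 2", key_sep="::")
```

The resulting list of column names is what would then be handed on to the loaders.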

jsvine, Apr 02 '21 03:04

And I meant to say: Thanks for such an elegant and useful repo/tool! The code was a pleasure to read.

jsvine, Apr 02 '21 03:04

Ah, and while tinkering to scratch my own itch, I failed to recognize that something similar was proposed in #1!

This PR takes a slightly different approach (sticking with a single --key option, rather than multiple), based on my own personal preferences. No offense taken if you opt for the other one.

jsvine, Apr 02 '21 03:04