csvdedupe
Command line tool for deduplicating CSV files
The Windows characters for fancy double-quote and dash (Hex: `0x92`, `0x95`, `0x96`) cause csvdedupe to crash. This appears to be an issue in the upstream csvkit. A workaround is to...
I've used csvdedupe to try to match up a list of ~77,000 unmapped entries to a master list of ~141,000 known things. It worked and has given a list of...
Installing on macOS 10.14.2 (or any newer version of macOS that doesn't use Xcode 9.4) gives the following error: ``` ERROR: Command errored out with exit status 1: command:...
I saw that #30 tried to add this, but relied too heavily on magic field names. This is a more in-depth stab at bringing nicer coordinate matching to csvdedupe. (This...
Hello all, I am trying to use csvlink across two input files as normal. However, one of the csv files I'm using is several GB in size (how the file...
I hit an error when using a custom delimiter with the **csvlink** ("RecordLink") command line. The error is due to the function **writeLinkedResults()**, lines 141 and 145 of **csvhelpers.py**, where while reading the file delimiter...
After training and running I get the following error when it tries to write the results: ``` INFO:root:# duplicate sets 26051 Traceback (most recent call last): File "/usr/local/bin/csvlink", line 11,...
I have a use case for csvlink with two files containing different numbers of columns. There is no clear 1:1 mapping of colX from file1 to colY from file2. Looking at...
We already have `--inner_join`, but it would be useful to also have `--left_join` and `--right_join` for when we want to also return items in one file not matched to the...
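To make the requested join semantics concrete, here is a minimal sketch in plain Python. It is not csvdedupe's code: the function `join_rows`, its `how` parameter, and the `(left_idx, right_idx)` match format are all hypothetical, chosen only to illustrate what the inner, left, and right variants would return.

```python
def join_rows(left, right, matches, how="inner"):
    """Combine rows from two files given matched index pairs.

    left, right: lists of rows (any objects).
    matches: iterable of (left_idx, right_idx) pairs (hypothetical format).
    how: "inner", "left", or "right" (hypothetical, mirroring the flags).
    """
    if how not in ("inner", "left", "right"):
        raise ValueError(f"unknown join type: {how!r}")

    matches = list(matches)
    # Every join type starts with the matched pairs.
    out = [(left[i], right[j]) for i, j in matches]

    if how == "left":
        # Also keep left rows that matched nothing, padded with None.
        seen = {i for i, _ in matches}
        out += [(row, None) for i, row in enumerate(left) if i not in seen]
    elif how == "right":
        # Also keep right rows that matched nothing, padded with None.
        seen = {j for _, j in matches}
        out += [(None, row) for j, row in enumerate(right) if j not in seen]
    return out
```

With `how="left"`, every row of the first file appears at least once in the output. With `how="right"`, every row of the second file does.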
I was setting a value for skip_training in the configuration file; however, this value is always overridden by the command-line settings. The command-line settings are loaded onto the...
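One common way to avoid this kind of override is to keep unset flags out of the parsed arguments, so that only flags the user actually typed take precedence over the configuration file. The sketch below is an assumption about how that could look, not csvdedupe's actual implementation: `resolve_settings`, `DEFAULTS`, and the `threshold` option are hypothetical, and only `skip_training` comes from the report above.

```python
import argparse

# Hypothetical built-in defaults, used when neither source sets a value.
DEFAULTS = {"skip_training": False, "threshold": 0.5}


def resolve_settings(config, argv):
    """Merge settings with precedence: defaults < config file < explicit flags."""
    parser = argparse.ArgumentParser()
    # argparse.SUPPRESS omits a flag from the namespace when it is not given,
    # so an unset flag cannot clobber a value from the config file.
    parser.add_argument("--skip_training", action="store_true",
                        default=argparse.SUPPRESS)
    parser.add_argument("--threshold", type=float,
                        default=argparse.SUPPRESS)
    cli = vars(parser.parse_args(argv))

    # Later dicts win: explicit command-line flags override the config,
    # which in turn overrides the built-in defaults.
    return {**DEFAULTS, **config, **cli}
```

For example, `resolve_settings({"skip_training": True}, [])` keeps the configuration file's value, because the flag was never passed on the command line.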