
Command line tool for deduplicating CSV files

csvdedupe issues (25 results)

In the results shown above, the algorithm does a great job of assigning Cluster ID = 0 for a contact with various title changes, but for some reason it assigns...

When I try to use an existing training.json file on a dataset, I get errors instead of output: `csvdedupe --config_file=processors/csvdedupe-config.json --training_file=training.json --settings_file=processors/learned_settings data/finished/arts-and-cultural-assets-massachusetts-clustered.csv > test2.csv` `INFO:root:imported 2673 rows`...

Hi, I'm attempting to link two CSV files, and the latlong comparator is failing because the fields are being treated as strings. Error: `INFO:root:taking a sample of 150000 possible pairs Traceback`...
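One possible workaround, sketched here under assumptions the issue does not confirm, is to convert the coordinate strings to numbers before they reach the comparator. The sketch assumes each value is stored as a single `"lat,lng"` string in one column and that the comparator wants a `(float, float)` pair; `parse_latlong` and `convert_rows` are hypothetical helpers, not part of csvdedupe.

```python
import csv
import io


def parse_latlong(value):
    """Turn a 'lat,lng' string into a (float, float) tuple, or None if blank.

    Hypothetical assumption: both coordinates live in one CSV column.
    """
    value = value.strip()
    if not value:
        return None
    lat_str, lng_str = value.split(",")
    # float() tolerates surrounding whitespace, e.g. " -71.06"
    return (float(lat_str), float(lng_str))


def convert_rows(rows, field):
    """Replace the string in `field` of each row dict with a numeric pair."""
    for row in rows:
        row[field] = parse_latlong(row[field])
    return rows


if __name__ == "__main__":
    data = io.StringIO('name,latlong\nMuseum,"42.36, -71.06"\nLibrary,\n')
    rows = convert_rows(list(csv.DictReader(data)), "latlong")
    print(rows)
```

Blank cells become `None` rather than raising, so rows with missing coordinates are kept instead of aborting the run.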

The documentation claims: > csvlink operates in much the same way as csvdedupe, but will flatten both CSVs in to one output file similar to a SQL OUTER JOIN statement....

Sorry for my newbie question; I would like to understand how to launch csvlink.py. When I launch it, I get this error: (threesixenv) C:\Users\Utilisateur\Documents\_Inmedia\_csvdedupe-master\csvdedupe-master\csvdedupe>python csvlink.py usage: csvlink.py [-h] [--config_file...

Hi, I just installed csvdedupe and tried to run the csvdedupe command line, but it throws an error: `ImportError: cannot import name ABC`. Command: `csvdedupe DEDUPE2-TEST.csv --field_names Name --output_file output.csv` `Traceback (most...

Quick and dirty fix. The problem was that a default value for the delimiter was inserted into `args`. Values from `args` are used to override parameters from the config file. This resulted in...
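The bug described in that fix is a general precedence problem: if the command-line parser always fills in a default, a value from the config file can never take effect. Below is a minimal sketch of one common remedy using Python's standard `argparse`, giving flags `default=None` so only flags the user actually passes override the config. The flag set and the `load_settings` helper are illustrative assumptions, not csvdedupe's real parser.

```python
import argparse


def load_settings(argv, config):
    """Merge config-file values with CLI flags; explicitly passed flags win."""
    parser = argparse.ArgumentParser()
    # default=None marks "not given", so a built-in default cannot mask the config
    parser.add_argument("--delimiter", default=None)
    parser.add_argument("--output_file", default=None)
    args = parser.parse_args(argv)

    settings = {"delimiter": ","}  # hypothetical fallback when neither source sets it
    settings.update(config)        # config file overrides the fallback
    # Command-line flags override the config only when actually supplied.
    settings.update({k: v for k, v in vars(args).items() if v is not None})
    return settings


if __name__ == "__main__":
    print(load_settings([], {"delimiter": ";"}))
    print(load_settings(["--delimiter", "|"], {"delimiter": ";"}))
```

An alternative is `default=argparse.SUPPRESS`, which leaves unset flags off the namespace entirely, so the merge needs no `None` filter.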

Is there a way to reuse training data when the field names change, even if it requires mucking with the training data file? For example, I run a training set on...

I have a use case where I need to be able to restart the process from where we left off, basically calling csvdedupe as a service to create training sets. I...

Hello, I followed your csvlink example and ran a command in the following format: `csvlink dir/file_1.csv dir/file_2.csv --field_names_1 different name --field_names_2 other fields --output dedupe_output.csv`. However, upon...