csv2notion

[Bug]: Merging the same CSV multiple times results in some (random) duplicate rows

Open · bumper314 opened this issue 1 year ago • 1 comment

csv2notion version

0.3.9

What OS are you using?

MacOS

OS Version / Linux distribution

macOS 10.14, Python 3.12.5

Bug description

  1. Start with an empty database
  2. Import the CSV file with csv2notion --token "$token" --url "$url" --merge "youtube2csv_spanishafterhours.csv"
  3. Running the command again (which shouldn't change anything) adds a few duplicate rows at the bottom. Running it again adds more duplicate rows, but not necessarily the same ones as in the second run. It seems fairly random.
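The expectation in step 3 is that a merge behaves as an idempotent upsert: each CSV row either updates the existing database row with the same key or is added exactly once, so a second run with the same file adds nothing. A minimal in-memory sketch of that contract, using a plain dict as a stand-in for the Notion database. `merge_rows` and the `Views` column are hypothetical illustrations, not csv2notion's actual code or this CSV's real schema:

```python
def merge_rows(
    db: dict[str, dict[str, str]], rows: list[dict[str, str]], key: str
) -> dict[str, dict[str, str]]:
    """Return a copy of db with rows upserted by their key column (hypothetical helper)."""
    merged = dict(db)
    for row in rows:
        k = row[key]
        # Update the existing entry for this key, or add a new one exactly once.
        merged[k] = {**merged.get(k, {}), **row}
    return merged


# "Views" is a made-up column for illustration only.
rows = [
    {"Title": "I challenged a POLYGLOT to a LANGUAGE BATTLE kind of", "Views": "10"},
    {"Title": "INKTOBER: Edición Input en Español - Día 1 y último lol +HISTORIA (2020)", "Views": "20"},
]
first = merge_rows({}, rows, key="Title")
second = merge_rows(first, rows, key="Title")
print(len(first), len(second))  # 2 2: re-running the merge adds no rows
```

Any row count that grows between runs means the key lookup failed for some rows, which is exactly the symptom reported here.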

I can't find any obvious reason for the duplicate rows. The titles (the key column) are mostly ASCII, but some contain unusual Unicode characters such as zero-width joiners and emoji.
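To see exactly which titles contain non-ASCII or invisible characters (a zero-width joiner is U+200D, Unicode category `Cf`), a small scan with the standard `csv` and `unicodedata` modules can list each suspicious code point. This is a sketch that assumes the key column header is literally `Title`; adjust `key_column` to match the real file:

```python
import csv
import io
import unicodedata


def unusual_chars(text: str) -> list[str]:
    """List the non-ASCII characters in text as 'U+XXXX NAME (category)'."""
    found = []
    for ch in text:
        if not ch.isascii():
            name = unicodedata.name(ch, "<unnamed>")
            found.append(f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})")
    return found


def scan_titles(csv_text: str, key_column: str = "Title") -> dict[str, list[str]]:
    """Map every key value containing non-ASCII characters to those characters."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[key_column]: unusual_chars(row[key_column])
        for row in reader
        if not row[key_column].isascii()
    }


# Usage on the real file (file name taken from the report):
# with open("youtube2csv_spanishafterhours.csv", encoding="utf-8", newline="") as f:
#     for title, chars in scan_titles(f.read()).items():
#         print(repr(title), chars)
```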

Example CSV file to demonstrate the issue: youtube2csv_spanishafterhours.csv

Log excerpt

Nothing helpful in the log, even with --verbose:

2024-09-17 13:23:55,333 [INFO    ] Validating CSV & Notion DB schema
2024-09-17 13:23:55,653 [INFO    ] Uploading youtube2csv_spanishafterhours.csv...
2024-09-17 13:24:00,856 [INFO    ] Done!

bumper314 · Sep 17 '24 19:09

Here are the titles for the rows that get duplicated:

  • CATCHING UP WITH MY FUTURE SELF (2026) // Spanish Comprehensible Input, intermediate || SAH
  • I challenged a POLYGLOT to a LANGUAGE BATTLE kind of
  • INKTOBER: Edición Input en Español - Día 1 y último lol +HISTORIA (2020)

Apart from the last one, these titles are pure ASCII, so I don't think this is a Unicode issue. I also tried normalizing the .csv to all four Unicode normalization forms (NFC, NFD, NFKC, NFKD), but I still get the duplicate rows on import.
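The reasoning holds: a pure-ASCII string is unchanged by all four normalization forms, so normalization cannot alter the keys of the first two titles. A short check using the standard `unicodedata` module, plus a sketch of writing the CSV out in each form. The `_NFC`/`_NFD`/... output file names are hypothetical; the file is read and written as raw UTF-8 bytes so line endings stay exactly as they were:

```python
import unicodedata
from pathlib import Path

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_variants(text: str) -> dict[str, str]:
    """Return text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def write_normalized_copies(src: Path) -> list[Path]:
    """Write <stem>_<FORM><suffix> next to src for each form (hypothetical naming)."""
    text = src.read_bytes().decode("utf-8")
    copies = []
    for form in FORMS:
        dst = src.with_name(f"{src.stem}_{form}{src.suffix}")
        dst.write_bytes(unicodedata.normalize(form, text).encode("utf-8"))
        copies.append(dst)
    return copies


titles = [
    "CATCHING UP WITH MY FUTURE SELF (2026) // Spanish Comprehensible Input, intermediate || SAH",
    "I challenged a POLYGLOT to a LANGUAGE BATTLE kind of",
    "INKTOBER: Edición Input en Español - Día 1 y último lol +HISTORIA (2020)",
]
for title in titles:
    unchanged = all(v == title for v in normalized_variants(title).values())
    print(f"ascii={title.isascii()} unchanged_by_all_forms={unchanged}  {title}")
```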

BTW, combining --verbose with --fail-on-duplicates should print the duplicate rows, which would make issues like this much easier to track down.
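Until something like that exists, duplicate keys can be listed by hand. The sketch below (hypothetical `find_duplicate_keys`, not a csv2notion feature) reports every key value that appears more than once, together with the CSV line on which each such record ends. It assumes the key column is named `Title`; the same function could be run on a CSV export of the Notion database after the second merge, assuming the export uses the same header:

```python
import csv
import io
from collections import defaultdict


def find_duplicate_keys(csv_text: str, key_column: str = "Title") -> dict[str, list[int]]:
    """Map each key value occurring more than once to the CSV line numbers of its rows."""
    positions: dict[str, list[int]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # line_num counts source lines read so far (header included),
        # i.e. the line on which this record ends.
        positions[row[key_column]].append(reader.line_num)
    return {key: lines for key, lines in positions.items() if len(lines) > 1}


# Usage (file name is a placeholder for an exported copy of the database):
# with open("notion_export.csv", encoding="utf-8", newline="") as f:
#     for key, lines in find_duplicate_keys(f.read()).items():
#         print(f"{key!r}: lines {lines}")
```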

bumper314 · Sep 19 '24 19:09