
Detect potential duplicates

Open MatthieuBizien opened this issue 5 years ago • 0 comments

Roam does not seem to have an advanced de-duplication algorithm for notes.

  1. A trailing space in a note title is usually unintentional
  2. Unicode is not normalized

E.g. for 2.: [[Charlène]] and [[Charlène]] look identical, but if we print their UTF-8 bytes, they differ: b'Charle\xcc\x80ne' (an "e" followed by a combining grave accent) versus b'Charl\xc3\xa8ne' (a single precomposed "è"). If I apply unicodedata.normalize with the same form to both, they become identical.

We could detect them and save the list of potential duplicates in a dedicated file.
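One possible way to do the detection, as a rough sketch: group titles by a canonical key and report every group that contains more than one distinct spelling. The function names (`canonical_title`, `find_potential_duplicates`, `format_report`) are hypothetical and not part of roam-to-git; the canonical key (NFC plus stripping trailing whitespace) is an assumption based on the two cases listed above.

```python
import unicodedata
from collections import defaultdict


def canonical_title(title: str) -> str:
    # Assumption: NFC normalization plus removal of trailing whitespace.
    return unicodedata.normalize("NFC", title).rstrip()


def find_potential_duplicates(titles):
    """Map each canonical title to its distinct raw spellings, keeping
    only the groups that have more than one spelling."""
    groups = defaultdict(set)
    for title in titles:
        groups[canonical_title(title)].add(title)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


def format_report(duplicates) -> str:
    # One line per group; repr() makes trailing spaces and
    # combining characters visible.
    lines = []
    for key, variants in sorted(duplicates.items()):
        lines.append(f"{key}: " + ", ".join(repr(v) for v in variants))
    return "\n".join(lines)
```

The string returned by `format_report` could then be written to whatever file is chosen for the report.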

MatthieuBizien, Apr 21 '20 10:04