cmd/seriesmeta: Add Unicode normalization for book IDs

Open arturdryomov opened this issue 5 months ago • 2 comments

The _seriesmeta trigger searches the content table for a book with the same ID as the inserted one.

SELECT COUNT() FROM content WHERE ImageId = new.ImageId

It works fine with ASCII IDs (which are kinda file paths BTW). However, calling seriesmeta on macOS when Unicode file names are involved doesn’t work. This happens because the IDs in the content table are NFC-normalized, whereas macOS returns file paths as NFD-normalized strings. As an example (using Python):

>>> import unicodedata

>>> value_content = "file____mnt_onboard_Books_Сапковский__Час_презрения_kepub_epub"
>>> value_series = "file____mnt_onboard_Books_Сапковский__Час_презрения_kepub_epub"

>>> value_content == value_series
False
>>> unicodedata.is_normalized("NFC", value_content)
True
>>> unicodedata.is_normalized("NFC", value_series)
False
>>> unicodedata.is_normalized("NFD", value_content)
False
>>> unicodedata.is_normalized("NFD", value_series)
True

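In other words, the two values may look identical, but without normalization the SQLite comparison fails, because SQLite's default comparison works on the raw bytes of the strings and the NFC and NFD forms are encoded differently.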
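
To make the failure and a possible fix concrete, here is a minimal Python sketch. It is not the actual seriesmeta patch: the one-column `content` table is a simplified stand-in, the ID is a shortened hypothetical value, and `to_nfc` and `count_matches` are illustrative helpers. The idea is to normalize the ID to NFC before it reaches the query.

```python
import sqlite3
import unicodedata

# Hypothetical ID: the same visible text in composed (NFC) and decomposed (NFD) form.
id_nfc = "file____mnt_onboard_Books_\u0439_kepub_epub"  # "й" as a single code point
id_nfd = unicodedata.normalize("NFD", id_nfc)           # "и" followed by a combining breve


def to_nfc(value: str) -> str:
    """Return the NFC-normalized form of value (illustrative helper)."""
    return unicodedata.normalize("NFC", value)


# Simplified stand-in for the content table; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (ImageId TEXT)")
conn.execute("INSERT INTO content (ImageId) VALUES (?)", (id_nfc,))


def count_matches(image_id: str) -> int:
    """Count rows whose ImageId equals image_id under SQLite's default comparison."""
    row = conn.execute(
        "SELECT COUNT(*) FROM content WHERE ImageId = ?", (image_id,)
    ).fetchone()
    return row[0]


raw_matches = count_matches(id_nfd)                 # NFD ID as macOS would return it
normalized_matches = count_matches(to_nfc(id_nfd))  # same ID normalized to NFC first

print(raw_matches, normalized_matches)
```

The raw NFD lookup finds no rows, while the NFC-normalized lookup finds the stored row, which is the behavior this issue asks for.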

arturdryomov, Jul 23 '25 17:07

@pgaskin, PTAL.

arturdryomov, Jul 23 '25 17:07

Seems good. I'll merge it later.

Note that NickelSeries is currently the recommended solution for series metadata. I'm only keeping seriesmeta around for existing users and because I haven't got around to doing another release deprecating it yet.

pgaskin, Jul 25 '25 05:07