crengine icon indicating copy to clipboard operation
crengine copied to clipboard

Perfect hashing?

Open NiLuJe opened this issue 4 years ago • 1 comments

Just a random drive-by so that I don't forget about it ;).

Inspired by https://github.com/ArtifexSoftware/mupdf/commit/6bbfe286fd602aeedd674473a08221251d9c09b6 in MµPDF, I'm wondering if we couldn't also make use of something similar in a few places to speed things up.

I vaguely remember you switched some similar stuff to a binary search recently, @poire-z ?

NiLuJe avatar Jun 29 '20 18:06 NiLuJe

I have no real idea how/what gperf really does :/ But looking at the mupdf patch:

  • they replaced some late string comparisons with some id/hash comparisons - we already do that, by using some sequential id (when parsing the stylesheet text into an intermediate binary form) from this enum: https://github.com/koreader/crengine/blob/0dee202b541a66f8b34adb3d0e2265d8c9521f45/crengine/src/lvstsheet.cpp#L30-L36

  • they use that gperf stuff when parsing - we indeed do many case insensitive string comparisons: https://github.com/koreader/crengine/blob/0dee202b541a66f8b34adb3d0e2265d8c9521f45/crengine/src/lvstsheet.cpp#L123-L129 https://github.com/koreader/crengine/blob/0dee202b541a66f8b34adb3d0e2265d8c9521f45/crengine/src/lvstsheet.cpp#L297-L312

I guess we could benefit from gperf for that 2nd part, but the benefit will not be much (vs the added gperf opacity/complexity when debugging): in my experience debugging and timing things, the time spent parsing a 30K or 500K OPS/publisher.css is really peanuts compared to the time spent checking/applying it to all the nodes of a book (even if parsing a 500K css takes 2 seconds, checking/applying the thousand rules can take minutes). (I thought I had mentionned that at https://github.com/koreader/crengine/pull/276#issuecomment-517510750 and https://github.com/koreader/crengine/pull/276#issuecomment-517920207 , but didn't explicitely). One case where it might help is when you have in an EPUB 4000 small html each linking the same huge CSS: we'll be parsing 4000 times that huge CSS, and it might be more expansive than applying them to the few dozens nodes in each HTML.

So, I don't think the benefit will really be noticable. (But feel free to go at it if you wish, and it doesn't make the code too much unreadable.) (Also, dunno gperf seems to have an option for ignoring case sensitivity, which we should for CSS properties, but MuPDF did not specify it it seems.)

poire-z avatar Jun 29 '20 18:06 poire-z