epub-full-text-search
Cannot search Japanese content
Hi, it works fine with English or non-Unicode characters, but when I try to search Japanese content it returns no results.
Can anyone help me find a solution for this?
Hi @nhienhuynh, at the moment this software doesn't support the Japanese language. Sorry, I have no spare time to test, fix, or implement it. If you want to implement this yourself, I can help you find the right spots in the code. Lars
Hi Lars, thanks for the quick reply! I want to fix it but I can't find the code. I think the problem happens when the index database (levelup) is created. Can you help me find the code?
Tom
Hi Tom, can you verify whether the indexing process or the search process failed? Do you have an example book that you can share with me? Lars
Hi Lars, the search fails, but I think it is a database file encoding problem during indexing. I have attached the EPUB file for you. It is Korean, but it has the same issue as Japanese. In this file, searching for the keyword "King" works, but searching for "그가 장난기" returns an empty result although this content exists.
Tom
Hi Lars,
Have you had time to take a look yet?
Regards,
Hi Tom, I have done a small refactoring of the code base. I then tested it with your Korean ebook and it works fine on my side. I get this:
--------------------------------------------------------------------------
*** epubTitle: King SHADOW 1권 ***
--------------------------------------------------------------------------
*** baseCfi: /6/24[Section0011.xhtml]! ***
*** href: Text/Section0011.xhtml ***
*** cfis: 1 hits
------> /6/24[Section0011.xhtml]!/4/654,/1:6,/1:7
***
when I search for "그가 장난기".
I have also published a new version via npm. Can you please test whether it fixes your problem? Best, Lars
Hi Lars,
Big thanks for your help! I am checking and will report back to you later.
Best regards, Tom
Hi Lars,
Here is my report after checking:
- When searching with the "t" parameter (EPUB title), the result can be empty. When the keyword (q) is English it works, but when the keyword is Korean and the EPUB title is Korean and includes a space, no result is returned.
-----------Error case--------------
curl -XGET "http://127.0.0.1:8085/search?q=미래&t=King%20SHADOW%201권"
client request
Keyword: 미ë��
bookTitle: King SHADOW 1ê¶�
Result: []
----------Good case--------------
curl -XGET "http://127.0.0.1:8085/search?q=King&t=King%20SHADOW%201권"
client request
Keyword: King
bookTitle: King SHADOW 1ê¶�
Result: [{"filename":"KingSHADOW11475051244","epubTitle":"King SHADOW 1권","href":"Text/Section0012.xhtml","baseCfi":"/6/26[Section0012.xhtml]!","id":"Section0012.xhtml","cfis":["/6/26[Section0012.xhtml]!/4/2,/1:6,/1:10"]}]
- In the result, most people want to get the sentence that includes the searched phrase. How can we do that with the current result?
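On the second point, a minimal sketch of one possible approach, assuming the plain text of the matching chapter is already available as a string (the search result itself only carries CFIs, and resolving a CFI to text is not shown here). The helper name sentenceAround and the sample text are made up for illustration and are not part of epub-full-text-search:

```javascript
// Hypothetical helper, not part of epub-full-text-search.
// Returns the sentence of `text` that contains the first occurrence of
// `phrase`, or null when the phrase does not occur.
function sentenceAround(text, phrase) {
  const hit = text.indexOf(phrase);
  if (hit === -1) return null;

  // Naive sentence boundaries: '.', '!', '?' and the CJK full stop.
  const isBoundary = (ch) => '.!?。'.includes(ch);

  // Walk left until the character before `start` ends a sentence.
  let start = hit;
  while (start > 0 && !isBoundary(text[start - 1])) start--;

  // Walk right until a boundary character, and keep that character.
  let end = hit + phrase.length;
  while (end < text.length && !isBoundary(text[end])) end++;
  if (end < text.length) end++;

  return text.slice(start, end).trim();
}

// Made-up sample text, not taken from the attached book.
const sample = 'The king left. 그가 장난기 어린 얼굴로 웃었다. The end.';
console.log(sentenceAround(sample, '그가 장난기'));
```

The boundary rule is deliberately simple. Abbreviations, quotations, and languages with other sentence terminators would need a more careful splitter.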
Best regards, Tom
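A note on the first point of the report above: in both curl calls the Korean characters are sent in the URL without percent-encoding, and the garbled "1ê¶" in the server log looks like what you get when UTF-8 bytes are read as Latin-1. That is only a guess about what happens on the server side. The sketch below shows how a client could percent-encode both parameters, and how the garbling can be reproduced in Node.js. The helper name buildSearchUrl is hypothetical; the host, port, and parameter names are taken from the curl calls:

```javascript
// Hypothetical helper: builds the /search URL from the curl calls above,
// percent-encoding both parameters as UTF-8.
function buildSearchUrl(base, query, title) {
  return `${base}/search?q=${encodeURIComponent(query)}&t=${encodeURIComponent(title)}`;
}

const url = buildSearchUrl('http://127.0.0.1:8085', '미래', 'King SHADOW 1권');
console.log(url);
// http://127.0.0.1:8085/search?q=%EB%AF%B8%EB%9E%98&t=King%20SHADOW%201%EA%B6%8C

// Assumption: the server log shows UTF-8 bytes decoded as Latin-1.
// Doing that on purpose gives the same leading characters as in the log.
const garbled = Buffer.from('1권', 'utf8').toString('latin1');
console.log(JSON.stringify(garbled));
```

If the fully encoded URL returns hits for the Korean keyword, the problem is in how the request is decoded and not in the index.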
Hi @larsvoigt, I am trying to search for some Bangla (Bengali) content in an EPUB. I think there is no problem in indexing the EPUB, but when I query for a Bangla word, the search returns an empty result. As a similar problem was solved for Japanese, I thought the same solution would also work for Bangla. Can you please help me? Regards, Rudra