epub-full-text-search

Cannot search japanese content

Open nhienhuynh opened this issue 8 years ago • 9 comments

Hi, it works fine with English or non-Unicode characters, but when I try to search Japanese content, it returns no results.

Can anyone help me find a solution for this?

nhienhuynh avatar Sep 28 '16 07:09 nhienhuynh

Hi @nhienhuynh, at the moment this software doesn't support the Japanese language. Sorry, I have no spare time to test, fix, or implement it. If you would like to implement this yourself, I can help you find the right spots in the code. Lars

larsvoigt avatar Sep 28 '16 11:09 larsvoigt

Hi Lars, thanks for the quick reply! I want to fix it but I can't find the relevant code. I think the problem happens when the index database (levelup) is created. Can you help me find the code?

Tom

nhienhuynh avatar Sep 28 '16 12:09 nhienhuynh

Hi Tom, can you verify whether the indexing process or the search process failed? Do you have an example book that you can share with me? Lars

larsvoigt avatar Sep 28 '16 19:09 larsvoigt

Hi Lars, the search fails, but I think the cause is an encoding problem in the database file during indexing. I have attached the epub file for you. It is Korean, but it has the same issue as Japanese. In this file, searching for the keyword "King" works, but searching for "그가 장난기" returns an empty result although this content exists.

Tom

KingSHADOW1.zip

nhienhuynh avatar Sep 29 '16 01:09 nhienhuynh
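
One common way an index ends up with English terms but no Korean or Japanese ones is an ASCII-only tokenizer. Whether that is what happens in epub-full-text-search is not confirmed anywhere in this thread; the two tokenizers below are hypothetical and only illustrate how the symptom ("King" is found, "그가 장난기" is not) can arise in JavaScript.

```javascript
// Hypothetical tokenizers, for illustration only.
// They are NOT taken from epub-full-text-search.

// \w matches only [A-Za-z0-9_], so Hangul, Kana and Kanji are dropped.
const asciiTokenize = (text) => text.match(/\w+/g) || [];

// Unicode property escapes (with the "u" flag) match letters and digits of any script.
const unicodeTokenize = (text) => text.match(/[\p{L}\p{N}]+/gu) || [];

const sample = "King 그가 장난기";
console.log(asciiTokenize(sample));   // only the ASCII word survives
console.log(unicodeTokenize(sample)); // all three words are kept
```

If the index is built from the first kind of tokenizer, the Korean terms never reach the database, so a later search for them cannot match.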

Hi Lars,

Have you had time to take a look yet?

Regards,

nhienhuynh avatar Oct 03 '16 16:10 nhienhuynh

Hi Tom, I have made a small refactoring of the code base. Then I tested it with your Korean ebook and it works fine on my side. I get this:

--------------------------------------------------------------------------
*** epubTitle: King SHADOW 1권 ***
--------------------------------------------------------------------------
*** baseCfi: /6/24[Section0011.xhtml]! ***
*** href: Text/Section0011.xhtml ***
*** cfis: 1 hits
------> /6/24[Section0011.xhtml]!/4/654,/1:6,/1:7
***

when I search for "그가 장난기".

I have also published a new version via npm. Can you please test whether it fixes your problem? Best, Lars

larsvoigt avatar Oct 04 '16 13:10 larsvoigt

Hi Lars,

Many thanks for your help! I am checking and will report back to you later.

Best regards, Tom

nhienhuynh avatar Oct 05 '16 10:10 nhienhuynh

Hi Lars,

Here is my report after checking:

  1. When searching with the "t" parameter (epub title), it returns an empty result. When the keyword (q) is English it works, but when the keyword is Korean and the epub title is Korean and includes a space, it returns no result.

     -----------Error case--------------
     curl -XGET "http://127.0.0.1:8085/search?q=미래&t=King%20SHADOW%201권"
     client request
     Keyword: 미ë��
     bookTitle: King SHADOW 1ê¶�
     Result: []

     ----------Good case--------------
     curl -XGET "http://127.0.1:8085/search?q=King&t=King%20SHADOW%201권"
     client request
     Keyword: King
     bookTitle: King SHADOW 1ê¶�
     Result: [{"filename":"KingSHADOW11475051244","epubTitle":"King SHADOW 1권","href":"Text/Section0012.xhtml","baseCfi":"/6/26[Section0012.xhtml]!","id":"Section0012.xhtml","cfis":["/6/26[Section0012.xhtml]!/4/2,/1:6,/1:10"]}]

  2. In the results, most people want to get the sentence that contains the searched phrase. How can we do that with the current result format?
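
For the question about returning the surrounding sentence, here is a minimal sketch of the last step only. It assumes the plain text of the matched element has already been obtained, for example by resolving the returned CFI with an EPUB reader library; that resolution step is not shown. `sentenceAround` and `TERMINATORS` are hypothetical names, not part of epub-full-text-search.

```javascript
// Hypothetical helper, not part of epub-full-text-search.
// Assumes `text` is the plain text of the element that the CFI points into.
const TERMINATORS = ".!?。！？"; // Latin and CJK sentence terminators

function sentenceAround(text, query) {
  const hit = text.indexOf(query);
  if (hit === -1) return null; // query not present in this text

  // Walk left until the character before `start` is a terminator.
  let start = hit;
  while (start > 0 && !TERMINATORS.includes(text[start - 1])) start--;

  // Walk right to the next terminator and include it, if there is one.
  let end = hit + query.length;
  while (end < text.length && !TERMINATORS.includes(text[end])) end++;
  if (end < text.length) end++;

  return text.slice(start, end).trim();
}

console.log(sentenceAround("He smiled. The King laughed loudly! Then he left.", "King"));
// prints "The King laughed loudly!"
```

This only handles the first occurrence and treats every terminator character as a sentence boundary, so abbreviations such as "Mr." would split a sentence early.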

Best regards, Tom

nhienhuynh avatar Oct 08 '16 05:10 nhienhuynh
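
The garbled title in the report above (`1ê¶�` instead of `1권`) is consistent with the UTF-8 bytes of `권` being read back as Latin-1, though the thread does not establish whether the server, the logging, or the terminal is responsible. A minimal Node.js sketch, assuming the `/search` endpoint and the `q` and `t` parameters from the curl commands: `buildSearchUrl` is a hypothetical helper that percent-encodes both parameters as UTF-8, and the second half reproduces and reverses the garbling.

```javascript
// Hypothetical helper: builds the search URL with both parameters
// percent-encoded as UTF-8, so no raw Korean bytes are sent in the request line.
function buildSearchUrl(base, q, t) {
  return `${base}/search?q=${encodeURIComponent(q)}&t=${encodeURIComponent(t)}`;
}

const url = buildSearchUrl("http://127.0.0.1:8085", "미래", "King SHADOW 1권");
console.log(url);

// Reproduce the garbling: encode "권" as UTF-8, then decode those bytes as Latin-1.
const garbled = Buffer.from("권", "utf8").toString("latin1");
console.log(JSON.stringify(garbled));

// Reverse it: reinterpret the Latin-1 characters as UTF-8 bytes.
const repaired = Buffer.from(garbled, "latin1").toString("utf8");
console.log(repaired);
```

Sending the fully encoded URL rules out the client side. If the server still logs a garbled keyword, the decoding of the query string on the server is the next place to look.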

Hi @larsvoigt, I am trying to search for some Bangla (Bengali) content in an epub. I think there is no problem with indexing the epub, but when I query for a Bangla word, the search returns an empty result. Since a similar problem was solved for Japanese, I thought the same solution would also work for Bangla. Can you please help me? Regards, Rudra

rudra0713 avatar Aug 22 '17 11:08 rudra0713
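
Bangla differs from Korean and Japanese in one respect that matters for tokenizing: vowel signs and similar characters are combining marks (Unicode category M), not letters. A tokenizer that keeps only letters will therefore cut Bangla words into fragments. The sketch below illustrates this under the assumption that tokenization is regex-based; both functions are hypothetical and nothing in this thread confirms how epub-full-text-search tokenizes.

```javascript
// Hypothetical tokenizers, for illustration only.

// Letters only: Bangla vowel signs are marks, so words are cut apart.
const lettersOnly = (text) => text.match(/\p{L}+/gu) || [];

// Letters, marks and digits, after NFC normalization so that the indexed
// text and the query use the same code point sequence.
const lettersAndMarks = (text) =>
  text.normalize("NFC").match(/[\p{L}\p{M}\p{N}]+/gu) || [];

const sample = "বাংলা ভাষা";
console.log(lettersOnly(sample));     // fragments of the two words
console.log(lettersAndMarks(sample)); // the two whole words
```

The same normalization has to be applied to the query as to the indexed text; otherwise visually identical words can still fail to match.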