refinerycms-search icon indicating copy to clipboard operation
refinerycms-search copied to clipboard

Try convert query params to_ascii

Open bricesanchez opened this issue 9 years ago • 22 comments

In order to fix search with special characters, we have to convert a seach query to ascii : https://github.com/refinery/refinerycms-search/issues/62 https://github.com/dougal/acts_as_indexed#unicode-utf8-support

This PR needs to be merged after, this PR : https://github.com/refinery/refinerycms-acts-as-indexed/pull/8

bricesanchez avatar Aug 19 '15 15:08 bricesanchez

Actually, the fix, proposed here https://gist.github.com/dougal/193903bb4e0d6e5debe1 , does not work for me: the search result is always empty

sintro avatar Jul 21 '16 12:07 sintro

Did you reindex after applying this fix?

bricesanchez avatar Jul 21 '16 12:07 bricesanchez

Oh, forget to do it. Will try it


Update. Didn`t work.

sintro avatar Jul 21 '16 13:07 sintro

Could you copy/paste your Gemfile.lock and a step by step to reproduce?

bricesanchez avatar Jul 21 '16 17:07 bricesanchez

I will try to do clear project, using database with correct (russian) locales, and see what will happen.

sintro avatar Jul 22 '16 18:07 sintro

Just made test project, here is it. https://github.com/sintro/refinery-sandbox

Looks like everything works fine, and the problem was not in the method described here, but in re-indexing process. When I created some records and then changed acts_as_indexed from [:title, :body] to [:title_ascii, :body_ascii], and then performed reindexing using the method from 'refinerycms-search' readme (Refinery::Page.all(&:save), sure i did it for Refinery::Events::Event). After that, searching still does not work for russian Events (i am testing in admin menu by searching of the items, for this i modified the index action of my events extension). But after I deleted all the events, and then created it from scratch (when [:title_ascii, :body_ascii] was already set) the searching began work fine. So I can assume, that acts_as_indexed simply does not re-index records which was previously indexed, when you use Refinery::Page.all(&:save) way. So the only question that is still exist: how to actually re-index previously indexed records?

sintro avatar Jul 24 '16 22:07 sintro

What happened to this? I have this exact issue searching with non-ascii characters (my site is primarily in Norwegian, and searching works fine when I don't use Æ, Ø or Å (the three letters that separates the language from English)), but for some reason it's only an issue on my standard page model, not on other models I've included in the search.

My Events model (created with an engine and living in vendor/extensions) has a line that says acts_as_indexed fields: %i[title description lead] (in addition to the initializer), and I don't know if that's what makes it work with unicode or if it's something else, but searching for, say "Grøgaard" has always worked. Example: https://khio.no/en/search?utf8=%E2%9C%93&query=gr%C3%B8gaard (which should have something like 5 page results among pages in addition to the 6 events).

However, I tried this change on my fork now, and it reversed the issue. Now my pages work, but my events don't. I assume it could be an issue with reindexing, but I've cleared cache and redeployed to my test server and even reimported my production db dump. You obviously don't have my source code for the customised events model, so it's tough to nail this down, but do you have any ideas why this is? (Also, we were on Refinery 2.1.5 until recently, and it actually worked there. Not sure how much we had to customise/change/override (if anything) to make it work.)

Also, what's the plan for this PR? The commit is four and a half years old, and the status is confusing to me.

evenreven avatar Oct 06 '19 12:10 evenreven

Hi @evenreven! I did not dig this problem for a long time. Now i will probably use a more maintained gem to handle the search like elasticsearch, pg_search or ransack.

bricesanchez avatar Oct 07 '19 02:10 bricesanchez

So is that a wontfix? Or does it mean that you want to rewrite this using a backend?

The problem with this (and I've had the issue with other Refinery extensions too) is that it looks maintained because new releases keep appearing, but they're not really tested all that well. I don't mean to be crass considering this is open source, but I honestly don't think a search extension that has no unicode support should have been released/updated at all. (I've had a similar issue with the Wymeditor extension where I ended up fixing it myself (basically no locale except English was working properly)).

To better explain my frustration: If I knew search wasn't working, I'm not sure I would have upgraded from 2.1.5 - it's that important to us.

evenreven avatar Oct 07 '19 06:10 evenreven

So is that a wontfix? Or does it mean that you want to rewrite this using a backend?

This extension should be rewritten using a different backend like elasticsearch, pg_search or ransack.

The problem with this (and I've had the issue with other Refinery extensions too) is that it looks maintained because new releases keep appearing, but they're not really tested all that well. I don't mean to be crass considering this is open source, but I honestly don't think a search extension that has no unicode support should have been released/updated at all. (I've had a similar issue with the Wymeditor extension where I ended up fixing it myself (basically no locale except English was working properly)).

To better explain my frustration: If I knew search wasn't working, I'm not sure I would have upgraded from 2.1.5 - it's that important to us.

We lack contributors to improve features and test coverage and we are maintaining this project on our free time, so it's hard to improve it.

Could you help us to improve it?

bricesanchez avatar Oct 07 '19 12:10 bricesanchez

I try to help. Both with reporting issues and with upstream patches for wymeditor and for Refinery Pages. I'd like to contribute more if time and my relative lacking skill set permit, but this is what I've done so far.

Really my only issue (I also mentioned this to Philip in one of my PRs) is that I wish you were more upfront about known issues instead of forcing the users to deep dive into old PRs and issue threads to find out that quite basic features (e.g. locales in wymeditor) don't work. I'm happy and thankful for your work, and believe me I know what a thankless job open source can be. Just saying: Being upfront about lack of features/bugs goes a long way! (In my humble opinion, this issue is important enough that it should be mentioned in the README. I've spent a lot of time rewriting my old search result decorator to work with Refinery 4 and tidying up my result view, and if this had come with a warning, I would have tried to fix it first or looked elsewhere for search capabilities.)

About this specific issue: I can help with testing, I can probably help with simple patches, but I'm just not Ruby-savvy enough to help in any meaningful way with rewriting a search gem to use a different backend, sorry. Coding a robust extension and integration plugin (which I assume you would need as well, like the current refinerycms-acts-as-indexed gem) is just beyond what I can do.

evenreven avatar Oct 07 '19 13:10 evenreven

There is also this gem https://github.com/refinerycms-contrib/refinerycms-elasticsearch i've created 4 years ago which could help you to use elasticsearch on Refinery for public search

bricesanchez avatar Oct 08 '19 02:10 bricesanchez

Yeah, I've seen (and considered) that, but I steered clear because it's not updated for Refinery 4. But I could try to fork it. Was it working well back then? Do you see any potential compatibility or config issues?

evenreven avatar Oct 08 '19 05:10 evenreven

@bricesanchez ^ ?

parndt avatar Nov 06 '19 22:11 parndt

@parndt Thanks for the reminder :D

@evenreven i think refinerycms-elasticsearch could be easily update to Refinery 4. And it will need some refactor to use Chewy as elasticsearch client. it will be easier to maintain and improve

bricesanchez avatar Nov 07 '19 14:11 bricesanchez

@bricesanchez That sounds like a great idea, but again, replacing the search backend (with or without Chewy) is just beyond my capabilities. Unless you have plans to look into this - in which case I can volunteer as a tester - it won't happen.

evenreven avatar Nov 11 '19 12:11 evenreven

I will add this feature update to my backlog of Refinery tasks but i will not prioritize it for now.

bricesanchez avatar Dec 19 '19 03:12 bricesanchez

I understand it if there are more pressing issues. BTW, have you considered using pg_search as backend? It's not quite as good as Elastic, but it's easier to implement, no? I noticed Alchemy CMS is doing this.

evenreven avatar Dec 19 '19 14:12 evenreven

We got a new page with prominent content containing non-ascii characters, and this situation got untenable for us. So I added the code from this PR to the decorator, and it fixed our pages but broke our events extension. Changing the acts_as_indexed line to ascii_title/ascii_description in the event model and adding methods like this one fixed events:

def ascii_title 
  title.try(&:to_ascii) 
end

It's a bit of a mystery to me why you don't merge this when it fixes something that is badly broken - for every user - without it. The indexing is already ascii, so searching for utf-8 means broken results. Even though the problem is less severe in English than in Norwegian or French (not the mention Slavic languages with all their diacritics), it's still a pretty big bug for a search extension. Especially when it seemingly was solved four years ago.

evenreven avatar Aug 29 '20 17:08 evenreven

I'm fine with it. I personally don't like merging things that don't have passing CI. I'll leave the decision to @bricesanchez

parndt avatar Aug 31 '20 00:08 parndt

Completely agree with that, so don't expect you to merge it right now. Just passing along that in my recent experience, this fixes a pretty glaring bug in this extension with no obvious downsides. In fact, without this, the work you did on refinery-acts_as_indexed some years ago doesn't actually work and arguably made things worse (search mostly worked for us on Refinery 2.1.5 and didn't completely break until we upgraded to Refinery 4). This is the second half of one fix, really, not a separate one. But it's waited more than five years, it can wait a little longer (also, I've patched it myself in my own app). :)

evenreven avatar Sep 07 '20 09:09 evenreven

Bump. As stated previously, this gem currently doesn't work at all with non-ascii characters (i.e. it doesn't have unicode support). This PR fixes this bug. I monkey-patched my app with this fix almost two years ago, and I've had no issues. See here if you want to see it in action. The only gotcha is that you need to convert to ascii on the backend for any extensions/custom models you may have (see the def ascii_title snippet above), but that's to be expected. (Converting/normalising both search query and backend search index to ascii is a good practice for any unicode-compliant search, I'm doing the same on a client-side javascript filtering component; it works well.)

I know you want to rewrite this to use a better backend, but my advice is to just merge this PR since it fixes a huge bug. You can always rewrite it later if you feel like it.

evenreven avatar Apr 20 '22 19:04 evenreven