realm-java

Diacritic-insensitive search

Open RobinCaroff opened this issue 7 years ago • 15 comments

Hi,

I saw that Realm cocoa v2.5.0 added support for diacritic-insensitive search.

Are you planning to make this available in a future release of Realm Android?

I know that a normalised column could be added to implement this feature (see this Realm Android issue and stackoverflow issue) but would this be the most efficient way of doing it?
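For reference, the normalised-column workaround usually depends on a folding helper like the one below. This is a minimal sketch that uses only the JDK's `java.text.Normalizer`. The class and method names are illustrative and are not part of Realm.

```java
import java.text.Normalizer;

public class DiacriticFolding {

    // Decompose with NFD so an accented letter becomes base letter + combining mark(s),
    // then remove every combining mark (Unicode general category M).
    public static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

With this helper, `stripDiacritics("Crème Brûlée")` returns `"Creme Brulee"`. Letters such as `ø` or `ł` have no canonical decomposition, so they pass through unchanged. A production version would need extra mapping rules for those letters.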

Thanks

RobinCaroff commented on Apr 20 '17

We'll talk this over and see what kind of priority that we can give it. Thank you for showing your interest in having it added to the Java binding.

karagraysen commented on Apr 20 '17

Yes, this is something we want to add, but we don't have a timeline yet.

The API is probably going to be slightly challenging.

Right now we have

realm.where(Person.class).equalTo(String field, String value, Case casing);

That leaves us with a few options:

  1. Add Diacritic enum
realm.where(Person.class).equalTo("name", "John");
realm.where(Person.class).equalTo("name", "John", Case.INSENSITIVE, Diacritic.SENSITIVE);

// Should we also add a diacritic-only overload? The combinatorial explosion makes me think no.
realm.where(Person.class).equalTo("name", "John", Diacritic.SENSITIVE);
  • (+) NSPredicate also seems to support only the Case/Diacritic options.
  • (+) Works without breaking the existing API.
  • (-) Doesn't scale very well if we add more options. We need to consider whether full-text search might require that.
  • (-) Diacritic searches must also specify the Case setting.
  2. Replace the enum with bit flags:
public class QueryOption {
  public static final int CASE_SENSITIVE = 0x01;
  public static final int CASE_INSENSITIVE = 0x02;
  public static final int DIACRITIC_SENSITIVE = 0x04;
  public static final int DIACRITIC_INSENSITIVE = 0x08;
}

realm.where(Person.class).equalTo("name", "John");
realm.where(Person.class).equalTo("name", "John", QueryOption.CASE_INSENSITIVE | QueryOption.DIACRITIC_INSENSITIVE);
realm.where(Person.class).equalTo("name", "John", QueryOption.DIACRITIC_INSENSITIVE);
  • (+) Much more flexible, also in terms of adding new features
  • (-) We replaced the original case boolean with an enum because of readability and auto-complete issues. Using bit flags will put us back into that position.
  • (-) Breaking change, so it must either wait for 4.0 or we need to duplicate a lot of methods.
  3. Other options

Not sure what those could be?

I probably lean towards option 1, accepting that diacritic-insensitive searches also need to supply the case parameter. Thoughts, @realm/java?
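A minimal, Realm-free sketch of what option 1 could look like in practice. `Case` here is a local stand-in for `io.realm.Case`. `Diacritic`, the `equalTo` helpers and the NFD-based folding are all hypothetical. The helpers compare two strings instead of running a query, only to show how the overloads and defaults could line up.

```java
import java.text.Normalizer;

public class Option1Sketch {

    // Local stand-in for io.realm.Case so the sketch compiles without Realm.
    public enum Case { SENSITIVE, INSENSITIVE }

    // Hypothetical enum proposed in option 1; not part of any released Realm API.
    public enum Diacritic { SENSITIVE, INSENSITIVE }

    // Like equalTo("name", "John"): case- and diacritic-sensitive by default.
    public static boolean equalTo(String stored, String value) {
        return equalTo(stored, value, Case.SENSITIVE, Diacritic.SENSITIVE);
    }

    // Like the existing equalTo(field, value, Case) overload; diacritics still matter.
    public static boolean equalTo(String stored, String value, Case casing) {
        return equalTo(stored, value, casing, Diacritic.SENSITIVE);
    }

    // New overload: a Diacritic option is only accepted together with a Case option.
    public static boolean equalTo(String stored, String value, Case casing, Diacritic diacritic) {
        String a = stored;
        String b = value;
        if (diacritic == Diacritic.INSENSITIVE) {
            a = stripMarks(a);
            b = stripMarks(b);
        }
        return casing == Case.INSENSITIVE ? a.equalsIgnoreCase(b) : a.equals(b);
    }

    // NFD-decompose, then drop combining marks (Unicode category M).
    private static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}
```

With these defaults, `equalTo("José", "jose", Case.INSENSITIVE)` stays `false`. Adding `Diacritic.INSENSITIVE` makes it `true`. That is the "(-) must also specify the Case setting" trade-off listed for option 1.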

cmelchior commented on Apr 21 '17

What about this:


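public class SortParam {
    final Case caseParam;
    final Diacritic diacriticParam;

    public SortParam(Case caseParam) {
        this(caseParam, Diacritic.DIACRITIC_SENSITIVE); // default: today's diacritic-sensitive matching
    }

    public SortParam(Case caseParam, Diacritic diacriticParam) {
        this.caseParam = caseParam;
        this.diacriticParam = diacriticParam;
    }
}

public enum Case {
    CASE_SENSITIVE(true),
    CASE_INSENSITIVE(false);

    private final boolean value;

    // Pre-built params so existing call sites like Case.INSENSITIVE keep compiling
    public static final SortParam SENSITIVE = new SortParam(CASE_SENSITIVE);
    public static final SortParam INSENSITIVE = new SortParam(CASE_INSENSITIVE);

    Case(boolean value) {
        this.value = value;
    }
}

public enum Diacritic {
    DIACRITIC_SENSITIVE,
    DIACRITIC_INSENSITIVE
}

RealmQuery equalTo(String, String, SortParam);

// Old code still compiles even though the API signature changed
realm.where(Person.class).equalTo("name", "John", Case.INSENSITIVE);

// It also leaves room for more query options:
realm.where(Person.class).equalTo("name", "John", new SortParam(Case.CASE_INSENSITIVE, Diacritic.DIACRITIC_SENSITIVE));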

beeender commented on Apr 21 '17

I guess it could work, and even though it is API-breaking, most users will probably not have to make any changes. I would be a bit concerned about how complicated it becomes to combine options (even more so than with either of the options I outlined).

cmelchior commented on Apr 21 '17

@cmelchior how about:

    import java.util.Objects;

    public class StringFilter {
        final Case casing;          // 'case' is a reserved word in Java, so it can't be a field name
        final Diacritic diacritic;

        public static class Builder {
            private Case casing = Case.SENSITIVE;
            private Diacritic diacritic = Diacritic.INSENSITIVE;

            public Builder withCase(Case casing) {
                this.casing = Objects.requireNonNull(casing);
                return this;
            }

            public Builder withDiacritic(Diacritic diacritic) {
                this.diacritic = Objects.requireNonNull(diacritic);
                return this;
            }

            public StringFilter build() {
                return new StringFilter(casing, diacritic);
            }
        }

        private StringFilter(Case casing, Diacritic diacritic) {
            this.casing = casing;
            this.diacritic = diacritic;
        }
    }

This is still a breaking change, but adding Diacritic to Case wouldn't really make sense.

Zhuinden commented on Apr 21 '17

@Zhuinden Have you tried using that?

realm.where(Person.class).equalTo("name", "John", new StringFilter.Builder().withCase(Case.SENSITIVE).withDiacritic(Diacritic.INSENSITIVE).build());

Looks extremely long and complicated to me :)

cmelchior commented on Apr 21 '17

.....ah, then I think you should probably just add a Diacritic enum for now.

(I didn't think of the outrageous length XD)


Actually, that makes the bitwise operators much easier to reason about.
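To make the bit-flag option more concrete, here is a rough sketch of how option 2's `QueryOption` flags might be decoded and validated on the library side. The flag values come from the proposal with the spelling fixed. `validate`, `isCaseInsensitive` and `isDiacriticInsensitive` are hypothetical helpers, not Realm API.

```java
public final class QueryOption {

    // Flags from option 2, with the identifier typos fixed; hypothetical, not a Realm API.
    public static final int CASE_SENSITIVE = 0x01;
    public static final int CASE_INSENSITIVE = 0x02;
    public static final int DIACRITIC_SENSITIVE = 0x04;
    public static final int DIACRITIC_INSENSITIVE = 0x08;

    private static final int ALL_FLAGS = 0x0F;
    private static final int CASE_MASK = CASE_SENSITIVE | CASE_INSENSITIVE;
    private static final int DIACRITIC_MASK = DIACRITIC_SENSITIVE | DIACRITIC_INSENSITIVE;

    private QueryOption() {}

    // An int parameter accepts any value, so bad combinations only fail at runtime.
    public static void validate(int options) {
        if ((options & ~ALL_FLAGS) != 0) {
            throw new IllegalArgumentException("Unknown flag bits: " + Integer.toHexString(options & ~ALL_FLAGS));
        }
        if ((options & CASE_MASK) == CASE_MASK) {
            throw new IllegalArgumentException("CASE_SENSITIVE and CASE_INSENSITIVE are mutually exclusive");
        }
        if ((options & DIACRITIC_MASK) == DIACRITIC_MASK) {
            throw new IllegalArgumentException("DIACRITIC_SENSITIVE and DIACRITIC_INSENSITIVE are mutually exclusive");
        }
    }

    // An omitted flag falls back to the sensitive behavior.
    public static boolean isCaseInsensitive(int options) {
        validate(options);
        return (options & CASE_INSENSITIVE) != 0;
    }

    public static boolean isDiacriticInsensitive(int options) {
        validate(options);
        return (options & DIACRITIC_INSENSITIVE) != 0;
    }
}
```

The runtime checks are one way to read the readability/auto-complete downside listed under option 2. A parameter of type `int` accepts `42` just as readily as a valid combination of flags, so the compiler cannot catch these mistakes.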

Zhuinden commented on Apr 21 '17

This is blocked by https://github.com/realm/realm-core/issues/1082

cmelchior commented on Apr 21 '17

Hey guys! Any update on this? Do you have a timeline for this feature? Thanks!

RobinCaroff commented on Jul 20 '17

@RobinCaroff Sorry, there is no timeframe yet. The Unicode libraries on our supported platforms differ slightly, so supporting this in core needs some discussion and time. https://github.com/realm/realm-core/issues/1082

dalinaum commented on Jul 21 '17

Hello, after almost one year, any update on this?

lucaspommateau commented on Jul 19 '18

Almost a year and a half now. Has this been considered for a future roadmap?

mick1418 commented on Dec 04 '18

Eh. I personally use the same approach as with SQLite: create a new field that stores the normalized (accent-less) value, and query against that.

(I am not a Realm member)
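A sketch of that workaround, assuming a hypothetical `nameNormalized` field stored next to `name`. To keep it runnable without Realm, `Person` here is a plain class and `findByName` filters a `List`. In an app, `Person` would extend `RealmObject` and the lookup would be a query such as `realm.where(Person.class).equalTo("nameNormalized", Person.normalize(input), Case.INSENSITIVE).findAll()`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for a Realm model class; in an app it would extend RealmObject.
public class Person {
    private String name;
    // Hypothetical shadow field: accent-free copy of name, used only for searching.
    private String nameNormalized;

    // The two fields stay in sync as long as every write goes through this setter.
    public void setName(String name) {
        this.name = name;
        this.nameNormalized = normalize(name);
    }

    public String getName() {
        return name;
    }

    public String getNameNormalized() {
        return nameNormalized;
    }

    // NFD-decompose, then drop combining marks; null stays null.
    public static String normalize(String s) {
        if (s == null) {
            return null;
        }
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Stands in for:
    // realm.where(Person.class).equalTo("nameNormalized", normalize(input), Case.INSENSITIVE).findAll()
    public static List<Person> findByName(List<Person> people, String input) {
        String needle = normalize(input);
        List<Person> matches = new ArrayList<>();
        for (Person p : people) {
            String stored = p.getNameNormalized();
            if (stored != null && stored.equalsIgnoreCase(needle)) {
                matches.add(p);
            }
        }
        return matches;
    }
}
```

The cost of this design is one extra stored string per searchable field, plus the discipline of routing every write through the setter. In return, queries stay ordinary equality or prefix queries on an indexed-friendly column.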

Zhuinden commented on Dec 04 '18

Any news about this one?

goa commented on Apr 03 '21

Not to forget... Any news on this 5-year-old request?

paolo-ermit commented on Apr 06 '22