realm-core

RealmDB Full Text Search not finding records in big DB

Open tn7111 opened this issue 1 year ago • 10 comments

Expected results

There's a full-text index on the content field of my DB. It contains this record: "18x48 18x48x80 white 18 x 48 80 prm 1848 15". It should be found by the query content TEXT "prm 18".

Actual Results

The query yields no results, even though more than one record satisfies it.

Steps & Code to Reproduce

I tried both the .NET SDK (through Unity) and Realm Studio, with the same results. Even if I set a record's content field to 'prm 18' directly, the search does not return it. If I search for content TEXT "prm", everything works as expected.

Core version

Core version: Unity 11.5.0 (core 13.20.1, I think; the CHANGELOG says x.y.z here, sorry). I've also tried the latest Realm Studio, which I guess uses a later core version.

tn7111 avatar Feb 22 '24 14:02 tn7111

➤ PM Bot commented:

Jira ticket: RCORE-1989

sync-by-unito[bot] avatar Feb 22 '24 14:02 sync-by-unito[bot]

The FTS implementation matches on full words, so I guess it treats 1848 as a word and doesn't match 18 there. I haven't tried it, but you could try adding * to make it a prefix search.

nirinchev avatar Feb 22 '24 15:02 nirinchev

Take a look: there's also a standalone 18 earlier in the string.

tn7111 avatar Feb 22 '24 15:02 tn7111
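To make the whole-word point above concrete, here is a minimal sketch of word-level matching. It is a toy model, not Realm's tokenizer or index: the helper names tokenize and matches_all_words are hypothetical, and the splitting rule (lowercase, break on anything that is not an ASCII letter or digit) is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Toy tokenizer (an assumption, not Realm's actual one): lowercases the text
// and splits it on any character that is not an ASCII letter or digit.
std::set<std::string> tokenize(const std::string& text)
{
    std::set<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        }
        else if (!current.empty()) {
            tokens.insert(current);
            current.clear();
        }
    }
    if (!current.empty())
        tokens.insert(current);
    return tokens;
}

// Hypothetical helper: true when every whole word of the query
// also appears as a whole word in the record.
bool matches_all_words(const std::string& record, const std::string& query)
{
    const std::set<std::string> record_tokens = tokenize(record);
    for (const std::string& word : tokenize(query)) {
        if (record_tokens.count(word) == 0)
            return false;
    }
    return true;
}
```

Under this model, the query "prm 18" does not match "1848" as a word, but it still matches the reported record because "prm" and "18" both occur there as standalone words. That is why whole-word matching alone does not explain the missing result.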

Another thing worth noting: I tried recreating the DB with a limited number of records (around 300), and the search worked correctly. When I try my original DB file, which has 60k entries, the search fails as described.

tn7111 avatar Feb 22 '24 15:02 tn7111

Hm, good point. I guess @jedelbo that's in your area of expertise. @tn7111 not sure if that'd be possible, but if you could give us access to the database where search fails, that would make it a lot easier to find the root cause.

nirinchev avatar Feb 22 '24 15:02 nirinchev

Hmm, I'll think of a way to obfuscate the data, maybe... not sure though. I wonder, @nirinchev @jedelbo, is there a way to retrieve the actual index somehow?

tn7111 avatar Feb 22 '24 16:02 tn7111

@tn7111 There must be some weird combination of words that somehow tricks the index. If there is any chance that you can produce a .realm file that exhibits the problem with data you can share, you can send it privately to me at [email protected].

jedelbo avatar Feb 23 '24 13:02 jedelbo

If you can build realm core locally, this C++ program (with the file, table, and column names adjusted to match yours) can dump the index.

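#include "realm.hpp"
#include <iostream>

using namespace realm;

int main()
{
    // Open the Realm file; replace "test.realm" with the path to your file.
    DBRef db = DB::create(make_in_realm_history(), "test.realm");
    // A read transaction is enough, since the index is only inspected.
    auto rt = db->start_read();
    // Replace "table" and "text" with your table and indexed column names.
    auto table = rt->get_table("table");
    auto col = table->get_column_key("text");
    // Print the node structure of the column's search index to stdout.
    table->get_search_index(col)->do_dump_node_structure(std::cout, 0);
}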

jedelbo avatar Feb 23 '24 14:02 jedelbo

You need to build in DEBUG mode.

jedelbo avatar Feb 23 '24 14:02 jedelbo

Wow! That's a lot of help. I'm now working on recreating the .realm file. It seems the size of the DB does not matter; I guess it's just something about the index. I now have two similar .realm files (which I cannot share for now, since this is production data). The number of entries is about the same. The new one works correctly; the old one fails as described. The entries and the fields I query are the same in both.

tn7111 avatar Feb 23 '24 14:02 tn7111

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

github-actions[bot] avatar Apr 03 '24 00:04 github-actions[bot]