Different search result after persist and restore database index

Open gdeak-monguz opened this issue 10 months ago • 6 comments

Describe the bug

I created a database and persisted it with the @orama/plugin-data-persistence plugin. After restoring the index from the JSON string, the search result was different from the result of the same search before persisting.

To Reproduce

The bug can be reproduced with the following code:

package.json

{
  "name": "orama-pilot",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@orama/orama": "^2.0.15",
    "@orama/plugin-data-persistence": "^2.0.15",
    "@orama/stemmers": "^2.0.15",
    "@orama/stopwords": "^2.0.15"
  }
}

index.js

import { create, insert, search } from '@orama/orama';
import { persist, restore } from '@orama/plugin-data-persistence';
import { stopwords as hungarianStopwords } from '@orama/stopwords/hungarian';
import {
  stemmer,
  language as hungarianLanguage,
} from '@orama/stemmers/hungarian';

// Database
const originalDatabaseInstance = await create({
  schema: {
    type: 'string',
    name: 'string',
  },
  components: {
    tokenizer: {
      stopWords: hungarianStopwords,
      stemming: true,
      stemmerSkipProperties: ['type'],
      language: hungarianLanguage,
      stemmer,
    },
  },
});

// Insert record
await insert(originalDatabaseInstance, {
  type: 'infantry',
  name: 'Piski ütközet',
});

const searchOptions = { term: 'Piski' };

// Search from original database index
const searchResultFromOriginalDatabaseInstance = await search(
  originalDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromOriginalDatabaseInstance.count);  // Count: 1

// Persist database index
const databaseIndex = await persist(originalDatabaseInstance, 'json');
// Restore database index
const restoredDatabaseInstance = await restore('json', databaseIndex);

// Search from restored database index
const searchResultFromRestoredDatabaseInstance = await search(
  restoredDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromRestoredDatabaseInstance.count); // Count: 0

Expected behavior

After restoring the database, I expected the same search results as before persistence.

Environment Info

OS: Windows 11 Pro
Node: v20.2.0
@orama/orama: 2.0.15
@orama/plugin-data-persistence: 2.0.15
@orama/stemmers: 2.0.15
@orama/stopwords: 2.0.15

Affected areas

Search

Additional context

No response

gdeak-monguz avatar Apr 13 '24 20:04 gdeak-monguz

Hi @gdeak-monguz, I fear this is because when you persist the database, you lose the stemmer (functions can't be saved to disk). I recommend creating a new database with a stemmer, then using it to restore the data.
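
To illustrate the point about functions not surviving serialization, here is a minimal sketch in plain JavaScript. The `tokenizerConfig` object is hypothetical and is not Orama's internal data shape; it only shows that `JSON.stringify` silently drops function-valued properties, which is presumably what happens to a stemmer when an index is persisted as JSON.

```javascript
// Hypothetical tokenizer-like config, for illustration only
// (an assumption, not the structure Orama actually persists).
const tokenizerConfig = {
  language: 'hungarian',
  stemming: true,
  stemmer: (word) => word.toLowerCase(), // stand-in for a real stemmer function
};

// Round-trip through JSON, as a 'json' persistence format would.
const serialized = JSON.stringify(tokenizerConfig);
const revived = JSON.parse(serialized);

console.log(typeof tokenizerConfig.stemmer); // "function"
console.log('stemmer' in revived); // false: the function was dropped
```

The plain data (`language`, `stemming`) comes back intact, so the only way to get the function back is to supply it again when the new instance is created.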

micheleriva avatar Apr 14 '24 17:04 micheleriva

How can I do this? I tried creating a new database instance with the same schema and components (tokenizer with stemmer and stop words) and calling the insertMultiple function on this new instance with the persisted database index, but it still does not work.

gdeak-monguz avatar Apr 15 '24 08:04 gdeak-monguz

Same problem with a Chinese database. How can I fix it?

anianj avatar Jul 11 '24 07:07 anianj

@gdeak-monguz, @anianj, try something like this:

import { readFileSync } from "node:fs"
import { create, load } from "@orama/orama"
import { stemmer, language } from "@orama/stemmers/italian"
import { stopwords as italianStopwords } from '@orama/stopwords/italian'
import { restoreFromFile } from "@orama/plugin-data-persistence/server"

const myPersistedDB = JSON.parse(readFileSync("./mydb.json", "utf-8"))

const newDB = await create({
  schema: { __tmp: "string" }, // this property will be overridden, don't worry about it
  components: {
    tokenizer: {
      stemming: true,
      stopWords: italianStopwords, // the tokenizer option is camelCase: stopWords
      stemmer,
      language
    }
  }
})

await load(newDB, myPersistedDB)

// Then you can search on newDB

I'm sorry this gets so tricky. We will fix this. But in the meantime, that should solve your issue.

Please let me know if it works!
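
For intuition about why the restored instance needs the same tokenizer as the original, here is a toy sketch. Everything in it is hypothetical: `toyStemmer`, `buildIndex`, and `countMatches` are illustrative helpers, not Orama APIs, and the "stemmer" just strips a trailing "i". It assumes the cause discussed in this thread (tokens are stemmed at index time but not at query time after a restore) and shows how that mismatch alone can turn one hit into zero.

```javascript
// Toy stemmer: strips a trailing "i". Purely illustrative, NOT the Hungarian stemmer.
const toyStemmer = (token) => token.replace(/i$/, '');
// What a tokenizer without a stemmer effectively does to each token.
const identity = (token) => token;

// Lowercase, split on whitespace, then apply the given stemming function.
const tokenize = (text, stem) => text.toLowerCase().split(/\s+/).map((t) => stem(t));

// Build a tiny inverted index: token -> set of document ids.
function buildIndex(docs, stem) {
  const index = new Map();
  docs.forEach((text, id) => {
    for (const token of tokenize(text, stem)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// Count documents that match any token of the search term exactly.
function countMatches(index, term, stem) {
  const ids = new Set();
  for (const token of tokenize(term, stem)) {
    for (const id of index.get(token) ?? []) ids.add(id);
  }
  return ids.size;
}

// The index is built with stemming, so it stores "pisk", not "piski".
const index = buildIndex(['Piski ütközet'], toyStemmer);

const withSameStemmer = countMatches(index, 'Piski', toyStemmer);
const withoutStemmer = countMatches(index, 'Piski', identity);

console.log('Count:', withSameStemmer); // Count: 1
console.log('Count:', withoutStemmer); // Count: 0
```

The stored tokens are unchanged between the two searches; only the query-side processing differs, which mirrors the 1 versus 0 counts in the original report.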

micheleriva avatar Jul 11 '24 08:07 micheleriva

The suggested fix didn't work. I'm seeing the exact same issue, and we are getting different results after reloading.

Any ETA on the fix, @micheleriva?

cmartinho avatar Jul 30 '24 14:07 cmartinho

No ETA yet. PRs are always welcome to speed things up.

micheleriva avatar Jul 30 '24 15:07 micheleriva

Closing this due to inactivity.

allevo avatar Oct 16 '24 15:10 allevo

This is still an issue. Does closing it mean it won't be resolved?

cmartinho avatar Oct 16 '24 15:10 cmartinho

Nope, but any PRs are welcome. Feel free to propose something.

allevo avatar Oct 16 '24 15:10 allevo