orama
Different search result after persist and restore database index
Describe the bug
I created a database and persisted it with the @orama/plugin-data-persistence plugin. After restoring the index from the JSON string, the search result was different from the result before persisting.
To Reproduce
The bug can be reproduced with the following code:
package.json
{
  "name": "orama-pilot",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@orama/orama": "^2.0.15",
    "@orama/plugin-data-persistence": "^2.0.15",
    "@orama/stemmers": "^2.0.15",
    "@orama/stopwords": "^2.0.15"
  }
}
index.js
import { create, insert, search } from '@orama/orama';
import { persist, restore } from '@orama/plugin-data-persistence';
import { stopwords as hungarianStopwords } from '@orama/stopwords/hungarian';
import {
  stemmer,
  language as hungarianLanguage,
} from '@orama/stemmers/hungarian';

// Database
const originalDatabaseInstance = await create({
  schema: {
    type: 'string',
    name: 'string',
  },
  components: {
    tokenizer: {
      stopWords: hungarianStopwords,
      stemming: true,
      stemmerSkipProperties: ['type'],
      language: hungarianLanguage,
      stemmer,
    },
  },
});

// Insert record
await insert(originalDatabaseInstance, {
  type: 'infantry',
  name: 'Piski ütközet',
});

const searchOptions = { term: 'Piski' };

// Search from original database index
const searchResultFromOriginalDatabaseInstance = await search(
  originalDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromOriginalDatabaseInstance.count); // Count: 1

// Persist database index
const databaseIndex = await persist(originalDatabaseInstance, 'json');

// Restore database index
const restoredDatabaseInstance = await restore('json', databaseIndex);

// Search from restored database index
const searchResultFromRestoredDatabaseInstance = await search(
  restoredDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromRestoredDatabaseInstance.count); // Count: 0
Expected behavior
After restoring the database, I expected the same search results as before persistence.
Environment Info
OS: Windows 11 Pro
Node: v20.2.0
@orama/orama: 2.0.15
@orama/plugin-data-persistence: 2.0.15
@orama/stemmers: 2.0.15
@orama/stopwords: 2.0.15
Affected areas
Search
Additional context
No response
Hi @gdeak-monguz, I fear this is because when you persist the database, you lose the stemmer (you can't save functions to disk). So I recommend creating a new database with a stemmer and then using it to restore the data.
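To illustrate the point about functions not surviving serialization, here is a toy sketch in plain JavaScript. It is not Orama's real index or tokenizer: `toyStemmer`, `tokenizerConfig`, and `lookup` are hypothetical names made up for this example, and the "index" is just a set of tokens. It only shows that JSON.stringify drops function-valued properties, and how a term that was stemmed at insert time can stop matching once the stemmer is gone.

```javascript
// Toy model only: NOT Orama's real index or tokenizer.
// `toyStemmer` is a hypothetical stand-in for a language stemmer.
const toyStemmer = (word) => word.toLowerCase().replace(/i$/, '');

const tokenizerConfig = { stemming: true, stemmer: toyStemmer };

// JSON.stringify silently omits properties whose value is a function.
const roundTripped = JSON.parse(JSON.stringify(tokenizerConfig));
console.log(Object.keys(roundTripped)); // [ 'stemming' ]

// Build a tiny "index" of stemmed tokens, as an index with stemming would.
const indexedTokens = new Set('Piski ütközet'.split(' ').map(toyStemmer));

// Look up a term, stemming it only when a stemmer function is available.
const lookup = (config, term) => {
  const token =
    typeof config.stemmer === 'function'
      ? config.stemmer(term)
      : term.toLowerCase();
  return indexedTokens.has(token);
};

const hitBefore = lookup(tokenizerConfig, 'Piski'); // stemmer present
const hitAfter = lookup(roundTripped, 'Piski'); // stemmer lost in the round trip
console.log(hitBefore, hitAfter); // true false
```

Whether this is exactly what happens inside the persistence plugin is an assumption; the sketch only mirrors the explanation given in the comment above.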
How can I do this? I tried creating a new database instance with the same schema and components (tokenizer -> stemmer and stopwords) and calling insertMultiple with this new instance and the persisted database index, but it still does not work.
Same problem with a Chinese DB. How can I fix it?
@gdeak-monguz, @anianj, try with something like this:
import { readFileSync } from "node:fs"
import { create, load } from "@orama/orama"
import { stemmer, language } from "@orama/stemmers/italian"
import { stopwords as italianStopwords } from "@orama/stopwords/italian"

// Read the previously persisted database from disk
const myPersistedDB = JSON.parse(readFileSync("./mydb.json", "utf-8"))

// Create a fresh instance that carries the stemmer and stop words
const newDB = await create({
  schema: { __tmp: "string" }, // this property will be overridden, don't worry about it
  components: {
    tokenizer: {
      stemming: true,
      stopWords: italianStopwords, // the option is spelled `stopWords`, as in the original report
      stemmer,
      language
    }
  }
})

// Load the persisted data into the new instance
await load(newDB, myPersistedDB)

// Then you can search on newDB
I'm sorry this gets so tricky. We will fix this. But in the meantime, that should solve your issue.
Please let me know if it works!
The suggested fix didn't work. I'm seeing the exact same issue, and we are getting different results after reloading.
Any ETA on the fix, @micheleriva?
No ETA yet. PRs are always welcome to speed things up.
Closing this due to inactivity.
This is still an issue. Does closing it mean it won't be resolved?
Nope, but any PRs are welcome. Feel free to propose something.