flexsearch
Weird behavior when exporting data
I have a collection on mongo that contains about 300k documents of translations, each document has the following structure:
{
  _id: ObjectId;
  i18n: {
    en: string;
    es: string;
    pt: string;
  }
}
Since I need to run queries against this data and MongoDB doesn't support partial text search, I decided to use flexsearch
to index these documents. Loading all of them every time the server restarts is a heavy operation, so I'm trying to export the indexes and import them again whenever the server restarts. The problem is that I can't get the export
to work properly. This is what I tried:
const flex = new Document({
  preset: 'memory',
  cache: 1000,
  optimize: true,
  worker: false,
  tokenize: 'forward',
  document: {
    id: '_id',
    store: false,
    index: [
      {
        field: 'i18n:en',
        tokenize: 'forward',
        language: 'en'
      },
      {
        field: 'i18n:es',
        tokenize: 'forward',
        language: 'es'
      },
      {
        field: 'i18n:pt',
        tokenize: 'forward',
        language: 'pt'
      }
    ]
  }
});
const docs = await fs.readFile('my_collection.json', { encoding: 'utf-8' }).then(data => JSON.parse(data));

// Use an int as the index id, as recommended by the documentation
for (let i = 0; i < docs.length; i++) {
  const doc = docs[i];
  await flex.addAsync(i, doc);
}
flex.export((id, doc) => {
  try {
    spin.info(`Exporting ${id}`);
    fs.writeFileSync(`${id}.json`, doc ?? '');
  } catch (e) {
    console.error(e);
    console.error(`Error exporting ${id}`);
    throw e;
  }
});
This export creates only two files, reg.json
and _id.cfg.json
, which seems weird. When I limit the number of documents to 10k instead of 300k, I get a lot more files, which makes more sense.
How could I fix this? Should I make multiple exports with small chunks of indexed documents? If so, will importing these chunks work correctly, or will my data get overridden if I import the same key twice from different exports?
Technical info:
- Node 16
- OS: Windows / Ubuntu