
Importing exported indexes doesn't populate store (#249)

Open zanzlender opened this issue 2 years ago • 5 comments

I've been having trouble exporting indexes and then importing those same indexes again.

I found issue #249. Although it is marked as fixed, I still have the same problem.

Following the linked Stack Overflow question, I exported my indexes like this:

```ts
import * as fs from "node:fs";
import * as path from "node:path";
import { cwd } from "node:process";
import { Document } from "flexsearch";

// Write to FlexSearch index file
const flexIndex = new Document({
  tokenize: "forward",
  document: {
    id: "id",
    index: ["id", "url", "transcript", "timestamp"],
    store: true,
  },
  context: {
    resolution: 5,
    depth: 3,
  },
  cache: true,
});

/*
type Transcript = {
  id: string;
  url: string;
  transcript: Array<{
    timestamp: string;
    transcript: string;
  }>;
};
*/
// transcriptsJson (Transcript[]) is loaded elsewhere.
// Each transcript segment becomes its own document with id "<videoId>-<segmentIndex>".
transcriptsJson.forEach((_video) => {
  _video.transcript.forEach((_transcript, _index) => {
    flexIndex.add({
      id: `${_video.id}-${_index}`,
      url: _video.url,
      transcript: _transcript.transcript,
      timestamp: _transcript.timestamp,
    });
  });
});

const searchIndexPath = path.join(cwd(), "/src/content/flex-search/");

// Write each exported chunk to its own "<key>.json" file
await flexIndex.export(function (key, data) {
  fs.writeFileSync(
    path.join(searchIndexPath, `${key}.json`),
    data !== undefined ? (data as string) : ""
  );
});
```

And later I try to import them like so:

```ts
// Recover each export key from its file name by dropping ".json"
const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter((item) => !item.isDirectory() && item.name.includes(".json"))
  .map((item) => item.name.slice(0, -5));

for (const key of keys) {
  const data = fs.readFileSync(`${searchIndexPath}${key}.json`, "utf8");

  // Hand each chunk back under the key it was exported with
  await flexIndex.import(key, data);
}
```
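The import loop above recovers keys with `includes(".json")` plus `slice(0, -5)`, so a stray file such as `notes.json.bak` would pass the filter and be imported under the wrong key. A slightly stricter variant, sketched with a hypothetical `keysFromFileNames` helper and Node's `path.basename`, keeps only names that end in `.json` and strips exactly that suffix, leaving multi-dot keys such as `timestamp.store` intact:

```typescript
import * as path from "node:path";

// Hypothetical helper: map exported file names back to export keys.
// Only names that END with ".json" are kept; the suffix is removed exactly once.
function keysFromFileNames(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(".json"))
    .map((name) => path.basename(name, ".json"));
}
```

It can be fed directly from `fs.readdirSync(searchIndexPath)`, which returns plain file names when `withFileTypes` is not set.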

And finally, I search like this:

```ts
const res = flexIndex.search("03", {
  index: ["transcript", "timestamp"],
  enrich: true,
});

// Pick the result list for the "timestamp" field (compare each entry, not the array)
const xy = res.find((x) => x.field === "timestamp")?.result;
console.log(xy);
```
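The lookup above assumes the enriched result is an array of `{ field, result }` entries whose `result` items carry an `id` and, when the store is populated, a `doc`. That shape is an assumption based on the snippet, and the types below are written out only for illustration. The hypothetical `hitsForField` helper returns the hits for one field and lists the ids that came back without a stored document, which is the symptom reported here:

```typescript
// Assumed shape of an enriched Document search result (illustrative types only).
type EnrichedHit<D> = { id: string | number; doc?: D };
type FieldResult<D> = { field: string; result: EnrichedHit<D>[] };

// Return the hits for one field plus the ids whose `doc` is missing.
function hitsForField<D>(results: FieldResult<D>[], field: string) {
  const hits = results.find((entry) => entry.field === field)?.result ?? [];
  const missingDocs = hits
    .filter((hit) => hit.doc === undefined)
    .map((hit) => hit.id);
  return { hits, missingDocs };
}
```

A non-empty `missingDocs` after importing, but an empty one when searching the freshly built index, would confirm that the store did not survive the export/import round trip.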

Everything works fine up to this point and I get the results I wanted, but the `doc` object is `undefined`:

[screenshot]

However, when I run the same search directly on the index built in the first code example, without exporting and importing, everything works as expected:

[screenshot]

Does this mean #254 isn't fixed yet, or am I doing something wrong? Do I need to handle the `data` object in some special way when importing, instead of just importing it as-is?

zanzlender • Apr 03 '23 21:04

I've also noticed that one of the saved files is named `timestamp.store.json`. I don't know how that name is chosen, but it seems unintuitive: most of my data is in the `transcript` property, yet nothing is saved as `transcript.json` or `transcript.store.json`.

[screenshot]

But I don't know if this plays any role in my problem.
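One possible explanation, which is an assumption and has not been checked against the FlexSearch source for this version: `timestamp` is the last entry in `index`, so the store may be exported with that field's name as a prefix. If import then splits a dotted key into `<field>.<ref>` and hands it to that field's own index, the store payload never reaches the document and `doc` stays `undefined`. The sketch below models that routing with a hypothetical `routeKey` function and a hypothetical `normalizeKey` workaround that renames a prefixed store key back to a bare `store`:

```typescript
// Simplified, HYPOTHETICAL model of how a dotted export key might be routed
// on import. It is not FlexSearch's actual source code.
type Route =
  | { target: "document"; ref: string }
  | { target: "field"; field: string; ref: string };

function routeKey(key: string): Route {
  const parts = key.split(".");
  if (parts.length > 1) {
    // "<field>.<ref>": handed to that field's own index
    return { target: "field", field: parts[0], ref: parts[1] };
  }
  // Bare key such as "store" or "reg": handled by the document itself
  return { target: "document", ref: parts[0] };
}

// Workaround sketch: a store payload exported with a field prefix
// ("timestamp.store") is renamed back to the bare "store" key.
function normalizeKey(key: string): string {
  const route = routeKey(key);
  return route.target === "field" && route.ref === "store" ? "store" : key;
}
```

In the file-based import loop, the call would become `await flexIndex.import(normalizeKey(key), data)`. Whether a given FlexSearch version accepts the bare `store` key is untested here; an `enrich: true` search shows whether `doc` is populated afterwards.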

zanzlender • Apr 04 '23 18:04

I'm also experiencing this.

JSFiddle Example: https://jsfiddle.net/tnx5qLzd/

grimsteel • May 18 '23 01:05

Chiming in with the same issue. My setup looks something like:

```ts
// Exporting
import flexsearch from 'flexsearch'
import { stdout } from 'node:process'

const docIndex = new flexsearch.Document({
  document: {
    id: 'id',
    index: ['title', 'description', 'source', 'tags', 'body'],
    store: ['title', 'description', 'tags'],
  },
});

// `documents` is loaded elsewhere; each one is added under its slug
documents.forEach((doc) => {
  docIndex.add(doc.slug, {
    title: doc.title,
    description: doc.description,
    source: doc.source,
    tags: doc.tags,
    body: doc.body,
  })
})

docIndex.export((key, data) => {
  // Line-delimited JSON-objects, plays nicely with the async-ish nature of export
  stdout.write(JSON.stringify({key, data}) + '\n')
})
```

```ts
// Importing
import { Document } from 'flexsearch'

// Same configuration as on the exporting side
const docIndex = new Document({
  document: {
    id: 'id',
    index: ['title', 'description', 'source', 'tags', 'body'],
    store: ['title', 'description', 'tags'],
  },
}) as Document<Post, string[]>;

// `Post` and `readByLines` are defined elsewhere; each line is one {key, data} record
await readByLines('/flexsearch.json', (line: string) => {
  const imp = JSON.parse(line)
  docIndex.import(imp.key, imp.data);
})

const searchResults = docIndex.search({
  query: 'the query',
  enrich: true,
});

// searchResults[].result[].doc is undefined
```
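The line-delimited format can be checked in isolation. The sketch below (helper names `toNdjsonLine` and `parseNdjson` are hypothetical) writes and reads `{ key, data }` records the same way. Because `data` is already a JSON string, each line parses back to exactly the string the export callback received, and a chunk whose `data` was `undefined` simply comes back without a `data` property. That suggests the file format itself is not where the store gets lost.

```typescript
// One export chunk: `data` is whatever the export callback received
// (a JSON string here, or undefined for empty chunks).
type ExportEntry = { key: string; data?: string };

// Serialize one chunk as a single NDJSON line, mirroring
// `JSON.stringify({ key, data }) + "\n"` from the export side.
function toNdjsonLine(entry: ExportEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parse a whole file back into entries, skipping blank lines
// such as the one after the trailing newline.
function parseNdjson(text: string): ExportEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExportEntry);
}
```

Entries come back in file order, so replaying them with `docIndex.import(entry.key, entry.data)` preserves the order in which export emitted them.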

bcspragu • Aug 11 '23 16:08

I'm running into the same bug. It is still a problem in version 0.7.34.

maxhoffmann • Jan 06 '24 18:01

I'm running into the same bug. My import code looked like this:

```js
const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter(item => !item.isDirectory())
  .map(item => item.name)

for (let i = 0, key; i < keys.length; i++) {
  key = keys[i];
  // console.log(key.slice(0, -5));
  const data = fs.readFileSync(`${searchIndexPath}${key}`, 'utf8')
  console.log(key.slice(0, -5), data);
  // slice(0, -5) drops the ".json" extension to recover the export key
  index.import(key.slice(0, -5), data);
}
```

But I found that it works when I run it like this:

```js
const keys = fs
  .readdirSync(searchIndexPath, { withFileTypes: true })
  .filter(item => !item.isDirectory())
  .map(item => item.name)

for (let i = 0, key; i < keys.length; i++) {
  key = keys[i];
  // console.log(key.slice(0, -5));
  const data = fs.readFileSync(`${searchIndexPath}${key}`, 'utf8')
  // Parse the file contents before passing them to import
  const parsedData = JSON.parse(data);
  console.log(key.slice(0, -5), parsedData);
  index.import(key.slice(0, -5), parsedData);
}
```

Adding `JSON.parse(data)` fixed it for me; give it a try.
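One caveat when combining this with the file-writing export from the original post: that code writes an empty string when `data` is `undefined`, and `JSON.parse("")` throws a `SyntaxError`. A defensive sketch, using a hypothetical `parseExportFile` helper, skips empty files and only parses non-empty contents:

```typescript
// Hypothetical helper: turn the raw contents of one exported file into the
// value to pass to import, or undefined when the file is empty.
function parseExportFile(contents: string): unknown {
  const trimmed = contents.trim();
  if (trimmed === "") {
    // Empty files come from chunks whose data was undefined at export time
    return undefined;
  }
  return JSON.parse(trimmed);
}
```

In the loop above, the import call would then be guarded, e.g. `const parsed = parseExportFile(data); if (parsed !== undefined) index.import(key.slice(0, -5), parsed);`.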

kgwosh • May 12 '24 04:05