ndx

Feature request: ability to serialize and load the index

Open: ajayambre opened this issue 7 years ago • 10 comments

Use case: I have a mobile app with multiple PouchDB databases and want to implement content search over that data.

Problem: I want to avoid querying every database for its documents and adding them to the document index each time the app is launched.

I would like an API to serialize my index so that I can persist it to the file system when the app is closed. The next time the app starts, I would just load the index instead of querying the databases and rebuilding it.
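A minimal TypeScript sketch of the requested workflow, assuming a synchronous key-value store as a stand-in for the file system. `KeyValueStorage`, `ToyIndex`, `loadOrBuild`, `persist` and the `"search-index"` key are hypothetical names for illustration only; `ToyIndex` is a simple term-to-document-ids map, not ndx's internal data structure, and none of this is ndx's API.

```typescript
// Hypothetical synchronous storage standing in for the app's file system.
interface KeyValueStorage {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Hypothetical toy inverted index: term -> set of document ids.
// This is NOT ndx's internal structure, just enough to show save/load.
class ToyIndex {
  private postings = new Map<string, Set<string>>();

  add(docId: string, text: string): void {
    const terms = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
    for (const term of terms) {
      let docs = this.postings.get(term);
      if (docs === undefined) {
        docs = new Set<string>();
        this.postings.set(term, docs);
      }
      docs.add(docId);
    }
  }

  search(term: string): string[] {
    const docs = this.postings.get(term.toLowerCase());
    return docs === undefined ? [] : Array.from(docs).sort();
  }

  // Turn the Map/Set structure into JSON-friendly arrays.
  serialize(): string {
    const entries: [string, string[]][] = [];
    this.postings.forEach((docs, term) => entries.push([term, Array.from(docs)]));
    return JSON.stringify(entries);
  }

  static deserialize(data: string): ToyIndex {
    const index = new ToyIndex();
    const entries = JSON.parse(data) as [string, string[]][];
    for (const [term, docs] of entries) index.postings.set(term, new Set(docs));
    return index;
  }
}

const INDEX_KEY = "search-index"; // hypothetical storage key

// On launch: reuse the persisted index, or rebuild it from the databases.
function loadOrBuild(
  storage: KeyValueStorage,
  fetchDocs: () => [string, string][],
): ToyIndex {
  const saved = storage.get(INDEX_KEY);
  if (saved !== undefined) return ToyIndex.deserialize(saved);
  const index = new ToyIndex();
  for (const [id, text] of fetchDocs()) index.add(id, text);
  return index;
}

// On close: write the serialized index to storage.
function persist(storage: KeyValueStorage, index: ToyIndex): void {
  storage.set(INDEX_KEY, index.serialize());
}
```

On close the app would call `persist`; on the next launch `loadOrBuild` returns the deserialized index and only queries the databases when nothing was saved.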

ajayambre, Jul 20 '18 07:07

Going to PR this feature soon.

vladimiry, Jan 07 '19 01:01

@localvoid I'm considering converting the InvertedIndexNode class to an interface, since it contains no logic and is used only as a data container. Is there any reason not to do that? Serialization does work with classes, but for now I'm not converting the plain objects back into class instances during deserialization, so it works, but the deserialized data is not an exact reproduction of the original objects.
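A sketch of what an interface-based version could look like. The `next`, `firstChild` and `firstPosting` fields come from this thread; `charCode` and the `DocumentPointer` fields are assumptions, and `createNode`, `InvertedIndexNodeClass` and `findChild` are hypothetical helpers. The point is that code written against the interface accepts the plain objects produced by deserialization, whereas a class-based check such as `instanceof` would reject them unless they were converted back into instances.

```typescript
// Assumed shape: id and termFrequency are illustrative, not ndx's actual fields.
interface DocumentPointer<I> {
  next: DocumentPointer<I> | null;
  id: I;
  termFrequency: number;
}

// next/firstChild/firstPosting are from the thread; charCode is an assumption.
interface InvertedIndexNode<I> {
  charCode: number;
  next: InvertedIndexNode<I> | null;
  firstChild: InvertedIndexNode<I> | null;
  firstPosting: DocumentPointer<I> | null;
}

// Hypothetical plain-object factory that would replace `new InvertedIndexNode(...)`.
function createNode<I>(charCode: number): InvertedIndexNode<I> {
  return { charCode, next: null, firstChild: null, firstPosting: null };
}

// Hypothetical class variant, kept only for comparison.
class InvertedIndexNodeClass<I> implements InvertedIndexNode<I> {
  next: InvertedIndexNode<I> | null = null;
  firstChild: InvertedIndexNode<I> | null = null;
  firstPosting: DocumentPointer<I> | null = null;
  constructor(public charCode: number) {}
}

// Traversal that only relies on the interface, so it works on class
// instances and on plain deserialized objects alike.
function findChild<I>(
  node: InvertedIndexNode<I>,
  charCode: number,
): InvertedIndexNode<I> | null {
  let child = node.firstChild;
  while (child !== null && child.charCode !== charCode) child = child.next;
  return child;
}
```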

vladimiry, Jan 08 '19 12:01

I'm going to use the MessagePack serialization format; the serialized data is a Uint8Array. My reasons:

  • On this data file, MessagePack is 4-5x faster than https://github.com/BridgeAR/safe-stable-stringify.
  • The standard JSON.stringify simply fails with a "Maximum call stack size exceeded" error (as expected).
  • Streaming stringification works (with https://github.com/Faleij/json-stream-stringify, for example), but it's slow.
  • The serialized MessagePack data is 20-30% smaller than the JSON equivalent.
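The "Maximum call stack size exceeded" failure is consistent with the index being a linked structure: a recursive serializer like JSON.stringify nests one level per `next` pointer, so a long `next` chain becomes a very deep object. One alternative, not what the PR used (it chose MessagePack), is to flatten the nodes into an array in which pointers become array indices, walking the structure with an explicit stack instead of recursion. The `TrieNode` shape and the `flatten`/`unflatten` helpers below are hypothetical and simplified.

```typescript
// Simplified, hypothetical node shape (no postings) for illustration.
interface TrieNode {
  charCode: number;
  next: TrieNode | null;
  firstChild: TrieNode | null;
}

// Flat record: [charCode, index of next, index of firstChild]; -1 means null.
type FlatNode = [number, number, number];

// Assign each node an index with an explicit stack, so chain length
// never turns into call-stack depth. The root always gets index 0.
function flatten(root: TrieNode): FlatNode[] {
  const order: TrieNode[] = [];
  const ids = new Map<TrieNode, number>();
  const stack: TrieNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as TrieNode;
    if (ids.has(node)) continue;
    ids.set(node, order.length);
    order.push(node);
    if (node.next !== null) stack.push(node.next);
    if (node.firstChild !== null) stack.push(node.firstChild);
  }
  const ref = (n: TrieNode | null): number => (n === null ? -1 : (ids.get(n) as number));
  return order.map((n): FlatNode => [n.charCode, ref(n.next), ref(n.firstChild)]);
}

// Rebuild the linked structure in two passes: create nodes, then wire pointers.
function unflatten(flat: FlatNode[]): TrieNode {
  const nodes = flat.map(([charCode]): TrieNode => ({ charCode, next: null, firstChild: null }));
  flat.forEach(([, next, firstChild], i) => {
    nodes[i].next = next === -1 ? null : nodes[next];
    nodes[i].firstChild = firstChild === -1 ? null : nodes[firstChild];
  });
  return nodes[0];
}
```

The flat array has a fixed nesting depth of two, so it can be passed to JSON.stringify (or any other encoder) regardless of how long the chains are.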

vladimiry, Jan 08 '19 12:01

@vladimiry The only reason I used classes is to be able to identify these objects in the memory profiler. I don't like how I've implemented everything in this package, and I think that it would be better with plain objects and simple functions, so that DCE could eliminate unused code.

localvoid, Jan 08 '19 12:01

OK, I'll go with an interface then.

"I think that it would be better with plain objects and simple functions, so that DCE could eliminate unused code."

One step at a time :)

vladimiry, Jan 08 '19 12:01

@localvoid Also, do you think it would make sense to replace null with undefined, i.e. make these fields optional? In InvertedIndexNode:

next?: InvertedIndexNode<I>;
firstChild?: InvertedIndexNode<I>;
firstPosting?: DocumentPointer<I>;

vladimiry, Jan 08 '19 13:01

@vladimiry Optimizing compilers like Google Closure Compiler remove properties with undefined values when an object is instantiated, which makes every call site that accesses these objects polymorphic. That is why I prefer to use null values.
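A small illustration of the shape argument. Engines such as V8 assign objects hidden classes based on which properties they have and in what order, so always initializing every field (to null when empty) keeps all nodes on one shape, while adding optional fields only when present produces several shapes for the same logical type. Here the `Object.keys` order is only an observable proxy for the shape; real hidden classes are engine internals. The interfaces, factories and `shapeKey` are hypothetical.

```typescript
// Every field always present: null marks "empty".
interface NodeWithNulls {
  charCode: number;
  next: NodeWithNulls | null;
  firstChild: NodeWithNulls | null;
}

// Fields only present when set.
interface NodeWithOptionals {
  charCode: number;
  next?: NodeWithOptionals;
  firstChild?: NodeWithOptionals;
}

// Same properties in the same order for every node: one shape.
function makeNullNode(charCode: number): NodeWithNulls {
  return { charCode, next: null, firstChild: null };
}

// Properties are added only when present, so the shape depends on
// how each node was built.
function makeOptionalNode(charCode: number, child?: NodeWithOptionals): NodeWithOptionals {
  const node: NodeWithOptionals = { charCode };
  if (child !== undefined) node.firstChild = child;
  return node;
}

// Observable proxy for "same shape": same own keys in the same order.
function shapeKey(o: object): string {
  return Object.keys(o).join(",");
}
```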

localvoid, Jan 08 '19 13:01

I just checked the latest GCC (Google Closure Compiler) and can't reproduce this behavior with simple examples, so maybe it's OK now. I ran into this problem 4 years ago.

localvoid, Jan 08 '19 13:01

The thing is, if I go with MessagePack, undefined values will be serialized as nulls by design (at least in the implementation I'm using, https://github.com/kawanet/msgpack-lite), which would introduce a kind of inconsistency. So I think I'd better stick with null for now.

vladimiry, Jan 08 '19 13:01

Here is the PR: https://github.com/localvoid/ndx/pull/5

vladimiry, Jan 08 '19 16:01