
IndexedCollection is not Serializable

Open GoogleCodeExporter opened this issue 10 years ago • 8 comments

What steps will reproduce the problem?
1. Try to serialize IndexedCollection

What is the expected output? What do you see instead?
Serializable IndexedCollection

What version of the product are you using? On what operating system?
1.0.3 on Mac OS X

Please provide any additional information below.
Support for serialization would be great. The user would then be able to set up 
some indexes, serialize the collection, and later retrieve it with the indexes 
already in place.

Original issue reported on code.google.com by [email protected] on 5 Feb 2013 at 2:47

GoogleCodeExporter avatar Aug 23 '15 19:08 GoogleCodeExporter

Thanks mitja for the request. I can see how this could be useful.

It's probably the case that most of the in-memory indexes could be rebuilt in 
memory faster than they could be deserialized from disk. So only the objects in 
the collection would need to be serialized/deserialized, and the indexes then 
re-added. It sounds like a case for adding a readObject() method.
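The readObject() idea can be sketched with plain JDK classes (this is not CQEngine API; the IndexedBag class and its fields are illustrative assumptions): serialize only the backing objects, mark the index transient, and rebuild it inside readObject() after the objects have been restored.

```java
import java.io.*;
import java.util.*;

public class IndexedBag implements Serializable {
    private final List<String> items = new ArrayList<>();
    // The index is transient: it is never written to the stream.
    private transient Map<String, Integer> index = new HashMap<>();

    public void add(String item) {
        items.add(item);
        index.put(item, items.size() - 1);
    }

    public Integer positionOf(String item) {
        return index.get(item);
    }

    // Restore the default-serialized fields, then rebuild the transient index.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        index = new HashMap<>();
        for (int i = 0; i < items.size(); i++) {
            index.put(items.get(i), i);
        }
    }

    // Serialize to a byte array and back, exercising readObject().
    public static IndexedBag roundTrip(IndexedBag bag) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(bag);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (IndexedBag) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        IndexedBag bag = new IndexedBag();
        bag.add("bar");
        bag.add("baz");
        IndexedBag restored = roundTrip(bag);
        System.out.println(restored.positionOf("baz")); // index was rebuilt in readObject()
    }
}
```

The same pattern applied inside IndexedCollection would let the collection deserialize its objects and rebuild its indexes transparently, without the caller having to re-add them.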

In the meantime, say with version 1.0.3, you could use the 
IndexedCollectionSerializer class below to serialize an indexed collection. 
The main catch is that when deserialized, you *need to re-add the indexes!* 
See the SerializerDemo class below for an example.

I'll think about better serialization support for the next release. Thanks!

--------------------------------------------------------------------------------
package com.googlecode.cqengine;
import com.googlecode.cqengine.index.radix.RadixTreeIndex;
import java.io.File;
public class SerializerDemo {

    public static void main(String[] args) {
        // *************** Build some collection... ***************
        IndexedCollection<Foo> myCollection = CQEngine.newInstance();
        addIndexesToMyCollection(myCollection);

        // Add some objects...
        myCollection.add(new Foo("bar"));
        myCollection.add(new Foo("baz"));

        // *************** Serialize the collection... ***************
        IndexedCollectionSerializer.serialize(myCollection, new File("foo.dat"));

        // *************** Deserialize the collection... ***************
        IndexedCollection<Foo> myDeserializedCollection = IndexedCollectionSerializer.deserialize(new File("foo.dat"));
        // Need to add indexes again to the deserialized collection!!...
        addIndexesToMyCollection(myDeserializedCollection);

        // ************ myDeserializedCollection should now have the same state as myCollection *******
    }

    static void addIndexesToMyCollection(IndexedCollection<Foo> indexedCollection) {
        indexedCollection.addIndex(RadixTreeIndex.onAttribute(Foo.NAME));
    }
}
--------------------------------------------------------------------------------
package com.googlecode.cqengine;
import com.googlecode.cqengine.attribute.Attribute;
import com.googlecode.cqengine.attribute.ReflectiveAttribute;
import java.io.Serializable;
public class Foo implements Serializable {
    public final String name;

    Foo(String name) {
        this.name = name;
    }

    public static final Attribute<Foo, String> NAME = ReflectiveAttribute.forField(Foo.class, String.class, "name");
}
--------------------------------------------------------------------------------
package com.googlecode.cqengine;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
public class IndexedCollectionSerializer {

    public static <O> void serialize(IndexedCollection<O> indexedCollection, File destination) {
        OutputStream os = null;
        try {
            os = new BufferedOutputStream(new FileOutputStream(destination));
            List<O> objectsList = new ArrayList<O>(indexedCollection);
            ObjectOutputStream oos = new ObjectOutputStream(os);
            oos.writeObject(objectsList);
            oos.flush();
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
        finally {
            if (os != null) {
                try { os.close(); } catch (Exception ignore) {}
            }
        }
    }

    public static <O> IndexedCollection<O> deserialize(File source) {
        ObjectInputStream ois = null;
        try {
            ois = new ObjectInputStream(new BufferedInputStream(new FileInputStream(source)));
            @SuppressWarnings({"unchecked", "UnnecessaryLocalVariable"})
            List<O> objectsList = (List<O>) ois.readObject();
            return CQEngine.copyFrom(objectsList);
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
        finally {
            if (ois != null) {
                try { ois.close(); } catch (Exception ignore) {}
            }
        }
    }
}
--------------------------------------------------------------------------------

Original comment by [email protected] on 6 Feb 2013 at 6:55

  • Changed state: Accepted
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

GoogleCodeExporter avatar Aug 23 '15 19:08 GoogleCodeExporter

Niall,

first of all, thank you for the very fast response. Unfortunately this is not my 
use case: I would use it to store the IndexedCollection in some Memcache engine 
(which requires Serializable), not in a file. Since, as you stated, serializing 
the indexes would take too much time, it would be helpful to avoid the 
"CQEngine.copyFrom" step, so that the IndexedCollection could be "stored" 
directly without copying and retrieved again without copying, even if the 
indexes must be re-added.

Otherwise I need to say: great project!!

Original comment by [email protected] on 6 Feb 2013 at 7:18

GoogleCodeExporter avatar Aug 23 '15 19:08 GoogleCodeExporter

Usually memcache is used to serialize a single object per key, whereas in this 
case you would store an entire collection against a single key. Will this be 
retrieved on application startup, or are you planning to do this for each 
request?

Memcache is basically "remote RAM", which is usually faster than local disk but 
slower than local RAM. It might be worth looking at a distributed cache which 
supports local RAM with distributed eviction, instead of going across the 
network every time. Also take a look at Kryo as an alternative to Java 
serialization. It is much faster and does not require classes to implement the 
Serializable interface. I've not tested it with IndexedCollection, but I've had 
good results with it in the past.

Nonetheless, even if Kryo works with IndexedCollection right now, there are 
still a few optimizations to CQEngine which could improve serialization. I will 
add support to serialize the indexed collection without copyFrom in the next 
release.

Original comment by [email protected] on 8 Feb 2013 at 12:54

GoogleCodeExporter avatar Aug 23 '15 19:08 GoogleCodeExporter

Not exactly what was requested, but IndexedCollection can now be persisted to 
off-heap memory or to a file on disk. It does not rely on Java serialization.

Original comment by [email protected] on 20 Apr 2015 at 8:25

  • Changed state: Fixed

GoogleCodeExporter avatar Aug 23 '15 19:08 GoogleCodeExporter

Reopening this issue, as the current situation is not a complete fix, and could probably be improved.

npgall avatar Jan 04 '16 22:01 npgall

I don't understand the status of the issue but I would like to contribute with my use case.

I just need to snapshot a collection along with its already constructed indexes to disk in order to simply survive an application restart.

My collection is fed and indexed with 200,000 records from SQL that mostly never change. Not a huge figure, but the SQL source could be slow in some scenarios. So I simply would like to:

  • Load all SQL records for the first time
  • Snapshot them to an opaque binary file
  • Restart/kill the application
  • Have the application check if the file is up to date with the database (SELECT MAX(last_update)...)
  • Load the collection and the indexes from disk to memory

I don't understand whether the disk store satisfies my scenario. Peeking around the code, it looks like the disk store internally uses SQLite backed by a file, which I don't think loads the data into memory. What I want for super-fast queries is to snapshot the indices along with the data, if possible. Once the indices are built, there is no need to recompute them when the underlying data has not changed.
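The snapshot flow described above can be sketched with plain Java serialization (this is not CQEngine's disk persistence; the Snapshot class, the Record stand-in, and the timestamp handling are illustrative assumptions): persist the records together with the newest last_update value, and on startup reuse the snapshot only while the database reports no newer timestamp.

```java
import java.io.*;
import java.util.*;

public class Snapshot implements Serializable {
    public final long lastUpdate;            // MAX(last_update) at snapshot time
    public final ArrayList<String> records;  // stand-in for the SQL rows

    public Snapshot(long lastUpdate, ArrayList<String> records) {
        this.lastUpdate = lastUpdate;
        this.records = records;
    }

    // Write the records and their timestamp to an opaque binary file.
    public static void save(Snapshot snapshot, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(snapshot);
        }
    }

    // Read the snapshot back after a restart.
    public static Snapshot load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Snapshot) in.readObject();
        }
    }

    // Reuse the snapshot only if it is at least as new as the database.
    public static boolean isUpToDate(Snapshot snapshot, long lastUpdateFromDatabase) {
        return snapshot.lastUpdate >= lastUpdateFromDatabase;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("snapshot", ".dat");
        save(new Snapshot(42L, new ArrayList<>(Arrays.asList("a", "b"))), file);
        Snapshot restored = load(file);
        System.out.println(restored.records + " up-to-date: " + isUpToDate(restored, 42L));
    }
}
```

Note that, per the earlier comments in this thread, only the objects survive such a snapshot; in-memory indexes would still need to be re-added to a collection rebuilt from the restored records.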

Many thanks

djechelon avatar Apr 12 '19 16:04 djechelon

FWIW - I use the following and it survives restarts without having to do any additional management

new ConcurrentIndexedCollection<Job>(DiskPersistence.onPrimaryKeyInFile(Job.ID, new File(dbdir.toString(), "jobs.dat")));

jayaramcs avatar Apr 12 '19 16:04 jayaramcs

If you only use disk persistence, and you only add disk indexes, then the collection and indexes will persist between restarts.

You will need to programmatically "add" the disk indexes to the collection again at startup, but they will detect that they were persisted previously so they won't be rebuilt.

However if you add non-disk indexes to the collection, then the state of those indexes will be completely lost after a restart. So when you add those indexes to the collection again after a restart, they will be completely rebuilt.

Hope that helps, Niall


npgall avatar Apr 12 '19 18:04 npgall