Invalidated memory objects lead to crash
I obtain an object for a database entry:

```c
if ((obj = sp_object(db)) != 0 && sp_set(obj, "key", key, strlen(key)) == 0)
    result = sp_get(db, obj, &len);
```
I then store these result pointers in an array, so I can do a direct index -> object lookup table. This works fine for a while, but then it seems that the objects are invalidated (due to writing out to disk?).

So it seems that I cannot store a permanent pointer, and catching the invalidation event and reloading all the pointers seems inefficient. I am hoping there is some setting to preserve pointers that have not been destroyed (with a reference counter?), so maybe this is a bug, or it is by design.

Alternatively, if I could just have a way to find a new record's permanent ID and be able to directly load that ID, that would solve my use case here.
Thanks!
James
P.S. I tried adding some fields other than "key" and "value", but it did not seem to be reliable. Are there any limitations on what fields can be added?
Yes, you should copy the returned key-value and destroy the object.
```c
if ((obj = sp_object(db)) != 0 && sp_set(obj, "key", key, strlen(key)) == 0)
    result = sp_get(db, obj); // only two arguments, no size

if (result) {
    int keysize = 0;
    char *key = sp_get(result, "key", &keysize);
    int valuesize = 0;
    // get the pointer to the value inside the result object, no copy
    char *value = sp_get(result, "value", &valuesize);
    // copy key-value here, before sp_destroy() invalidates the pointers
    sp_destroy(result);
}
```
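To make the "copy key-value" step concrete, here is a minimal sketch (my own addition, not from the answer above) that duplicates the value into caller-owned memory before `sp_destroy()` invalidates the internal pointer; it assumes `result` was obtained as shown and uses plain `malloc`/`memcpy`:

```c
// Minimal sketch: copy the value out of the result object so the copy
// survives sp_destroy(). Requires <stdlib.h> and <string.h>.
char *copy = NULL;
if (result) {
    int valuesize = 0;
    char *value = sp_get(result, "value", &valuesize); // points inside result
    copy = malloc(valuesize);                          // caller-owned buffer
    if (copy)
        memcpy(copy, value, valuesize);
    sp_destroy(result); // 'value' is now invalid; 'copy' remains valid
}
```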
Object fields depend on the storage format and the destination operation.

In v1.2.2 there is only one format, which involves the key and value fields. On the master branch (v1.2.3) multi-part keys are supported, so a key can additionally be another key name. This is not documented yet. Other fields are metadata, etc. This is likely to be extended in the future.
So is it safe to use the "value" pointer for permanent (process-duration) use?
I think sophia is close to supporting full JSON capability (like cJSON), and it would be very handy to be able to use it as a JSON lib that just happens to be backed by a database, but this is not critical.

I am having to convert binary keys to hex, losing 50% of the space, as I can't find out how to do binary keys.

I am making dual entries (key -> value) and (value -> key), but it seems inefficient, and while "value" supports binary, "key" appears to need to be a string. Then again, maybe you are fully optimizing such cases internally?
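To illustrate the dual-entry scheme described above, here is a hedged sketch (my own illustration, using the sp_object/sp_set write pattern visible elsewhere in this thread; `put_pair` and `put_both_ways` are hypothetical helper names, not sophia calls):

```c
// Hypothetical helper: write one key -> value pair into 'db'
// using the sp_object/sp_set pattern from the examples above.
static int put_pair(void *db, const void *k, int ksize,
                    const void *v, int vsize)
{
    void *obj = sp_object(db);
    if (obj == NULL)
        return -1;
    if (sp_set(obj, "key", k, ksize) != 0 ||
        sp_set(obj, "value", v, vsize) != 0) {
        sp_destroy(obj);
        return -1;
    }
    return sp_set(db, obj); // commits the object to the database
}

// Dual entries: key -> value and value -> key in the same database.
static int put_both_ways(void *db, const void *key, int keysize,
                         const void *value, int valuesize)
{
    if (put_pair(db, key, keysize, value, valuesize) != 0)
        return -1;
    return put_pair(db, value, valuesize, key, keysize);
}
```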
Again, I must say that I am very impressed! Most things just work and there are no speed issues.
James
Dereferencing the "key" field is running stably. I also noticed that keys can be binary, so things are working well. Just the cost of one pointer to get the reverse lookup, well worth the space/time tradeoff.

I am able to use it as an in-memory key/value store with the reverse lookup. I am not sure I will have time to get replication working, as I am under deadline pressure in the short term. If you do add features that make this a bit easier to use, I would be grateful. I think the network transport can be abstracted, and with a conflict-resolution function pointer that would allow for customized replication.
James
Thanks :) A string can be anything; there are no constraints on it.
Do you need to index values as well? Can you describe your use case?
I need to be able to quickly find a key given a value, and also a value given a key. What I am doing is storing pointers according to the order they were added to the database. Basically, the keys are rather large (~128 bytes), but the total number of keys will easily fit into 32 bits, so I map the long string to 32 bits and operate on the 32-bit index for most things. Only if I need to display the full value will I need to get the actual key.

So if there were a way to get a specific "index", that would be much more efficient than these external tables. Just being able to retrieve the Nth object in a database (in order of insertion) would be enough. If I could also change the specific index, probably easier to just support a user-defined index in addition to a permanent creation-order index.
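For context, here is a minimal sketch of the kind of external index -> key table described above (entirely my own illustration; `keytab` and its fields are hypothetical names). It copies each 128-byte key in insertion order, so index i recovers the full key without any sp_ call:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical external lookup table: keys copied in insertion order,
// so a 32-bit index recovers the full 128-byte key with no sp_ call.
struct keytab {
    char   (*keys)[128]; // copied keys, in insertion order
    uint32_t count;
    uint32_t cap;
};

static int32_t keytab_add(struct keytab *t, const char *key)
{
    if (t->count == t->cap) {
        uint32_t ncap = t->cap ? t->cap * 2 : 1024;
        void *p = realloc(t->keys, (size_t)ncap * sizeof(*t->keys));
        if (p == NULL)
            return -1;
        t->keys = p;
        t->cap  = ncap;
    }
    memcpy(t->keys[t->count], key, 128);
    return (int32_t)t->count++; // 32-bit index assigned to this key
}
```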
Sounds like you need support for secondary indexes, but with a value copy to reduce read times. You can try set(key, value) and set(value, key), or use a second database which will build its own index.

If you can keep the database fully compacted and your database is bigger than RAM, there will not be much difference between the schemes, because read complexity is all about disk seeks (which will be O(1) in that, the perfect, case).
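A short sketch of the second-database variant (again my own illustration; it reuses the hypothetical `put_pair` helper from above and assumes `db` and `rdb` are two already-open sophia databases, setup omitted):

```c
// Forward entries live in 'db', reverse entries in a second database 'rdb',
// which then acts as the secondary index suggested above.
static int put_indexed(void *db, void *rdb,
                       const void *key, int keysize,
                       const void *value, int valuesize)
{
    if (put_pair(db, key, keysize, value, valuesize) != 0) // key -> value
        return -1;
    return put_pair(rdb, value, valuesize, key, keysize);  // value -> key
}
```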
Is there a way to find the primary index of a database entry? The number of entries will fit into memory, so avoiding sp_ API calls is the primary performance concern in this case.
With regards to https://github.com/pmwkaa/sophia/issues/83#issuecomment-97389567, do I understand it correctly that Sophia reads the whole persisted value (i.e. copies it to RAM) regardless of its size, then returns a pointer to it, but due to internal scheduling requires the user to make another copy of the whole (potentially big) data to another place in RAM to keep using it?

This wouldn't make me very happy: a double copy is never a good thing, and, second, reading bigger data (e.g. 1 GByte of contiguous data in one value) completely into RAM is another "bad practice" :wink:.