Athena icon indicating copy to clipboard operation
Athena copied to clipboard

Support deleting and modifying values

Open sanity opened this issue 15 years ago • 4 comments

Currently tag->value pairs can only be added to the Store, they cannot be deleted, or modified.

Deleting should be easy enough, we just set this position in the Store.values ArrayList to null, and ensure that the StoreIterator knows to skip null values. The challenge is that over time the ArrayList will fill up with gaps - so we may need some kind of periodic "compaction" process to get rid of them.

The difficulty with modifying values is that, let's say we modify the tags on a value somewhere in the middle of the store. Now, we may have several shortcuts skipping over this value, even though after modification the value's tags might match one of those queries. The result will be that Store.find() fails to return this value when it should.

One option is, rather than modifying a value's tags, we just delete it, and then add it again using the new tags. It will be added to the end of Store.values.

Of course this assumes that we have a way to delete values.

Thoughts?

sanity avatar Jun 23 '10 17:06 sanity

"Deleting should be easy enough, we just set this position in the Store.values ArrayList to null, and ensure that the StoreIterator knows to skip null values"

This could get tricky, especially when we've got a large data set. Imagine having all of these null values, which aren't needed. Maybe a one-pass (lazy) compactor should be built? However we can't just run the compactor on every insert, as inserts would become log(n)

Also, how would the query "predictor" fall into this?

KushalP avatar Jun 23 '10 18:06 KushalP

I think perhaps the compactor would run periodically, much like Java's garbage collector.

Not sure what you mean by "query predictor", can you elaborate?

sanity avatar Jun 23 '10 18:06 sanity

It's what I read into from the README, predictor probably isn't the right word, more "optimiser":

"Basically as Athena searches, it keeps track of objects it could have skipped, and creates shortcuts to avoid checking those objects in the future."

KushalP avatar Jun 23 '10 18:06 KushalP

I haven't had the time to study Athena in great detail, but if I understand the inner workings correctly it creates a "skipping list" for every Query (and if applicable sub-Queries). In order to traverse the matching Values for a given Query the skipping list is visited (through findShortCut), returning a QueryIntPair which, in turn, point to a (fixed) position in the Store's values.

Now I noticed the comment at Store.java:173

// TODO: Would be nice to find a way to get rid of this

Which would remove the position information for a given Value, possibly favoring the Store for having the responsibility to contain that information. This made me wonder why it isn't the other way around. Why aren't the QueryIntPairs actually QueryValuePairs, pointing to the next Value, instead of pointing to a position in an ArrayList? As a consequence the choice for O(1) random access can be dropped and thus instead of an ArrayList a linked list (of sorts) can be used. This would also imply that a Value's position doesn't denote its actual position in the list but is merely a way of determining its relative position to another Value in the Store's list.

By dropping the ArrayList, issuing a delete wouldn't require compaction for memory purposes or improving performance later on. Instead, a compaction of assigned positions might be desirable when, upon insertion, the position's maximum value is reached. Fortunately, this might take a while.

Ofcourse, the downside of this is that actually (initially) deleting a Value became trickier than just dereferencing it from the ArrayList. However, I imagine it is possible to maintain a tree like structure between the skipping lists of Queries and their corresponding sub Queries (like having a tree in a skip list) if direct deletion is preferred. Another downside of this proposal would be the increased memory overhead due to this pointer/reference soup.

Also, I see my comment turned out to be a little longer as I imagined it would be, thanks for reading so far ;) and my apologies if my suggestion made little sense as result of a misunderstanding of the internal working of Athena. Hopefully it'll prove to be useful somehow.

taitale avatar Jul 01 '10 11:07 taitale