concurrent-trees icon indicating copy to clipboard operation
concurrent-trees copied to clipboard

Non-linearizable behavior of ConcurrentSuffixTree

Open alefedor opened this issue 4 years ago • 2 comments

There is a bug that an element, that is added in a concurrent thread, can be found by getValueForExactKey and not found by the following getValuesForKeysEndingWith.

The bug can be reproduced with Lincheck. The scenario that can produce incorrect results:

Parallel part:
| put(aa, 4): null | getValueForExactKey(aa):       4  |
|                  | getValuesForKeysEndingWith(a): [] |

alefedor avatar Aug 13 '20 09:08 alefedor

Could you provide more information to confirm that that is a bug?...

The concurrency model of the suffix tree is that writes are guaranteed to be visible to reading threads, after the put() method has returned. However, if a reading thread accesses the tree before the put() method has returned, then the behavior is undefined as to whether or not it will see the new data.

The suffix tree actually maintains a couple of data structures under the hood: a valueMap, and a radix tree containing suffixes for each of the entries. Also, a single entry which is put() into the tree, actually can result in multiple suffixes being added for that entry.

Consider the following timeline for a writing thread which is calling the put() method:

  • t0 put() method is called
  • t1 valueMap is updated
  • t2 start adding suffixes to the radix tree
  • t3 finish adding suffixes to the radix tree
  • t4 put() method returns

In terms of the API contract, this would only be a bug if it occurs at t < 4. However the current implementation of that contract is as follows. (The implementation is subject to change though, so people should not rely on any of the following timing details.)

Internally, the getValueForExactKey() and getValuesForKeysEndingWith() methods have different code paths.

getValueForExactKey() has a faster code path. It does not actually read from the radix tree, because it can and does just read directly from the valueMap. This means getValueForExactKey() will return the new data when t > 1.

getValuesForKeysEndingWith() OTOH, reads from the radix tree, and then from the valueMap. This method will definitely not return the data if t < 2. If 2 < t < 3, then it might or might not return the data, depending on whether the specific suffixes which have been added so far, match the given argument (i.e. this is data-dependent).

So in a nutshell, the behavior you observed would be expected always when 1 < t < 2, and it is permitted to occur in terms of API when t < 4. But it should never occur when t >= 4. Let me know if that is the case.

npgall avatar Aug 17 '20 21:08 npgall

@npgall I can also provide the single-threaded execution that leads to these results.

| put(aa, 4)                                                                               |                                   |
|   acquireWriteLock() at ConcurrentSuffixTree.put(ConcurrentSuffixTree.java:87)           |                                   |
|   put(aa,4): null at ConcurrentSuffixTree.put(ConcurrentSuffixTree.java:94)              |                                   |
|   getValueForExactKey(a): null at addSuffixesToRadixTree(ConcurrentSuffixTree.java:163)  |                                   |
|   put(a,[aa]): null at addSuffixesToRadixTree(ConcurrentSuffixTree.java:166)             |                                   |
|   add(aa): true at addSuffixesToRadixTree(ConcurrentSuffixTree.java:168)                 |                                   |
|   getValueForExactKey(a): null at addSuffixesToRadixTree(ConcurrentSuffixTree.java:163)  |                                   |
|   acquireWriteLock() at putInternal(ConcurrentRadixTree.java:453)                        |                                   |
|   searchTree(a): SearchResult at putInternal(ConcurrentRadixTree.java:456)               |                                   |
|   switch                                                                                 |                                   |
|                                                                                          | getValueForExactKey(aa): 4        |
|                                                                                          | getValuesForKeysEndingWith(a): [] |
|                                                                                          |   thread is finished              |
|   READ: null at ConcurrentRadixTree.putInternal(ConcurrentRadixTree.java:459)            |                                   |
|   createNode(a,[aa],[Node@1],false): Node@2 at putInternal(ConcurrentRadixTree.java:487) |                                   |
|   updateOutgoingEdge(Node@2) at putInternal(ConcurrentRadixTree.java:490)                |                                   |
|   releaseWriteLock() at putInternal(ConcurrentRadixTree.java:560)                        |                                   |
|   add(aa): true at addSuffixesToRadixTree(ConcurrentSuffixTree.java:168)                 |                                   |
|   releaseWriteLock() at ConcurrentSuffixTree.put(ConcurrentSuffixTree.java:103)          |                                   |
|   result: null                                                                           |                                   |
|   thread is finished                                                                     |                                   |

The thread is switched between the two additions of different suffixes in addSuffixesToRadixTree, and I am not sure whether it is the case 1 < t < 2.

However, even if it is, this behavior contradicts the description of the data structure. The description says that all puts and reads are made atomically, but the scenario results show that they are not atomic.

alefedor avatar Aug 18 '20 04:08 alefedor