psibase
psibase copied to clipboard
WASM DB API
psibase exposes a Key/Value API to WASM applications that needs to be minimal, yet flexible enough to maximize performance.
For example, point lookups can be much faster than lower bound queries, but if all point lookups are performed by searching for a lower bound and checking to see if the key is correct then performance will suffer.
When building WASMs most apps follow a read, modify, store pattern with known keys. It is the exceptional case that a search for the lower bound is desired or that a new key is being entered for the first time. If the database knows the intent it can better optimize its query strategy.
The Arbtrie database structure stores groups of leaf key/value pairs containing up to 400 keys in a single page. For data integrity purposes a 1 byte hash of each key is stored as a contiguous array parallel to the sorted index. If a search for a specific key is requested it calculates the hash and does a linear scan looking for the hash, if found it checks the key, and then continues.
Even with 400 keys, there is only an average of 1.5 collisions resulting in a compare. With lower density situations there may not be any collisions. This access pattern is far more friendly to the CPU cache and branch predictor than a binary search and produces meaningful performance gains when searching for an exact match of a key.
Proposed API for Discussion
The following API is notional, aimed at conveying the operations and information, and would have to be adapted to WASM friendly methods.
get( string key, char* buffer, int buffer_len, int offset ) -> int; // return bytes read or -1 if not found
put( key, char* buffer, int buffer_len, int offset ) -> int; // return the new size of the record
lowerbound( string key, char* key_out, int key_out_space ) -> size of the key or -1 if end
upperbound( string key, char* key_out, int key_out_space ) -> size of the key or -1 if end
last( string prefix, char* key_out, int key_out_space )-> size of key out or -1 if no keys with prefix
remove( string from, string to ) -> does return anything, assume success
count( from, to ) -> database has efficient range counting built in
next( string key, char* key_out, key_out_space ) -> length of key out, size of object; //does point lookup + iterator to next
prev( string key, char* key_out, key_out_space )
// assuming there is nothing at new_prefix, moves/renames all keys,
move( string cur_prefix, string new_prefix, bool recurse ) - rename can be done in O(1)
There are some efficiency gains to be had if the following were separate:
- upsert - creates key if it doesn't exist, otherwise updates. (lower bound)
- update - requires key exist (point lookup)
- insert - knows the key doesn't exist so skips the point lookup
Requiring upsert to return the old size provides little useful info and forces a full query when, in theory, writes could be buffered until a read forces them through. Inserting writes sequentially is faster if you can buffer the random writes. Buffering writes becomes impossible if the API imposes a certain return value to upsert.