
concurrent write use case

Open fwang2 opened this issue 9 years ago • 3 comments

hi -

I have a use case where multiple processes need to concurrently insert (k,v) pairs, where the key is a string (say, a full path string); later there will be prefix-based searches returning all entries with the same prefix. It needs to scale to billion(s) of entries. My questions are twofold: (1) does sophia support concurrent writes? Is there any example I can refer to? (2) If so, is it better to insert in sorted order (for better query performance) or append-only (for better write performance)?

Thanks

fwang2 avatar Oct 09 '15 13:10 fwang2

Hi, yes sophia supports search by prefix and is capable of storing billions of records.

  1. Currently sophia does not support access from multiple processes, but it does for threads (see the sketch below).
  2. Insert order should not affect query performance; both sequential and random insert order would be fine.
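
For reference, a minimal single-process sketch of both points, assuming the object-style C API (sp_env / sp_document / sp_cursor): several threads insert through one shared environment, then a prefix search is done with a cursor ordered `>=` that stops once keys no longer share the prefix. The storage path `./storage`, database name `paths`, thread count, and key layout are illustrative assumptions, not anything prescribed by sophia.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sophia.h>

static void *env, *db;

/* each thread inserts its own keys through the shared environment */
static void *writer(void *arg)
{
    long id = (long)arg;
    for (int i = 0; i < 1000; i++) {
        char key[64];
        int keylen = snprintf(key, sizeof(key), "/data/t%ld/file%d", id, i);
        void *o = sp_document(db);
        sp_setstring(o, "key", key, keylen + 1);
        sp_setstring(o, "value", "meta", 5);
        sp_set(db, o); /* the document is consumed by sp_set */
    }
    return NULL;
}

int main(void)
{
    env = sp_env();
    sp_setstring(env, "sophia.path", "./storage", 0);
    sp_setstring(env, "db", "paths", 0);
    sp_open(env);
    db = sp_getobject(env, "db.paths");

    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, writer, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    /* prefix scan: start at the prefix, stop when keys diverge */
    const char *prefix = "/data/t1/";
    size_t plen = strlen(prefix);
    void *o = sp_document(db);
    sp_setstring(o, "order", ">=", 0);
    sp_setstring(o, "key", (void *)prefix, plen + 1);
    void *cursor = sp_cursor(env);
    while ((o = sp_get(cursor, o))) {
        int keysize;
        char *key = sp_getstring(o, "key", &keysize);
        if (strncmp(key, prefix, plen) != 0)
            break;
        printf("%s\n", key);
    }
    sp_destroy(cursor);
    sp_destroy(env);
    return 0;
}
```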

pmwkaa avatar Oct 09 '15 17:10 pmwkaa

hmm ... the use case has MPI in mind (cluster setting), so multiple processes are a given. In that case, it seems the only way is to have each process operate on its own database. However, that leads to the question of post-processing: is there any way to merge databases efficiently? If not, can a query easily be done across multiple databases?

fwang2 avatar Oct 10 '15 01:10 fwang2

This depends on your case.

Sophia can support multiple databases within a single environment. Each database will have its own directory, and several databases can be involved in the same transaction. But this works only within a single process.
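
As an illustration, a small sketch of two databases attached to one environment and written inside a single multi-statement transaction (sp_begin / sp_set / sp_commit); the path and database names (`./storage`, `meta`, `blobs`) are made up for the example.

```c
#include <string.h>
#include <sophia.h>

int main(void)
{
    void *env = sp_env();
    sp_setstring(env, "sophia.path", "./storage", 0);
    /* declare two databases; each gets its own directory under sophia.path */
    sp_setstring(env, "db", "meta", 0);
    sp_setstring(env, "db", "blobs", 0);
    sp_open(env);

    void *meta  = sp_getobject(env, "db.meta");
    void *blobs = sp_getobject(env, "db.blobs");

    /* one multi-statement transaction spanning both databases */
    void *tx = sp_begin(env);

    void *o = sp_document(meta);
    sp_setstring(o, "key", "path/a", 7);
    sp_setstring(o, "value", "attrs", 6);
    sp_set(tx, o);

    o = sp_document(blobs);
    sp_setstring(o, "key", "path/a", 7);
    sp_setstring(o, "value", "payload", 8);
    sp_set(tx, o);

    sp_commit(tx);
    sp_destroy(env);
    return 0;
}
```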

Here are some quick ideas:

a. Create a sophia environment for each process. After the work is complete, open each per-process database and create a new one, then merge them: open a cursor on each source database, find the minimum key by comparing the current key of each cursor, and set(min) into the new database (a rough sketch follows after these ideas). This scheme avoids concurrency, but it may be too complex.

b. Create a simple network server built on top of sophia which can process storage requests (network or other messages) from the MPI processes (see the sketch further below).
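
For idea (a), a rough sketch of the post-processing merge, assuming each worker wrote into its own directory (`./storage_0` .. `./storage_3`, names invented here), keys are null-terminated strings as in the path use case, and the merged result goes into a fresh `./merged` environment:

```c
#include <stdio.h>
#include <string.h>
#include <sophia.h>

#define NSRC 4  /* number of per-process databases to merge (assumed) */

int main(void)
{
    /* destination environment/database */
    void *dst_env = sp_env();
    sp_setstring(dst_env, "sophia.path", "./merged", 0);
    sp_setstring(dst_env, "db", "paths", 0);
    sp_open(dst_env);
    void *dst_db = sp_getobject(dst_env, "db.paths");

    void *src_env[NSRC], *cur[NSRC], *doc[NSRC];

    /* open each per-process database and position a cursor at the start */
    for (int i = 0; i < NSRC; i++) {
        char path[64];
        snprintf(path, sizeof(path), "./storage_%d", i);
        src_env[i] = sp_env();
        sp_setstring(src_env[i], "sophia.path", path, 0);
        sp_setstring(src_env[i], "db", "paths", 0);
        sp_open(src_env[i]);
        void *db = sp_getobject(src_env[i], "db.paths");
        void *o = sp_document(db);
        sp_setstring(o, "order", ">=", 0);
        cur[i] = sp_cursor(src_env[i]);
        doc[i] = sp_get(cur[i], o);   /* NULL if that database is empty */
    }

    /* k-way merge: repeatedly copy the smallest key into the destination */
    for (;;) {
        int min = -1;
        for (int i = 0; i < NSRC; i++) {
            if (doc[i] == NULL)
                continue;
            if (min == -1) { min = i; continue; }
            int ka, kb;
            char *a = sp_getstring(doc[i],   "key", &ka);
            char *b = sp_getstring(doc[min], "key", &kb);
            if (strcmp(a, b) < 0)
                min = i;
        }
        if (min == -1)
            break; /* all sources exhausted */

        int ks, vs;
        char *k = sp_getstring(doc[min], "key", &ks);
        char *v = sp_getstring(doc[min], "value", &vs);
        void *o = sp_document(dst_db);
        sp_setstring(o, "key", k, ks);
        sp_setstring(o, "value", v, vs);
        sp_set(dst_db, o);

        doc[min] = sp_get(cur[min], doc[min]); /* advance that cursor */
    }

    for (int i = 0; i < NSRC; i++) {
        sp_destroy(cur[i]);
        sp_destroy(src_env[i]);
    }
    sp_destroy(dst_env);
    return 0;
}
```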
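
For idea (b), the sophia side of such a server can stay very small; the `request` struct and how requests arrive (TCP, or MPI messages forwarded to a dedicated writer rank) are hypothetical and left entirely to the transport layer.

```c
#include <sophia.h>

/* hypothetical wire format, already decoded by the transport layer */
struct request {
    char *key;
    int   keysize;
    char *value;
    int   valuesize;
};

/* called by the server loop for every incoming PUT request;
 * env/db are opened once at startup, exactly as in a local program */
static int handle_put(void *db, struct request *r)
{
    void *o = sp_document(db);
    if (o == NULL)
        return -1;
    sp_setstring(o, "key", r->key, r->keysize);
    sp_setstring(o, "value", r->value, r->valuesize);
    return sp_set(db, o); /* 0 on success, -1 on error */
}
```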

pmwkaa avatar Oct 14 '15 16:10 pmwkaa