stateline Write chain outputs more efficiently

Use run-length encoding to reduce space. Maybe switch to a binary format, or even use a compression library such as snappy (https://github.com/google/snappy).
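For illustration, here is a minimal sketch of run-length encoding applied to chain samples. It assumes a chain is a sequence of parameter tuples in which consecutive entries often repeat (for example, when a proposal is rejected). The function names `rle_encode` and `rle_decode` are hypothetical and not part of stateline.

```python
from itertools import groupby


def rle_encode(samples):
    """Collapse runs of consecutive identical samples into (count, sample) pairs."""
    return [(sum(1 for _ in group), key) for key, group in groupby(samples)]


def rle_decode(pairs):
    """Expand (count, sample) pairs back into the full chain."""
    return [sample for count, sample in pairs for _ in range(count)]


# Hypothetical chain: each sample is a tuple of parameter values.
chain = [(0.1, 2.0), (0.1, 2.0), (0.1, 2.0), (0.3, 1.5), (0.3, 1.5), (0.2, 1.7)]
encoded = rle_encode(chain)
print(encoded)
```

The saving depends on how often samples repeat; a chain with few repeats gains little and pays for the extra count column.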

May 21 '15 05:05 darrnshn

Including: binary (HDF5?), run-length encoding.

Mar 06 '16 22:03 lmccalman

multiple files for long chains?

Mar 06 '16 22:03 lmccalman

I think multiple files is a good idea. I'm still not sure about binary vs text, since the more complicated the file format, the harder it is to read in other languages (e.g. if I'm using an R binding I would expect to easily read the file in R). If the format is too complicated then each language binding would have to provide its own chain file reading functions.
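As a rough sketch of the multiple-files idea with a text format that other languages can read directly, the following splits a chain into numbered CSV files. The file naming scheme, the `rows_per_file` parameter, and both function names are assumptions made for this example, not an existing stateline layout.

```python
import csv
import os


def write_chain_chunks(samples, out_dir, rows_per_file=1000, prefix="chain"):
    """Write samples to numbered CSV files, each with at most rows_per_file rows."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(samples), rows_per_file):
        index = start // rows_per_file
        path = os.path.join(out_dir, f"{prefix}_{index:05d}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(samples[start:start + rows_per_file])
        paths.append(path)
    return paths


def read_chain_chunks(paths):
    """Read the chunk files back in name order and return rows of floats."""
    rows = []
    for path in sorted(paths):
        with open(path, newline="") as f:
            rows.extend([float(x) for x in row] for row in csv.reader(f))
    return rows
```

Because each chunk is plain CSV, an R binding could load one with `read.csv` without any stateline-specific reader, at the cost of the slower text format.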

Mar 09 '16 21:03 darrnshn

Embedded vs server: http://stackoverflow.com/questions/3108437/when-to-use-an-embedded-database

It's basically a toss-up between a server with a binary protocol and an embedded binary DB. The only real difference is that the server runs in a separate process, while the embedded DB runs in a separate thread.

Embedded DBs:

Raw ostream: Hard to implement atomicity etc. by hand.
CSV: text protocol is too slow
LevelDB: We used this before, but it's not a standard format so Python can't read it.
HDF5: A bit overkill?
https://en.wikipedia.org/wiki/Embedded_database#Comparisons_of_database_storage_engines

Server DB:

InfluxDB: I couldn't find the binary protocol. https://github.com/influxdata/influxdb/issues/139 ("the text protocol with gzip already saturates the storage engine")
Graphite: text protocol
Memcached: in memory and key value store
Redis: in memory and key value store
Interestingly...Postgres: https://news.ycombinator.com/item?id=8368509

Requirements:

Fast: a binary protocol would be faster than text
Atomic: so we can Ctrl-C and not break it. If what we're doing is not atomic, we could always trap Ctrl-C signals (as we do now) and only stop when we're not in the middle of writing to a file.
Standard format / Easy to parse format: so something like Python can read it without a C++ executable to extract the data.
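One way the atomicity and easy-parsing requirements could be combined, sketched under stated assumptions: write fixed-size binary records to a temporary file and rename it into place, so an interrupted write never leaves a half-written output file. The record layout (a little-endian uint32 repeat count followed by a float64 value) and the function names are hypothetical, chosen only for this example. The rename is atomic only when the temporary file is on the same filesystem as the target.

```python
import os
import struct
import tempfile

# Hypothetical record layout: uint32 repeat count, float64 value, little-endian.
RECORD = struct.Struct("<Id")


def atomic_write_records(path, records):
    """Write records to a temp file in the target directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            for count, value in records:
                f.write(RECORD.pack(count, value))
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Also covers KeyboardInterrupt (Ctrl-C): remove the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_records(path):
    """Read fixed-size records back as (count, value) tuples."""
    with open(path, "rb") as f:
        return [RECORD.unpack(chunk) for chunk in iter(lambda: f.read(RECORD.size), b"")]
```

A fixed-size layout like this needs no C++ executable to decode; any language with a struct-unpacking facility can read it. The trade-off is that rewriting a whole file per flush gets expensive for long chains, which is one argument for splitting output across multiple files.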

Aug 04 '16 06:08 darrnshn
