piker Choosing a time series database

Did some decent starter research on which systems will work for production-grade data feed collaboration.

I'm going to list some top picks and what I deem to be their pros/cons; each comes with a check mark if we already have an integration.

[x] alpaca's marketstore
- built for use with numpy
- it seems simple, with a reasonably manageable code base in golang; hackable and with an interesting design
- recently got grpc support
- plugin system which seems to offer a few decent things you might not find in your average TSDB:
  - a websocket streaming interface on updates
  - an on disk aggregator
- no clustering, failover, sharding
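Here is a rough numpy sketch of what an on-disk aggregator computes: it downsamples 1-minute OHLCV bars into coarser bars. The record layout (`Epoch` seconds plus OHLCV float fields) and the `downsample()` helper are assumptions for illustration. This is not marketstore's plugin code.

```python
import numpy as np

# Hypothetical OHLCV record layout: epoch seconds plus float fields.
# Not necessarily marketstore's exact on-disk schema.
OHLCV = np.dtype([
    ("Epoch", "i8"),
    ("Open", "f8"),
    ("High", "f8"),
    ("Low", "f8"),
    ("Close", "f8"),
    ("Volume", "f8"),
])


def downsample(bars: np.ndarray, period_s: int) -> np.ndarray:
    """Aggregate time-sorted OHLCV bars into `period_s`-second buckets."""
    if len(bars) == 0:
        return bars.copy()
    # Start time of the bucket each input bar falls into.
    buckets = bars["Epoch"] - bars["Epoch"] % period_s
    # Index of the first bar in each bucket (relies on time-sorted input).
    starts = np.flatnonzero(np.r_[True, buckets[1:] != buckets[:-1]])
    # Index of the last bar in each bucket.
    ends = np.r_[starts[1:], len(bars)] - 1

    out = np.empty(len(starts), dtype=bars.dtype)
    out["Epoch"] = buckets[starts]
    out["Open"] = bars["Open"][starts]
    out["Close"] = bars["Close"][ends]
    # reduceat reduces each slice [starts[i], starts[i + 1]).
    out["High"] = np.maximum.reduceat(bars["High"], starts)
    out["Low"] = np.minimum.reduceat(bars["Low"], starts)
    out["Volume"] = np.add.reduceat(bars["Volume"], starts)
    return out


# Ten 1-minute bars starting at epoch 0, downsampled to 5-minute bars.
bars = np.zeros(10, dtype=OHLCV)
bars["Epoch"] = np.arange(10) * 60
bars["Open"] = np.arange(10) + 100.0
bars["High"] = bars["Open"] + 1
bars["Low"] = bars["Open"] - 1
bars["Close"] = bars["Open"] + 0.5
bars["Volume"] = 1.0

five_min = downsample(bars, 300)
```

Using `reduceat` keeps this fully vectorized. A real aggregator would also have to handle a partially filled trailing bucket as new 1-minute bars arrive.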

on the suit fronts

[ ] #495
- general roadmap dash: https://roadmap.databento.com/roadmap
- docs: https://docs.databento.com/
  - symbology API (#467!): https://docs.databento.com/api-reference-historical/symbology
    - for futes: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-lead-month-contracts
    - WIP for opts: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-for-options
    - treasuries data: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/cash-treasuries-data
  - [ ] live API clients and avail src codez: https://github.com/databento/
    - python client: https://github.com/databento/databento-python
      - docs: https://docs.databento.com/getting-started?historical=python&live=python
      - api docs: https://docs.databento.com/api-reference-live
    - rust backend lib: https://github.com/databento/dbn
      - already opened an issue asking about running your own: https://github.com/databento/databento-python/issues/9
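Most of the symbology items above (and #467) come down to one operation: resolving an alias, such as a lead-month continuous contract, to a concrete instrument over date ranges. Below is a minimal sketch of that lookup. It assumes a hypothetical `SymbolInterval` mapping made of half-open `[start, end)` date ranges; it does not model databento's actual symbology response format.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SymbolInterval:
    """Hypothetical mapping entry: the alias resolves to `symbol` on [start, end)."""

    start: date
    end: date
    symbol: str


def resolve(intervals: list[SymbolInterval], on: date) -> str | None:
    """Return the concrete symbol an alias maps to on date `on`, if any.

    Assumes `intervals` is sorted by `start` and non-overlapping.
    """
    starts = [iv.start for iv in intervals]
    # Rightmost interval whose start is <= `on`.
    i = bisect_right(starts, on) - 1
    if i >= 0 and on < intervals[i].end:
        return intervals[i].symbol
    return None


# Illustrative lead-month roll for an ES-style future (roll dates are made up).
es_lead = [
    SymbolInterval(date(2023, 3, 10), date(2023, 6, 9), "ESM3"),
    SymbolInterval(date(2023, 6, 9), date(2023, 9, 8), "ESU3"),
]
print(resolve(es_lead, date(2023, 6, 9)))  # ESU3
```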
[ ] the new arcticdb, a replacement for the prior arctic project (see below)
- repo: https://github.com/man-group/ArcticDB
- Feb 2023 update about the release of the C++ rewrite in collab with bloomy: https://www.man.com/man-group-brings-powerful-dataframe-database-product-arcticdb-to-market-with-bloomberg
- [ ] need to figure out if we can store np.ndarrays as well or potentially write a polars adapter?
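Until the np.ndarray question is answered, one fallback that only relies on DataFrame support is a thin adapter. It round-trips numpy structured arrays through a `pd.DataFrame` with a `DatetimeIndex`. Below is a minimal sketch, assuming pandas is available and the time field holds int64 epoch nanoseconds. `to_frame()`/`from_frame()` are hypothetical helper names, and the actual ArcticDB write/read calls are left out.

```python
import numpy as np
import pandas as pd


def to_frame(arr: np.ndarray, index_field: str = "time") -> pd.DataFrame:
    """Structured array (int64 epoch-ns time field) -> DatetimeIndex'd DataFrame."""
    df = pd.DataFrame(arr)
    times = df.pop(index_field).to_numpy()
    # Interpret the int64 values as nanoseconds since the epoch.
    df.index = pd.DatetimeIndex(times.astype("datetime64[ns]"), name=index_field)
    return df


def from_frame(df: pd.DataFrame, index_field: str = "time") -> np.ndarray:
    """Inverse of `to_frame()`: rebuild the structured array, time field first."""
    dtype = [(index_field, "i8")] + [(str(c), df[c].dtype) for c in df.columns]
    out = np.empty(len(df), dtype=dtype)
    out[index_field] = df.index.to_numpy().view("i8")
    for c in df.columns:
        out[str(c)] = df[c].to_numpy()
    return out


ticks = np.array(
    [(1_600_000_000_000_000_000, 100.25, 3.0), (1_600_000_000_500_000_000, 100.5, 1.0)],
    dtype=[("time", "i8"), ("price", "f8"), ("size", "f8")],
)
df = to_frame(ticks)  # what would get passed to something like `lib.write("sym", df)`
roundtrip = from_frame(df)
```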
[ ] the original arctic by the "man group" which allows serializing dfs and np arrays.
- seems to have competed more directly with marketstore but is built on mongodb
- has a tick storage format using pd.DataFrame: https://arctic.readthedocs.io/en/latest/tickstore/
  - talk on storing 10^12 rows of timeseries: https://vimeo.com/showcase/3660528/video/145842301
- cryptofeed backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/arctic.py
[ ] QuestDB, which claims to be the fastest open source time series database
- but it's java...
- repo: https://github.com/questdb/questdb
- cryptofeed backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/quest.py
[ ] greptimedb: https://github.com/GreptimeTeam/greptimedb
- "An open-source, cloud-native, distributed time-series database with PromQL/SQL/Python supported."
- written in rust

legacy-ish FOSS projects that probably aren't specialized enough...

influxdb is the gold standard in TSDBs
- here's a quant related SO question that resulted in this pump
- the github repo
- cryptofeed backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/influxdb.py
timescaledb is an extension on top of postgres
- here's an '18 pump on pushing 3B points a day and how it outperforms influxdb
- another pump blog post on ingesting from kafka
- has all the benefits of postgres
- afaik can work with apache arrow and turbodbc

Additional Resources

blog post on connecting apache arrow to the SQL world
a post from uber on moving from postgres to mysql
cryptofeed's long list of backends, which includes some serious projects I hadn't seen before.

Thots

One of the things I want to keep in mind is supporting apache arrow formatted data for streaming. The arrow project's new Flight RPC framework seems to be where things are heading.

May 16 '20 16:05 goodboy

Probably worth reading up on some professional grade systems:

kdb+
- which is a column oriented db
- there's a free version which requires being online
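Column orientation suits tick data because an analytic like VWAP needs only two fields, and a columnar layout stores each field as its own dense array. Below is a small numpy sketch contrasting the two layouts. It illustrates the storage idea only and says nothing about kdb+ internals.

```python
import numpy as np

n = 1_000
rng = np.random.default_rng(0)
price = 100 + rng.standard_normal(n).cumsum()
size = rng.integers(1, 10, n).astype("f8")

# Row-oriented: one packed record per tick, fields interleaved in memory.
rows = np.empty(n, dtype=[("price", "f8"), ("size", "f8"), ("venue", "i4")])
rows["price"], rows["size"], rows["venue"] = price, size, 0

# Column-oriented: one contiguous array per field.
cols = {"price": price.copy(), "size": size.copy(), "venue": np.zeros(n, "i4")}


def vwap(p: np.ndarray, s: np.ndarray) -> float:
    """Volume-weighted average price; only touches the price and size fields."""
    return float((p * s).sum() / s.sum())


# Same answer either way, but a row-layout field is a strided view that steps
# over every record's bytes, while a column is a dense contiguous buffer.
print(rows["price"].strides, cols["price"].strides)  # (20,) (8,)
print(vwap(rows["price"], rows["size"]), vwap(cols["price"], cols["size"]))
```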

May 18 '20 22:05 goodboy

Did a little more research on timescaledb and found this reddit thread which criticized its performance :crying_cat_face:

On the brighter side, Ameobea suggested the very cool-looking tectonicdb for L2 tick data:

- example python client
- the socket spec
- L2 book plotting code, discussed in detail in this blog post
- oh, and it's got an offline parser which can be used to read dataframes, yeye!
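tectonicdb is pitched for L2 tick data, and consuming that data mostly means replaying a sequence of L2 updates into a book snapshot. Below is a rough sketch of that replay. The `Update` record shape (timestamp, sequence number, trade/bid flags, price, size) is an assumption loosely modeled on tectonicdb-style update rows. The replay is a generic L2 book, not tectonicdb code, and it assumes each non-trade update carries the new total size at its price level.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical L2 update record; field names are assumptions."""

    ts: int  # timestamp (units assumed, e.g. ms)
    seq: int  # sequence number, used to order updates sharing a timestamp
    is_trade: bool
    is_bid: bool
    price: float
    size: float  # new total size at this level; 0 removes the level


def replay(updates: list[Update]) -> tuple[dict[float, float], dict[float, float]]:
    """Apply L2 level updates in (ts, seq) order; return (bids, asks) price->size."""
    bids: dict[float, float] = {}
    asks: dict[float, float] = {}
    for u in sorted(updates, key=lambda u: (u.ts, u.seq)):
        if u.is_trade:
            continue  # trades are prints, not book-level changes
        side = bids if u.is_bid else asks
        if u.size == 0:
            side.pop(u.price, None)
        else:
            side[u.price] = u.size
    return bids, asks


def top_of_book(bids: dict, asks: dict) -> tuple:
    """Best bid and best ask prices (None if a side is empty)."""
    return (max(bids) if bids else None, min(asks) if asks else None)


ups = [
    Update(1, 1, False, True, 99.5, 2.0),
    Update(1, 2, False, False, 100.5, 1.0),
    Update(2, 3, False, True, 99.75, 3.0),
    Update(3, 4, True, False, 100.5, 0.5),  # a trade print, ignored
    Update(4, 5, False, True, 99.75, 0.0),  # level removed
]
bids, asks = replay(ups)
print(top_of_book(bids, asks))  # (99.5, 100.5)
```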

May 25 '20 21:05 goodboy

Noticing a category of "IoT" capture dbs:

- griddb out of Japan

Jun 10 '20 12:06 goodboy

https://github.com/singularity-data/risingwave

May 26 '22 17:05 goodboy
