piker
Choosing a time series database
Did some decent starter research on which systems will work for production-grade data feed collaboration.
I'm going to list some top picks along with what I deem to be their pros/cons; an item gets a check-mark if we have an existing integration.
- [x] alpaca's `marketstore`
  - built for use with `numpy`
  - it seems simple, with a reasonably manageable code base in golang; hackable and with an interesting design
  - recently got grpc support
  - plugin system which seems to have a few decent things you might not find in your average TSDB:
    - a websocket streaming interface on updates
    - an on disk aggregator
  - no clustering, failover, sharding
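To get a feel for what an aggregator plugin like the one mentioned above has to do, here's a minimal pure-python sketch of rolling ticks up into fixed-period OHLCV bars. This is not `marketstore`'s implementation; the `Tick` type, the field names and `aggregate_bars()` are all made up for illustration, and it assumes ticks arrive in time order.

```python
from dataclasses import dataclass


@dataclass
class Tick:
    # hypothetical tick record, not marketstore's schema
    epoch: int    # unix timestamp in seconds
    price: float
    size: float


def aggregate_bars(ticks, period=60):
    """Roll time-ordered ticks up into OHLCV bars of `period` seconds."""
    bars = {}
    for tick in ticks:
        # floor the timestamp to the start of its bucket
        start = tick.epoch - tick.epoch % period
        bar = bars.get(start)
        if bar is None:
            # first tick in this bucket seeds every price field
            bars[start] = {
                'epoch': start,
                'open': tick.price,
                'high': tick.price,
                'low': tick.price,
                'close': tick.price,
                'volume': tick.size,
            }
        else:
            bar['high'] = max(bar['high'], tick.price)
            bar['low'] = min(bar['low'], tick.price)
            bar['close'] = tick.price
            bar['volume'] += tick.size
    return [bars[start] for start in sorted(bars)]


ticks = [
    Tick(0, 10.0, 1.0),
    Tick(30, 12.0, 2.0),
    Tick(59, 11.0, 1.0),
    Tick(60, 11.5, 3.0),
]
bars = aggregate_bars(ticks)
```

The first three ticks land in the bucket starting at `0` and the last one opens a new bar at `60`. A real on disk aggregator would presumably do this incrementally on write instead of over a full in-memory list.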
on the suit fronts
- [ ] #495
  - general roadmap dash: https://roadmap.databento.com/roadmap
  - docs: https://docs.databento.com/
  - symbology API (#467!): https://docs.databento.com/api-reference-historical/symbology
    - for futes: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-lead-month-contracts
    - WIP for opts: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-for-options
  - treasuries data: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/cash-treasuries-data
  - [ ] live API clients and avail src codez: https://github.com/databento/
    - python client: https://github.com/databento/databento-python
      - docs: https://docs.databento.com/getting-started?historical=python&live=python
      - api docs: https://docs.databento.com/api-reference-live
    - rust backend lib: https://github.com/databento/dbn
    - already opened an issue asking about running your own: https://github.com/databento/databento-python/issues/9
- [ ] the new `arcticdb`, a replacement for the prior `arctic` project (see below)
  - repo: https://github.com/man-group/ArcticDB
  - Feb 2023 update about the release of the C++ rewrite in collab with bloomy: https://www.man.com/man-group-brings-powerful-dataframe-database-product-arcticdb-to-market-with-bloomberg
  - [ ] need to figure out if we can store `np.ndarray`s as well or potentially write a `polars` adapter?
- [ ] the original `arctic` by the "man group" which allows serializing dfs and np arrays.
  - seems to have competed more directly with `marketstore` but is built on mongodb
  - has a tick storage format using `pd.DataFrame`: https://arctic.readthedocs.io/en/latest/tickstore/
  - talk on storing 10^12 rows of timeseries: https://vimeo.com/showcase/3660528/video/145842301
  - `cryptofeed` backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/arctic.py
- [ ] `QuestDB`, which seems to claim "QuestDB is the fastest open source time series database"
  - but it's `java`...
  - repo: https://github.com/questdb/questdb
  - `cryptofeed` backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/quest.py
- [ ] `greptimedb`: https://github.com/GreptimeTeam/greptimedb
  - "An open-source, cloud-native, distributed time-series database with PromQL/SQL/Python supported."
  - written in `rust`
legacy-ish FOSS projects that probably aren't specialized enough..
- `influxdb` is the gold standard in TSDBs
  - here's a quant related SO question that resulted in this pump
  - the github repo
  - `cryptofeed` backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/influxdb.py
- `timescaledb` is an extension on top of postgres
  - here's a '18 pump on pushing 3B points a day and how it outperforms influxdb
  - another pump blog post on ingesting from `kafka`
  - has all the benefits of postgres
  - afaik can work with apache arrow and turbodbc
Additional Resources
- blog post on connecting apache arrow to the SQL world
- a post from uber on moving from postgres to mysql
- `cryptofeed`'s long list of backends which includes some serious projects i hadn't seen before.
Thots
- One of the things I want to keep in mind is supporting apache arrow formatted data for streaming. The arrow project's new flight system seems to be the way things are going.
Probably worth reading up on some professional grade systems:
- kdb+
  - which is a column oriented db
  - there's a free version which requires being on-line
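For anyone wondering why "column oriented" matters for this kind of workload, here's a rough stdlib-only sketch contrasting a row layout with a column layout. It has nothing to do with kdb+'s actual storage format; the field names and the `vwap()` helper are just assumptions for the example.

```python
from array import array

# row-oriented: one record per tick, fields interleaved
rows = [
    (1, 10.0, 5.0),
    (2, 10.5, 3.0),
    (3, 10.25, 2.0),
]

# column-oriented: one typed, contiguous array per field
columns = {
    'epoch': array('q', (r[0] for r in rows)),
    'price': array('d', (r[1] for r in rows)),
    'size': array('d', (r[2] for r in rows)),
}


def vwap(cols):
    """Volume weighted average price, touching only the two columns it needs."""
    notional = sum(p * s for p, s in zip(cols['price'], cols['size']))
    return notional / sum(cols['size'])


result = vwap(columns)
```

The point is that an analytic query like this only scans `price` and `size` and never reads `epoch`, whereas the row layout drags every field of every record through memory.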
Did a little more research on timescaledb and found this reddit thread which criticized its performance :crying_cat_face:
On the brighter side, Ameobea suggested the very cool looking tectonicdb for L2 tick data:
- example python client
- the socket spec
- L2 book plotting code discussed in detail in this blog post.
- oh, and it's got an offline parser which can be used to read dataframes, yeye!
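Since L2 tick data is basically a stream of per-level size updates, here's a minimal sketch of rebuilding a book from such deltas. This is not tectonicdb's wire or storage format; the `(side, price, size)` tuple layout and the `apply_update()` / `best_bid_ask()` helpers are assumptions for illustration, with a size of `0` taken to mean "level removed".

```python
def apply_update(book, side, price, size):
    """Apply one L2 delta: set the size at a price level, dropping it when empty."""
    levels = book[side]
    if size == 0:
        levels.pop(price, None)
    else:
        levels[price] = size


def best_bid_ask(book):
    """Return the top of book as (highest bid, lowest ask), None for an empty side."""
    bid = max(book['bids']) if book['bids'] else None
    ask = min(book['asks']) if book['asks'] else None
    return bid, ask


book = {'bids': {}, 'asks': {}}
updates = [
    ('bids', 99.5, 2.0),
    ('bids', 99.0, 1.0),
    ('asks', 100.5, 1.5),
    ('asks', 100.0, 3.0),
    ('asks', 100.0, 0.0),  # the 100.0 ask level gets pulled
]
for side, price, size in updates:
    apply_update(book, side, price, size)

top = best_bid_ask(book)
```

Whatever store we pick for L2 needs to replay this kind of delta stream quickly, since any historical book snapshot has to be rebuilt from it.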
https://github.com/singularity-data/risingwave