piker

Choosing a time series database

Open • goodboy opened this issue 5 years ago • 4 comments

Did some decent starter research on which systems would work for production-grade data feed collaboration.


I'm going to list some top picks and what I deem to be their pros/cons; each one gets a check-mark if we already have an integration.


on the suite front:

  • [ ] #495

    • general roadmap dash: https://roadmap.databento.com/roadmap
    • docs: https://docs.databento.com/
      • symbology API (#467!): https://docs.databento.com/api-reference-historical/symbology
        • for futes: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-lead-month-contracts
        • WIP for opts: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/smart-symbology-for-options
        • treasuries data: https://roadmap.databento.com/b/n0o5prm6/feature-ideas/cash-treasuries-data
      • [ ] live API clients and available source code: https://github.com/databento/
        • python client: https://github.com/databento/databento-python
          • docs: https://docs.databento.com/getting-started?historical=python&live=python
          • api docs: https://docs.databento.com/api-reference-live
        • rust backend lib: https://github.com/databento/dbn
          • already opened an issue asking about running your own: https://github.com/databento/databento-python/issues/9
  • [ ] the new arcticdb, a replacement for the prior arctic project (see below)

    • repo: https://github.com/man-group/ArcticDB
    • Feb 2023 update about the release of the C++ rewrite, done in collaboration with Bloomberg: https://www.man.com/man-group-brings-powerful-dataframe-database-product-arcticdb-to-market-with-bloomberg
    • [ ] need to figure out if we can store np.ndarrays as well or potentially write a polars adapter?
  • [ ] the original arctic by the "man group" which allows serializing dfs and np arrays.

    • seems to have competed more directly with marketstore but is built on mongodb
    • has a tick storage format using pd.DataFrame: https://arctic.readthedocs.io/en/latest/tickstore/
      • talk on storing 10^12 rows of timeseries: https://vimeo.com/showcase/3660528/video/145842301
    • cryptofeed backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/arctic.py
  • [ ] QuestDB, which claims to be the fastest open source time series database

    • but it's java...
    • repo: https://github.com/questdb/questdb
    • cryptofeed backend impl: https://github.com/bmoscon/cryptofeed/blob/6cf185de641cd63b180f2a34cf52c0773e5961a3/cryptofeed/backends/quest.py
  • [ ] greptimedb: https://github.com/GreptimeTeam/greptimedb

    An open-source, cloud-native, distributed time-series database with PromQL/SQL/Python supported.

    • written in rust

legacy-ish FOSS projects that probably aren't specialized enough...


Additional Resources


Thots

  • One of the things I want to keep in mind is support for Apache Arrow-formatted data for streaming. The Arrow project's new Flight RPC system seems to be the way things are going.

goodboy, May 16 '20

Probably worth reading up on some professional-grade systems:

goodboy, May 18 '20

Did a little more research on TimescaleDB and found this Reddit thread that criticized its performance :crying_cat_face:

On the brighter side, Ameobea suggested the very cool-looking tectonicdb for L2 tick data:

goodboy, May 25 '20

Noticing a category of "IoT" capture dbs:

goodboy, Jun 10 '20

https://github.com/singularity-data/risingwave

goodboy, May 26 '22