feature: Group points by series in TCP Write API

Open at15 opened this issue 6 years ago • 3 comments

The TCP write API (https://github.com/akumuli/Akumuli/wiki/Writing-data-using-the-TCP-API) only supports writing a single point at a time; bulk loading is supported by series name rather than by points. That is good for streaming data, but what if the client itself wants to do some buffering, or is loading data from another store?

Supporting points grouped by series has the following advantages:

  • it saves bandwidth by reducing the amount of series metadata transmitted
  • server-side processing may become more efficient because data arrives in batches

For example, KairosDB and similar systems can put multiple points in one series:

[
  {
      "name": "archive_file_tracked",
      "datapoints": [[1359788400000, 123], [1359788300000, 13.2], [1359788410000, 23.1]],
      "tags": {
          "host": "server1",
          "data_center": "DC1"
      },
      "ttl": 300
  }
]

Combined with the current protocol, we might have something like the following, where *3 specifies how many points this series has:

+cpu.sys host=machine1 region=NW
*3
+20141210T074343
+3.12
+20141210T074344
+8.11
+20141210T074345
+12.6
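The grouped layout above can be sketched as a small client-side encoder. This is only an illustration of the proposal, not an existing Akumuli format: the function name `encode_series_batch` is made up, and the `\r\n` line terminator is an assumption borrowed from RESP-style framing.

```python
def encode_series_batch(series, points):
    """Encode one series name followed by a counted list of points.

    Hypothetical layout from this proposal:
      +<series name>
      *<number of points>
      +<timestamp>
      +<value>
      ... (timestamp/value repeated for each point)
    """
    lines = ["+" + series, "*" + str(len(points))]
    for timestamp, value in points:
        lines.append("+" + timestamp)
        lines.append("+" + str(value))
    # Assumption: every line ends with CRLF, as in RESP-style protocols.
    return "\r\n".join(lines) + "\r\n"


message = encode_series_batch(
    "cpu.sys host=machine1 region=NW",
    [
        ("20141210T074343", 3.12),
        ("20141210T074344", 8.11),
        ("20141210T074345", 12.6),
    ],
)
```

The series name is sent once per batch instead of once per point, which is where the bandwidth saving would come from.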

Also, is there any plan to add a binary protocol? A pure text protocol is easy for humans, but for machines a timestamp in a string just adds marshalling and unmarshalling work on both the client and the server.

at15 avatar Mar 17 '18 19:03 at15

Well, this is actually a good idea. I've already implemented one batch format, but this one wouldn't interfere with it. I'm not planning to add a binary protocol because it wouldn't help much: the series names will be sent as text anyway, and the most complex part is series name parsing, which takes a lot of time. I've planned to add protocol-level compression using lz4. It's easy to implement on the client and it will make the input smaller.
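To get a rough feel for why compression helps with a text protocol that repeats series names, here is a small sketch. It uses `zlib` from the Python standard library purely as a stand-in for lz4 (which is a third-party package in Python), so the actual ratio with lz4 would differ. The payload builder and the `\r\n` terminator are assumptions for illustration.

```python
import zlib


def build_payload(n):
    """Build n single-point writes for the same series (illustrative data)."""
    chunks = []
    for i in range(n):
        chunks.append("+cpu.sys host=machine1 region=NW\r\n")
        # Timestamps cycle through the seconds of one minute; values are fixed.
        chunks.append("+20141210T0743%02d\r\n" % (i % 60))
        chunks.append("+3.12\r\n")
    return "".join(chunks).encode("ascii")


raw = build_payload(1000)
# zlib stands in for lz4 here; only the general effect is being shown.
compressed = zlib.compress(raw)
print(len(raw), len(compressed))
```

Because the series name repeats in every write, the compressed stream ends up much smaller than the raw text, without changing the protocol itself.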

Lazin avatar Mar 17 '18 23:03 Lazin

Another way to address the overhead of series names is to add an extra prepare phase, so that the client sends an id instead of text. This works if the client seldom changes its series during one connection. Most relational databases have prepared statements, but I haven't seen any in a TSDB yet. A drawback is that if the database sits behind a load balancer, sticky sessions are needed to avoid sending prepared data to a different database instance.

e.g. (pseudo code, not redis protocol)

client prepare cpu.sys host=machine1 region=NW
server prepared 1
client 
s:1
+20141210T074343
+3.12
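The prepare phase above implies that the server keeps a table of prepared series per connection. A minimal sketch of that state, assuming ids are handed out sequentially (the class name `ConnectionState` and its methods are hypothetical, not part of Akumuli):

```python
class ConnectionState:
    """Per-connection table of prepared series (hypothetical server-side state)."""

    def __init__(self):
        self._names = {}   # series id -> series name text
        self._next_id = 1

    def prepare(self, series_name):
        """Register a series name and return the id the client should use."""
        series_id = self._next_id
        self._names[series_id] = series_name
        self._next_id += 1
        return series_id

    def resolve(self, series_id):
        """Look up a prepared series; raises KeyError for an unknown id."""
        return self._names[series_id]


conn_a = ConnectionState()
conn_b = ConnectionState()   # e.g. a second backend behind a load balancer
sid = conn_a.prepare("cpu.sys host=machine1 region=NW")
```

Since `conn_b` never saw the prepare step, `conn_b.resolve(sid)` would fail, which is the reason sticky sessions are needed in this design.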

at15 avatar Mar 17 '18 23:03 at15

I'd prefer to do everything in one TCP session, without an extra roundtrip to the server for every name. To achieve this, the client may send a dictionary in the first frame, something like this:

*2
+cpu.sys host=machine1 region=NW
:1
*2
+cpu.sys host=machine2 region=NW
:2

And the actual messages:

:1
+20180318T112000
+3.1415
:2
+20180318T112000
+88

The client can cache this dictionary per TCP-session and use it to avoid series name parsing.
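A client-side sketch of the dictionary approach described above, assuming ids are assigned in order starting at 1 and that lines end with `\r\n`. The function names are made up for illustration, and the layout is only the one proposed in this thread.

```python
def encode_dictionary(series_names):
    """Return (ids, frame): an id per series name and the dictionary frame text."""
    ids = {}
    lines = []
    for series_id, name in enumerate(series_names, start=1):
        ids[name] = series_id
        # Each entry is a two-element array: the series name, then its id.
        lines += ["*2", "+" + name, ":" + str(series_id)]
    return ids, "\r\n".join(lines) + "\r\n"


def encode_point(series_id, timestamp, value):
    """Encode one data point that refers to a series by its dictionary id."""
    return "\r\n".join([":" + str(series_id), "+" + timestamp, "+" + str(value)]) + "\r\n"


ids, dictionary_frame = encode_dictionary([
    "cpu.sys host=machine1 region=NW",
    "cpu.sys host=machine2 region=NW",
])
first = encode_point(ids["cpu.sys host=machine1 region=NW"], "20180318T112000", 3.1415)
second = encode_point(ids["cpu.sys host=machine2 region=NW"], "20180318T112000", 88)
```

The dictionary frame goes out once at the start of the session; after that each point carries only a short integer id instead of the full series name.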

-- Cheers, Evgeny

Lazin avatar Mar 18 '18 08:03 Lazin