influxdb-cxx Respect UDP maximum packet size when sending batched points

This closes #184.

Adds a new mandatory member function of the Transport base class called getMaxMessageSize. This function must returns the size of the maximum message which may be transmitted using the Transport implementation's send method.

For the UDP transport this is set to the minimum of the UDP socket's send buffer (for MacOS) or the maximum data length which is practical in the UDP protocol after header sizes are taken into account (65507). N.B. This may cause packet fragmentation at the IP layer but since UDP makes no delivery guarantees anyway I think this is an acceptable compromise when the user has chosen to batch write points over UDP.
For the other transports (HTTP, TCP, UnixSocket) this is to effectively unlimited (std::numeric_limits<std::size_t>::max()) to preserve existing behaviour.

The InfluxDB::flushBatch method is updated to construct the largest possible messages which can be successfully sent given the mTransport in use.

If any individual Point is too large to be successfully sent (i.e.: its Line Protocol representation is larger than the maximum message size) it is skipped and the other Points in the batch are sent. An exception is subsequently raised.

Mar 22 '23 15:03 simonaldrich

Thanks for your PR!

I have some comments, but my (personal) overall opinion is, that this adds quite some complexity to everyone just to handle a single (less used?) transport. Instead, I'm tempted to consider deprecating (and finally removing) UDP support, even more as v2 wont support it anymore (questdb did too).

Mar 22 '23 19:03 offa

Thanks for your PR!

You're welcome. Thank you for the review comments (which I'm happy to address) and the opportunity to discuss the best approach to this :slightly_smiling_face:

this adds quite some complexity to everyone just to handle a single (less used?) transport

You're absolutely right, this definitely adds complexity to the transmission of batched (or vectors of) points. As a slight mitigation it does consolidate two code paths (sending unbatched point vectors and flushing batched points) into a single function. Perhaps we can find middle ground which avoids the complexity for Transports with no transmission limits?

Instead, I'm tempted to consider deprecating (and finally removing) UDP support

You'd obviously be completely within your rights to do so. Unfortunately, that would be problematic from my perspective. To explain why, it might be helpful to understand the use-case behind the PR.

My team are replacing a legacy system which sends large quantities of data to another system's v1 InfluxDB using UDP. We want to perform as few writes as possible (hence the batching) so that Influx is likely to flush related batches of points to the measurements together. This mirrors the behaviour of the system being replaced.

During integration testing we noticed the issue with batched points exceeding the UDP packet size limit which would cause data loss as far as the other system was concerned.

We don't have the luxury of replacing the receiving system and I don't think this situation will be unique. I would suggest that a combination of inertia and aversion to change most likely means that Influx v1 and the UDP transport will be around for quite some time especially in embedded and industrial markets.

It would be great if we can maintain UDP support, I think it may be more widely used than you might think :slightly_smiling_face:

Mar 22 '23 21:03 simonaldrich

You'd obviously be completely within your rights to do so. Unfortunately, that would be problematic from my perspective. To explain why, it might be helpful to understand the use-case behind the PR.

Thanks for the insight, it helps a lot to understand the situation.

Perhaps we can find middle ground which avoids the complexity for Transports with no transmission limits?

An ideal solution would be to stay within UDP transport implementation. :+1: But yes, lets find a good solution for both sides here :-)

You're welcome. Thank you for the review comments (which I'm happy to address) and the opportunity to discuss the best approach to this

:+1:

Mar 23 '23 19:03 offa

There are two little things that are worth merging independently as they provide profit already: Using uint16 ports (TCP / UDP ctor) and the resolve() related changes (= the try-catch part of UDP ctor).

If you have some minutes spare, could you submit those as a separate PR?

Mar 23 '23 19:03 offa

Some thoughts (not necessarily good though):

Implementing an additional batching strategy beside by-count
Implementing send as strategy in general (eg. no batching, by-count, by-size)
Since it's a line protocol – terminated by a newline – and tags / fields don't support \n; send as many lines as fit into in a packet, then cut and continue with next packet (simple, stupid, but does it work?)

Mar 23 '23 21:03 offa

Thanks for the feedback, very much appreciated. I will get around to doing the separate PRs over the next few days :slightly_smiling_face:

Mar 24 '23 09:03 simonaldrich

Since it's a line protocol – terminated by a newline – and tags / fields don't support \n; send as many lines as fit into in a packet, then cut and continue with next packet (simple, stupid, but does it work?)

While playing around with this idea it turned out to be quite promising :+1:. I'm going to write a basic PoC so we'll see if it's a feasible solution.

Apr 01 '23 17:04 offa

influxdb-cxx influxdb-cxx copied to clipboard

Respect UDP maximum packet size when sending batched points

influxdb-cxx
influxdb-cxx copied to clipboard