Potential bug regarding bloblang & unix timestamp methods
I believe I may have found a bug in how the Bloblang timestamp methods process unix timestamps or, at the very least, some misleading language in the documentation.
To quote the documentation for ts_unix:
Attempts to format a timestamp value as a unix timestamp. Timestamp values can either be a numerical unix time in seconds (with up to nanosecond precision via decimals), or a string in RFC 3339 format. The ts_parse method can be used in order to parse different timestamp formats.
The documentation states that 'up to nanosecond precision via decimals' is supported. In my tests, however, fractional (sub-second) seconds are discarded whether converting to or from a unix timestamp. Similar language appears for formatted timestamps, but my tests show that those correctly preserve fractional seconds.
I've also found a workaround (included in the attached test case): call ts_unix_nano and then divide by 1e9 to recover the fractional portion.
Diagnostic info:
$ benthos --version
Version: 4.3.0
Date: 2022-06-23T18:32:27Z
Installed via homebrew, running on OSX 10.15.7.
Below is a minimal test case to reproduce:
# Test case for fractional timestamps with unix timestamps
pipeline:
  processors:
    - bloblang: |
        # Test case for fractional timestamps
        root.ts_unix2unix = this.ts_unix.ts_unix()
        root.ts_unix2fmt = this.ts_unix.ts_format(tz: "UTC")
        root.ts_fmt2unix = this.ts_fmt.ts_unix()
        root.ts_fmt2fmt = this.ts_fmt.ts_format(tz: "UTC")
        # A potential workaround
        root.workaround = this.ts_unix.ts_unix_nano() / 1000000000
tests:
  - name: simple
    target_processors: '/pipeline/processors'
    input_batch:
      - json_content:
          # Both values represent the same time
          ts_fmt: 2022-07-13T17:40:18.500Z
          ts_unix: 1657734018.5
    output_batches:
      -
        - json_equals:
            ts_unix2unix: 1657734018.5
            ts_unix2fmt: 2022-07-13T17:40:18.500Z
            ts_fmt2unix: 1657734018.5
            ts_fmt2fmt: 2022-07-13T17:40:18.500Z
            workaround: 1657734018.5
And the output:
$ benthos test ts_fractional_test.yaml
Test 'ts_fractional_test.yaml' failed
Failures:
--- ts_fractional_test.yaml ---
simple [line 16]:
  batch 0 message 0: json_equals: JSON content mismatch
  {
      "ts_fmt2fmt": "2022-07-13T17:40:18.5Z",
      "ts_fmt2unix": 1657734018 => 1657734018.5,
      "ts_unix2fmt": "2022-07-13T17:40:18.5Z",
      "ts_unix2unix": 1657734018 => 1657734018.5,
      "workaround": 1657734018.5
  }
Summary: two timestamps representing the same instant, one in RFC 3339 format and the other as a unix timestamp with fractional seconds, are each run through both ts_unix() and ts_format(tz: "UTC"). The output of ts_format(tz: "UTC") correctly preserves the fractional seconds in both cases, but ts_unix() discards them for both the unix timestamp input and the RFC 3339 input. The proposed workaround functions as expected; however, the magnitude of the numbers involved makes me concerned about potential floating point creep / error.
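To put rough numbers on the floating point concern, here is a small Python sketch (not Bloblang, and not how Benthos implements anything). It assumes the division by 1e9 happens in 64-bit floating point; the function names are made up for illustration. It shows that the .5 value from the test case survives exactly, that a full nanosecond-precision value does not round-trip, and that splitting with integer arithmetic avoids the problem.

```python
import math

NANOS_PER_SECOND = 1_000_000_000


def nanos_to_float_seconds(nanos: int) -> float:
    """Mimic dividing a nanosecond count by 1e9 in 64-bit floating point."""
    # Convert to float first, as a float64-based runtime is assumed to do.
    return float(nanos) / 1e9


def nanos_to_parts(nanos: int) -> tuple[int, int]:
    """Split into (whole seconds, leftover nanoseconds) with exact integer math."""
    return divmod(nanos, NANOS_PER_SECOND)


# The value from the test case is exactly representable, so nothing is lost.
half_second = nanos_to_float_seconds(1_657_734_018_500_000_000)

# Gap between adjacent float64 values near this magnitude, in seconds
# (2**-22, roughly 2.4e-07, i.e. a few hundred nanoseconds).
resolution = math.ulp(half_second)

# A value with full nanosecond precision does not round-trip through float64.
original = 1_657_734_018_123_456_789
recovered = int(nanos_to_float_seconds(original) * 1e9)
drift = recovered - original  # non-zero, on the order of hundreds of nanoseconds

# Integer arithmetic keeps every digit.
seconds, nanos = nanos_to_parts(original)
```

Under that assumption, the workaround looks safe down to roughly microsecond precision for present-day timestamps, but it cannot carry true nanosecond precision in a single float.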
Hey @DeusFacticius, this is a case of misleading and confusing documentation. The blurb of:
Timestamp values can either be a numerical unix time in seconds (with up to nanosecond precision via decimals), or a string in RFC 3339 format.
Is repeated for every timestamp-related method or function. It is trying to convey that a valid timestamp within Bloblang (for methods such as ts_format) is any of the following:
- A numerical value representing unix time, which may or may not include nanosecond precision as decimal values
- An RFC 3339 string value
- A timestamp literal returned from now(), ts_parse, etc.
The timestamp_unix() function and the .ts_unix() method intentionally return only an integer value, as decimal values would be unexpected there in many use cases. That integer is still a valid timestamp within Bloblang, but you're right that the docs could easily mislead you into thinking you'll get nanosecond precision.
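As a rough model of the behaviour described here, the following Python sketch accepts the same three kinds of input and returns whole seconds only. It is an illustration of the semantics, not the Benthos implementation; to_datetime and this Python ts_unix are hypothetical names chosen to mirror the discussion.

```python
from datetime import datetime, timezone


def to_datetime(value) -> datetime:
    """Accept a timestamp object, numeric unix seconds, or an RFC 3339 string."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, (int, float)):
        # Fractional seconds are accepted on input.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # Older Python versions do not parse a trailing "Z", so normalise it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    raise TypeError(f"unsupported timestamp value: {value!r}")


def ts_unix(value) -> int:
    """Return whole unix seconds; the fractional part is dropped on output."""
    return int(to_datetime(value).timestamp())


from_number = ts_unix(1657734018.5)
from_string = ts_unix("2022-07-13T17:40:18.500Z")
```

Both calls yield 1657734018, which matches the ts_unix2unix and ts_fmt2unix values in the failing test output above: the fraction is valid input but is not part of the result.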
I'll update the docs, but we could also consider adding an optional parameter of the form .ts_unix(nano_precision: true), which would save you the hassle of dividing the result of ts_unix_nano.