Potential bug regarding bloblang & unix timestamp methods

Open DeusFacticius opened this issue 2 years ago • 1 comments

I believe I may have found a bug in how the Bloblang timestamp methods process unix timestamps or, at the very least, some misleading language in the documentation.

To quote the documentation for ts_unix from here:

Attempts to format a timestamp value as a unix timestamp. Timestamp values can either be a numerical unix time in seconds (with up to nanosecond precision via decimals), or a string in RFC 3339 format. The ts_parse method can be used in order to parse different timestamp formats.

The documentation states that 'up to nanosecond precision via decimals' is supported. In my tests, however, ts_unix discards sub-second (fractional) seconds regardless of whether the input is a unix timestamp or a formatted string. The documentation for the formatting methods uses similar language, and my tests show that those do preserve fractional seconds correctly.

I've also found a workaround (included in attached test case) by using ts_unix_nano and then dividing by 1e9 to recover the fractional portion.

Diagnostic info:

$ benthos --version
Version: 4.3.0
Date: 2022-06-23T18:32:27Z

Installed via homebrew, running on OSX 10.15.7.

Below is a minimal test case to reproduce:

# Test case for fractional timestamps with unix timestamps

pipeline:
  processors:
    - bloblang: |
        # Test case for fractional timestamps
        root.ts_unix2unix = this.ts_unix.ts_unix()
        root.ts_unix2fmt = this.ts_unix.ts_format(tz: "UTC")
        root.ts_fmt2unix = this.ts_fmt.ts_unix()
        root.ts_fmt2fmt = this.ts_fmt.ts_format(tz: "UTC")

        # A potential workaround
        root.workaround = this.ts_unix.ts_unix_nano() / 1000000000

tests:
  - name: simple
    target_processors: '/pipeline/processors'
    input_batch:
      - json_content:
          # Both values represent the same time
          ts_fmt: 2022-07-13T17:40:18.500Z
          ts_unix: 1657734018.5
    output_batches:
      -
        - json_equals:
            ts_unix2unix: 1657734018.5
            ts_unix2fmt: 2022-07-13T17:40:18.500Z
            ts_fmt2unix: 1657734018.5
            ts_fmt2fmt: 2022-07-13T17:40:18.500Z
            workaround: 1657734018.5

And the output:

$ benthos test ts_fractional_test.yaml 
Test 'ts_fractional_test.yaml' failed

Failures:

--- ts_fractional_test.yaml ---

simple [line 16]:
batch 0 message 0: json_equals: JSON content mismatch
{
    "ts_fmt2fmt": "2022-07-13T17:40:18.5Z",
    "ts_fmt2unix": 1657734018 => 1657734018.5,
    "ts_unix2fmt": "2022-07-13T17:40:18.5Z",
    "ts_unix2unix": 1657734018 => 1657734018.5,
    "workaround": 1657734018.5
}

Summary: two timestamps representing the same instant, one in RFC 3339 format and the other a unix timestamp with fractional seconds, are each run through both ts_unix() and ts_format(tz: "UTC"). The output of ts_format(tz: "UTC") preserves the fractional seconds for both inputs, but ts_unix() discards them for both the unix timestamp and the RFC 3339 string. The proposed workaround behaves as expected, but given the magnitude of the numbers involved I am concerned about floating-point error creeping in.
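On the floating-point concern: the following is a minimal Go sketch, separate from Benthos itself, that measures how finely a float64 can resolve values near this timestamp. The helper names `float64Spacing` and `nanosToSeconds` are made up for illustration; only the standard `math` package is used. It assumes the divided result ends up stored as a 64-bit float, which is what a JSON number typically becomes.

```go
package main

import (
	"fmt"
	"math"
)

// float64Spacing returns the gap between v and the next larger float64,
// i.e. the finest step representable at that magnitude.
func float64Spacing(v float64) float64 {
	return math.Nextafter(v, math.Inf(1)) - v
}

// nanosToSeconds mirrors the workaround from the test case:
// integer nanoseconds divided by 1e9 to give fractional seconds.
func nanosToSeconds(nanos int64) float64 {
	return float64(nanos) / 1e9
}

func main() {
	nanos := int64(1657734018500000000) // 2022-07-13T17:40:18.5Z

	secs := nanosToSeconds(nanos)

	// Exact in this case, because .5 is a power of two.
	fmt.Println(secs == 1657734018.5)

	// Step size in seconds near this value: 2^-22 s, roughly 238 ns.
	fmt.Println(float64Spacing(secs))

	// A 1 ns change is already lost when the integer becomes a float64.
	fmt.Println(float64(nanos+1) == float64(nanos))
}
```

Under that assumption the workaround is exact for the .5 in the test case, but present-day unix times in float seconds resolve only to a few hundred nanoseconds, so the last digits of a true nanosecond value would not survive the division.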

DeusFacticius, Jul 14 '22 22:07

Hey @DeusFacticius, this is a case of misleading and confusing documentation. The blurb of:

Timestamp values can either be a numerical unix time in seconds (with up to nanosecond precision via decimals), or a string in RFC 3339 format.

is repeated for every timestamp-related method or function, and is trying to convey that a valid timestamp within Bloblang (for methods such as ts_format) is any of the following:

  • A numerical value representing unix time, which may or may not include nanosecond precision as decimal values
  • An RFC 3339 string value
  • A timestamp literal returned from now(), ts_parse, etc
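As a rough illustration of that list, here is a Go sketch of how the three accepted forms could be normalised to a single time value. `toTime` is a hypothetical helper written for this example and is not taken from the Benthos source; it only shows that a fractional unix number and an RFC 3339 string can describe the same instant.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// toTime is a hypothetical helper (not Benthos code) that accepts the
// three timestamp forms listed above and returns a time.Time.
func toTime(v interface{}) (time.Time, error) {
	switch t := v.(type) {
	case time.Time:
		// Already a timestamp value, e.g. the result of a parse step.
		return t, nil
	case float64:
		// Unix seconds, with the decimals carrying sub-second precision.
		sec, frac := math.Modf(t)
		return time.Unix(int64(sec), int64(math.Round(frac*1e9))).UTC(), nil
	case int64:
		// Whole unix seconds.
		return time.Unix(t, 0).UTC(), nil
	case string:
		// RFC 3339 string, with optional fractional seconds.
		return time.Parse(time.RFC3339Nano, t)
	default:
		return time.Time{}, fmt.Errorf("unsupported timestamp type %T", v)
	}
}

func main() {
	fromUnix, _ := toTime(1657734018.5)
	fromString, _ := toTime("2022-07-13T17:40:18.500Z")

	// Both inputs from the test case describe the same instant.
	fmt.Println(fromUnix.Equal(fromString))
}
```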

The timestamp_unix() function and the .ts_unix() method intentionally return only an integer value, as decimal values would be unexpected there in many use cases. That integer is itself a valid timestamp within Bloblang, but you're right that the docs could easily mislead you into thinking you'll get nanosecond precision.

I'll update the docs, but we could also consider adding an optional parameter of the form .ts_unix(nano_precision: true), which would save you the hassle of dividing the result of ts_unix_nano.

Jeffail, Jul 15 '22 11:07