duckdb-wasm icon indicating copy to clipboard operation
duckdb-wasm copied to clipboard

INTERVAL bug (wrong endian?)

Open david542542 opened this issue 1 year ago • 30 comments

What happens?

Running SELECT INTERVAL '2 year' (or really any other INTERVAL) produces the wrong output in the wasm shell. It seems like it has to do with endianness: 2 years is 24 months and here 24 is put into fractional seconds, i.e at the other end of the number.

image

To Reproduce

You can run this query here (click to open in wasm shell).

SELECT INTERVAL '2 day'

OS:

Mac > Chrome > DuckDB Wasm Shell

DuckDB Version:

v0.10.1

DuckDB Client:

Wasm

Full Name:

David Litwin

Affiliation:

None

Have you tried this on the latest nightly build?

I have tested with a nightly build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [X] Yes, I have

david542542 avatar Apr 09 '24 16:04 david542542

@Mytherin I think this is actually an issue with DuckDB core's arrow layer, as I see the same issue with duckdb-rs. I did briefly discuss this with @Maxxen as well

Mause avatar Apr 09 '24 17:04 Mause

Why does the problem then not occur in Python?

>>> import duckdb
>>> duckdb.sql("SELECT INTERVAL '2 year'").arrow()
# pyarrow.Table
# CAST('2 year' AS INTERVAL): month_day_nano_interval
# ----
# CAST('2 year' AS INTERVAL): [[24M0d0ns]]

Mytherin avatar Apr 09 '24 18:04 Mytherin

Actually since both duckdb-rs and duckdb-wasm use the Rust arrow library, is it perhaps possible the problem is in the Rust Arrow library?

Mytherin avatar Apr 09 '24 18:04 Mytherin

That's possible yeah, I'd assumed that wasm used arrow-js, but it does seem to use arrow-rs yes

Max commented in Discord:

Sounds like a lifetime issue with the arrow return values then, might be some arrow array destructor we set too early or something. Will have a look

Mause avatar Apr 09 '24 18:04 Mause

Looks like they did have issues here at one point too: https://github.com/apache/arrow-rs/pull/2235/files

Mause avatar Apr 09 '24 18:04 Mause

While possible a lifetime issue seems unlikely to me here - the problem here seems to be that 24 months are interpreted as 24 seconds, and the issue appears to be consistent.

Mytherin avatar Apr 09 '24 18:04 Mytherin

Seems like it is wrong for all time units. See here.

image

david542542 avatar Apr 09 '24 18:04 david542542

Looks like wasm uses a very old version of arrow-rs. @carlopi perhaps we can try to upgrade this?

Mytherin avatar Apr 09 '24 18:04 Mytherin

Using arrow-js (basically when not using the shell infrastructure but passing arrow results directly to JS) the results are correctly parsed generated.

I will look at bumping arrow-rs.

carlopi avatar Apr 09 '24 18:04 carlopi

Sorry, I said I was going to look into this but then went on vacation and been distracted with other things to get in before the release since I got back. Ill investigate more tomorrow.

Maxxen avatar Apr 09 '24 20:04 Maxxen

This occurs in the rust client as well, even with latest arrow.

Maxxen avatar Apr 10 '24 07:04 Maxxen

My guess at this point is that this is a bug within arrow-rs related to ingestion of intervals through the Arrow C API. Perhaps we can try to create a minimal example and file an issue in their repo?

Mytherin avatar Apr 10 '24 07:04 Mytherin

If @Maxxen would be up to it, I think given duckdb-rs is easier to test in isolation (also for the arrow people) and it's simpler to trust since there are less layers of things going wrong wrt duckdb-wasm's shell, I think that would work better as reproducer (and it's not on me!).

carlopi avatar Apr 10 '24 07:04 carlopi

Actually, I think we might be in the wrong here. Arrow stores the interval in a 128bit integer, interpreted as multiple fields;

┌──────────────────────────────┬─────────────┬──────────────┐
│            Nanos             │    Days     │    Months    │
│          (64 bits)           │ (32 bits)   │  (32 bits)   │
└──────────────────────────────┴─────────────┴──────────────┘
  0                            63            95           127 bit offset

But our ArrowInterval type looks like this:

struct ArrowInterval {
	int32_t months;
	int32_t days;
	int64_t nanoseconds;
}

Which we just reinterpret straight up when writing out the arrow buffer.

  68                                   continue;
  69                           }
  70                           result_data[result_idx] = OP::template Operation<TGT, SRC>(data[source_idx]);
  71                   }
  72                   append_data.row_count += size;
  73           }
  74   };
(lldb) p result_data[result_idx]
(duckdb::ArrowInterval)  (months = 24, days = 0, nanoseconds = 0)

But thats not the right order when reinterpreting as a i128 (which is what rust is doing on the other side)

(lldb) p *(__int128*)&result_data[result_idx]
(__int128) 24
    pub fn to_parts(
        i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
    ) -> (i32, i32, i64) {
        let months = (i >> 96) as i32;
        let days = (i >> 64) as i32;
        let nanos = i as i64;
        (months, days, nanos)
    }

Im not sure why this works in python though...

Maxxen avatar Apr 10 '24 08:04 Maxxen

The rust code looks wrong to me. The definition is in little-endian. i >> 96 means we take the most significant bits for the month, when it should be the least-significant bits. Here are some other definitions in the official Arrow project:

It also doesn't really make sense for the Rust driver to be correct when it is the one that disagrees with everything else. Either all DuckDB, Arrow-C++, Arrow-Python, Arrow-Go, Arrow-R and Arrow-JS are wrong, or Arrow-RS is wrong. The latter seems more logical to me.

Mytherin avatar Apr 10 '24 09:04 Mytherin

Are you saying it was correct before this change? https://github.com/apache/arrow-rs/pull/2235

Mause avatar Apr 10 '24 09:04 Mause

~~Where's this diagram from @Maxxen ?~~

┌──────────────────────────────┬────────────┬──────────────┐
│            Nanos             │    Days     │    Months    │
│          (64 bits)           │ (32 bits)   │  (32 bits)   │
└──────────────────────────────┴────────────┴──────────────┘
  0                            63            95           127 bit offset

EDIT: Disregard, it's the same as https://github.com/apache/arrow-rs/pull/2235/files#diff-c47b2e57ea72cdad6427c3149de134be3437e403984d367e3604cd0943a9a963R288, just backwards

All integers in the types below are stored in the endianness indicated by the schema.

Mause avatar Apr 10 '24 09:04 Mause

@Mause The diagram is from here: https://github.com/apache/arrow-rs/blob/36a6e515f99866eae5332dfc887c6cb5f8135064/arrow-array/src/types.rs#L265-L269

I agree seems like the rust def is wrong. So what now?

Maxxen avatar Apr 10 '24 10:04 Maxxen

I would say we should double-check that the rust version is wrong and try to create a reproducer separate from DuckDB, then file a bug report in the arrow-rs repository.

Mytherin avatar Apr 10 '24 11:04 Mytherin

Here's a stand-alone C++ program showcasing the problem:

#include <stdint.h>
#include <stdio.h>

using uint128_t = __uint128_t;

struct MonthDayNanos {
    int32_t months;
    int32_t days;
    int64_t nanoseconds;
};

int main() {
	MonthDayNanos ival;
	ival.months = 32;
	ival.days = 64;
	ival.nanoseconds = 128;

	uint128_t u128val = *reinterpret_cast<uint128_t *>(&ival);

	printf("Rust code (incorrect)\n");
	printf("months %u\n", uint32_t(u128val >> 96));
	printf("days %u\n", uint32_t(u128val >> 64));
	printf("nanos %llu\n", uint64_t(u128val));

	printf("\n");

	printf("Correct shifts\n");
	printf("months %u\n", uint32_t(u128val));
	printf("days %u\n", uint32_t(u128val >> 32));
	printf("nanos %llu\n", uint64_t(u128val >> 64));
}

Prints:

Rust code (incorrect)
months 0
days 128
nanos 274877906976

Correct shifts
months 32
days 64
nanos 128

Mytherin avatar Apr 10 '24 11:04 Mytherin

Hi, is there any update on this? Should I file a bug report in arrow-rs, or what is necessary in getting this resolved?

david542542 avatar Apr 16 '24 19:04 david542542

Arrow C++ has https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/cpp/src/arrow/type.h#L1775-L1784

Arrow Java uses the same layout: https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/java/vector/src/main/java/org/apache/arrow/vector/IntervalMonthDayNanoVector.java#L171-L181

pitrou avatar Apr 17 '24 09:04 pitrou

Arrow C++ has https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/cpp/src/arrow/type.h#L1775-L1784

Arrow Java uses the same layout: https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/java/vector/src/main/java/org/apache/arrow/vector/IntervalMonthDayNanoVector.java#L171-L181

That's mentioned above, the Arrow C++ layout is:

struct MonthDayNanos {
    int32_t months;
    int32_t days;
    int64_t nanoseconds;
};

The Rust code for parsing that as a uint128_t is as follows:

    pub fn to_parts(
        i: <IntervalMonthDayNanoType as ArrowPrimitiveType>::Native,
    ) -> (i32, i32, i64) {
        let months = (i >> 96) as i32;
        let days = (i >> 64) as i32;
        let nanos = i as i64;
        (months, days, nanos)
    }

As shown in my stand-alone program here - https://github.com/duckdb/duckdb-wasm/issues/1696#issuecomment-2047272977 - those shifts produce the wrong results given the Arrow C++ layout.

Mytherin avatar Apr 17 '24 10:04 Mytherin

I noticed the integration tests linked involve files - perhaps the byte order for Arrow files is different from the in-memory byte order (big-endian vs little-endian)?

Mytherin avatar Apr 17 '24 10:04 Mytherin

Yes, sorry, I was just posting this for reference.

pitrou avatar Apr 17 '24 10:04 pitrou

I noticed the integration tests linked involve files - perhaps the byte order for Arrow files is different from the in-memory byte order (big-endian vs little-endian)?

No, it's probably just that arrow-rs has the same misinterpretation bug in all code paths. So when you ask it to roundtrip the data from JSON to IPC, for example, it will produce the intended IPC results.

pitrou avatar Apr 17 '24 10:04 pitrou

Makes sense, thanks for picking this up!

Mytherin avatar Apr 17 '24 10:04 Mytherin

Well, you'll have to thank @tustvold for that :-)

pitrou avatar Apr 17 '24 10:04 pitrou

@Mytherin brief update, about ~two weeks away: https://github.com/apache/arrow-rs/issues/5654#issuecomment-2108732883

david542542 avatar May 13 '24 20:05 david542542

Thanks for driving this @david542542!

Maxxen avatar May 13 '24 20:05 Maxxen