duckdb icon indicating copy to clipboard operation
duckdb copied to clipboard

Vector::SetValue not always casting when it should

Open Tishj opened this issue 3 years ago • 3 comments

What happens?

When determining whether or not to cast, SetValue only considers PhysicalType. This can be problematic in case there is additional type_info present on the LogicalType. (For a comprehensive list of the possible TypeInfo see PR #1851)

An example I came across is DECIMAL ignoring the width/scale, producing wrong results by not casting if the width/scale differ. I have attempted to fix the issue by always comparing the LogicalTypes using the equality operator but this caused enough issues to demand a separate PR.
So for now there is only a simple band aid fix targeting DECIMAL.

To Reproduce

There's not really a reproducible example to this issue, it's more a potential problem that should be addressed. Since this has caused issues with DECIMAL, there's a high probability other issues could arise from this.

Notes

When attempting to patch this potential issue I have observed two (semi) big hurdles.

  1. Value::CastAs(LogicalType& target_type) is not inheriting collation when target_type is VARCHAR, which caused this to infinite recurse:
	if (val.type() != GetType()) {
		SetValue(index, val.CastAs(GetType()));
		return;
	}
  1. When not just comparing using PhysicalType but instead comparing using LogicalTypeId a lot of tests break that relied on there never needing to be an actual Cast performed if the PhysicalType was the same, such as:
Invalid Input Error: Failed to cast value: Unimplemented type for cast (BIGINT -> TIMESTAMP)

The second issue is only relevant if there is a difference in the way that data should be represented between two LogicalType's that share the same PhysicalType. To which I unfortunately don't know the answer yet.

Tishj avatar Jul 14 '22 12:07 Tishj

Issues I was facing are likely caused by missing support for the ALIAS ExtraTypeInfo introduced by PR #3668

Tishj avatar Jul 26 '22 13:07 Tishj

The issue is highlighted by this test:

statement ok
CREATE TYPE alias AS VARCHAR;

statement ok
CREATE TABLE test AS select 'happy'::ALIAS;

When we force a cast, this causes the following assertion to be triggered:

TransactionContext Error: Failed to commit: INTERNAL Error: Assertion triggered in file "duckdb/src/common/types/vector.cpp" on line 94: other.GetType() == GetType()

~~Currently on master, Vector::SetValue does not cast when the physical types are the same, but when we force a cast, this issue arises.~~ despite hitting an assertion, the query completes succesfully on release

Tishj avatar Jul 28 '22 10:07 Tishj

If I change SetValue to always compare on the LogicalType level, this causes an infinite recursion between Vector::SetValue and Vector::CastAs

statement ok
CREATE TYPE alias AS VARCHAR;

statement ok
CREATE TABLE person (
    current_alias alias
);

statement ok
INSERT INTO person VALUES ('happy');

Edit: This is caused by GetValue, it does not copy the extra type info from the Vector, but is used in the comparison - so they are never equal

Tishj avatar Jul 28 '22 11:07 Tishj