velox icon indicating copy to clipboard operation
velox copied to clipboard

Fuzzer generated TimestampWithTimezone vector should not have nulls at only one field of a row

Open kagamiori opened this issue 3 years ago • 6 comments

TimestampWithTimezone is implemented as a Row<int64_t, int16_t> in Velox. But TimestampWithTimezone is primitive in Presto, meaning that each of its value should be either null or non-null at both field. VectorFuzzer generates random TimestampWithTimezone vectors as regular RowVectors and sometimes put nulls into only one field of a row. This causes expression fuzzer test failures when evaluating datetime functions because those functions do not handle null at only one field of a TimestampWithTimezone value.

The design of TimestampWithTimezone is also discussed here: https://github.com/facebookincubator/velox/discussions/2511.

kagamiori avatar Oct 14 '22 20:10 kagamiori

Hello!

Familiarizing with the topic: Will a solution like

  1. Add TimestampWithTimezone specific handling by using https://github.com/facebookincubator/velox/blob/e20cb66b1b85d6c5e3a694d998c8d7ad3dbcef2b/velox/functions/prestosql/types/TimestampWithTimeZoneType.h#L43 at https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L340 (and all other relevant fuzz entry points)
  2. Prevent partial nullability generation in the new block

work?

gosharz avatar Oct 16 '22 03:10 gosharz

Looking at entry points(assuming we TimestampWithTimezone is still Row) there are multiple customization places

  1. From https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L290
  2. https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L321
  3. https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L248

It seems like fuzzConstant will go to fuzz and from there to https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L380

gosharz avatar Oct 16 '22 04:10 gosharz

Actually two questions:

  1. Is this condition valid: https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L299 ?
  2. Is there a cyclic dependency fuzz -> fuzzConstant -> fuzz?

gosharz avatar Oct 16 '22 04:10 gosharz

Hi @gosharz , the plan in your first comment makes sense. Checking isTimestampWithTimeZoneType() and in that case generating a single nullability vector and using it for both children should work. There might be a couple of other places where you'll have to extend, like you pointed out in the second comment.

Regarding the last one:

#1. Yes. Unknown is usually used to represented a null value where we don't know the type. So generating a null when the type is Unknown sounds reasonable.

#2. I would call it "recursion" :) but yes.

pedroerp avatar Oct 17 '22 16:10 pedroerp

Perhaps, a more generic solution would be to add support for custom types to Fuzzer. The Fuzzer could provide an API to register callbacks for generating flat vectors of a custom type with a given seed and null ratio. This would allow us to apply Fuzzer to built-in types like PrestoSQL types as well as types specific to different applications built on top of Velox.

For reference, type registry APIs can be found in velox/type/Type.h

/// Adds custom type to the registry. Type names must be unique.
void registerType(
    const std::string& name,
    std::unique_ptr<const CustomTypeFactories> factories);

/// Return true if customer type with specified name exists.
bool typeExists(const std::string& name);

/// Returns an instance of a custom type with the specified name and specified
/// child types.
TypePtr getType(const std::string& name, std::vector<TypePtr> childTypes);

mbasmanova avatar Oct 19 '22 14:10 mbasmanova

Flushing some prestosql scalar functions. Will get back to this shortly. Also thinking about @mbasmanova's proposal.

gosharz avatar Oct 20 '22 01:10 gosharz

Hey folks, won't have bandwidth to tackle this one: unassigning for now.

gosharz avatar Nov 01 '22 20:11 gosharz