velox
                                
                                 velox copied to clipboard
                                
                                    velox copied to clipboard
                            
                            
                            
                        Fuzzer generated TimestampWithTimezone vector should not have nulls at only one field of a row
TimestampWithTimezone is implemented as a Row<int64_t, int16_t> in Velox. But TimestampWithTimezone is primitive in Presto, meaning that each of its value should be either null or non-null at both field. VectorFuzzer generates random TimestampWithTimezone vectors as regular RowVectors and sometimes put nulls into only one field of a row. This causes expression fuzzer test failures when evaluating datetime functions because those functions do not handle null at only one field of a TimestampWithTimezone value.
The design of TimestampWithTimezone is also discussed here: https://github.com/facebookincubator/velox/discussions/2511.
Hello!
Familiarizing with the topic: Will a solution like
- Add TimestampWithTimezone specific handling by using https://github.com/facebookincubator/velox/blob/e20cb66b1b85d6c5e3a694d998c8d7ad3dbcef2b/velox/functions/prestosql/types/TimestampWithTimeZoneType.h#L43 at https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L340 (and all other relevant fuzz entry points)
- Prevent partial nullability generation in the new block
work?
Looking at entry points(assuming we TimestampWithTimezone is still Row) there are multiple customization places
- From https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L290
- https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L321
- https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L248
It seems like fuzzConstant will go to fuzz and from there to https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L380
Actually two questions:
- Is this condition valid: https://github.com/facebookincubator/velox/blob/45b5b5a49808fd0430b31ad845e6814a2c9742d7/velox/vector/fuzzer/VectorFuzzer.cpp#L299 ?
- Is there a cyclic dependency fuzz -> fuzzConstant -> fuzz?
Hi @gosharz , the plan in your first comment makes sense. Checking isTimestampWithTimeZoneType() and in that case generating a single nullability vector and using it for both children should work. There might be a couple of other places where you'll have to extend, like you pointed out in the second comment.
Regarding the last one:
#1. Yes. Unknown is usually used to represented a null value where we don't know the type. So generating a null when the type is Unknown sounds reasonable.
#2. I would call it "recursion" :) but yes.
Perhaps, a more generic solution would be to add support for custom types to Fuzzer. The Fuzzer could provide an API to register callbacks for generating flat vectors of a custom type with a given seed and null ratio. This would allow us to apply Fuzzer to built-in types like PrestoSQL types as well as types specific to different applications built on top of Velox.
For reference, type registry APIs can be found in velox/type/Type.h
/// Adds custom type to the registry. Type names must be unique.
void registerType(
    const std::string& name,
    std::unique_ptr<const CustomTypeFactories> factories);
/// Return true if customer type with specified name exists.
bool typeExists(const std::string& name);
/// Returns an instance of a custom type with the specified name and specified
/// child types.
TypePtr getType(const std::string& name, std::vector<TypePtr> childTypes);
Flushing some prestosql scalar functions. Will get back to this shortly. Also thinking about @mbasmanova's proposal.
Hey folks, won't have bandwidth to tackle this one: unassigning for now.