databend
Support JSON in new expression framework
Tasks
- Add `DataType::Variant`.
The parsing and display rule:
"JSON" => DataType::Variant
"VARIANT" => DataType::Variant
DataType::Variant => "VARIANT"
- Rename generic type `AnyType` to `VariantType`, and implement `ArgType` for it:
ArgType::Scalar = common_expression::values::Scalar
ArgType::ScalarRef<'a> = common_expression::values::ScalarRef<'a>
ArgType::Column = Arc<[Scalar]>
ArgType::Domain = () // upcasts to `Domain::Undefined`
ArgType::ColumnBuilder = Vec<Scalar>
ArgType::ColumnIterator<'a> = std::vec::Iter<'a, Scalar>
- Add conversion between `Scalar` and `serde_json::Value`. And then store `Variant` columns in arrow2's `BinaryArray<i64>`, which stores serialized JSON strings.
- Add `can_auto_cast_to` rules:
DataType::_ -> DataType::Variant
- Add CAST rule:
DataType::_ -> DataType::Variant
and TRY_CAST rule:
// Must be successful
DataType::_ -> DataType::Nullable(DataType::Variant)
// Extract exact variant from JSON value, convert to NULL if type mismatches.
DataType::Variant -> DataType::Nullable(_)
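The parsing, display, and auto-cast rules in the task list can be pictured with a minimal sketch. Everything here is a simplified stand-in, not the actual `common_expression` code: the three-variant `DataType` enum, the `parse_type_name` helper, and the case-insensitive matching are assumptions made for illustration only.

```rust
use std::fmt;

// Simplified stand-in for the real `DataType`; only three variants are modelled.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Boolean,
    String,
    Variant,
}

// Parsing rule: both "JSON" and "VARIANT" name the same type.
// Case-insensitive matching is an assumption of this sketch.
fn parse_type_name(name: &str) -> Option<DataType> {
    match name.to_ascii_uppercase().as_str() {
        "BOOLEAN" => Some(DataType::Boolean),
        "STRING" => Some(DataType::String),
        "JSON" | "VARIANT" => Some(DataType::Variant),
        _ => None,
    }
}

// Display rule: the canonical name is always "VARIANT", never "JSON".
impl fmt::Display for DataType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            DataType::Boolean => "BOOLEAN",
            DataType::String => "STRING",
            DataType::Variant => "VARIANT",
        };
        write!(f, "{}", name)
    }
}

// Auto-cast rule `DataType::_ -> DataType::Variant`: any source type may be
// cast to Variant. Identity casts are also allowed here; the real function
// would carry many more rules.
fn can_auto_cast_to(src: &DataType, dest: &DataType) -> bool {
    src == dest || matches!(dest, DataType::Variant)
}
```

With these definitions, parsing `"JSON"` and then displaying the result yields `VARIANT`, so the alias is accepted on input but never printed back.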
@andylokandy hi, I am here again. I will take this issue
JSON is not like a number type; its structure and expressions are quite complicated. Could you provide more information about this issue, such as how to display it, its relationship with the string type, whether arrow supports it, etc.?
@andylokandy
@b41sh is the original author of the JSON type, ping for more help :)
@jiaoew1991 Glad to see you again! I've added some instructions to the issue, please let me know if anything is not clear. 😉
Some advice @andylokandy @jiaoew1991
It's better to use `DataType::Variant` and `DataType::VariantObject`, with `JSON` and `Object` as aliases, because `Variant` is more general.
We need to define a struct `VariantValue` as a wrapper around `serde_json::Value`, for two reasons:
- `serde_json::Value` does not implement the traits `Ord` and `PartialOrd`, which are needed when sorting.
- The performance of `serde_json::Value` is not good; we may replace it with other formats in the future.
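A rough sketch of the wrapper idea follows. To stay free of external crates it uses a hypothetical `JsonValue` enum in place of `serde_json::Value`, and the cross-kind ordering (null < bool < number < string < array) is an arbitrary choice for illustration, not something decided in this thread.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for `serde_json::Value` (objects omitted for brevity).
#[derive(Debug, Clone)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
}

// Wrapper that adds a total order so a column of values can be sorted.
#[derive(Debug, Clone)]
struct VariantValue(JsonValue);

// Rank used to order values of different JSON kinds (arbitrary choice).
fn kind_rank(v: &JsonValue) -> u8 {
    match v {
        JsonValue::Null => 0,
        JsonValue::Bool(_) => 1,
        JsonValue::Number(_) => 2,
        JsonValue::String(_) => 3,
        JsonValue::Array(_) => 4,
    }
}

fn compare(a: &JsonValue, b: &JsonValue) -> Ordering {
    match (a, b) {
        (JsonValue::Bool(x), JsonValue::Bool(y)) => x.cmp(y),
        // `total_cmp` gives f64 a total order, including NaN.
        (JsonValue::Number(x), JsonValue::Number(y)) => x.total_cmp(y),
        (JsonValue::String(x), JsonValue::String(y)) => x.cmp(y),
        (JsonValue::Array(x), JsonValue::Array(y)) => {
            // Lexicographic: first differing element decides, then length.
            for (l, r) in x.iter().zip(y.iter()) {
                let ord = compare(l, r);
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            x.len().cmp(&y.len())
        }
        // Different kinds (or two nulls): fall back to the kind rank.
        _ => kind_rank(a).cmp(&kind_rank(b)),
    }
}

impl Ord for VariantValue {
    fn cmp(&self, other: &Self) -> Ordering {
        compare(&self.0, &other.0)
    }
}

impl PartialOrd for VariantValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Equality is defined through `cmp` so that it stays consistent with `Ord`.
impl PartialEq for VariantValue {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for VariantValue {}
```

Because the ordering lives on the wrapper, the inner representation could later be swapped for a faster format without touching callers that only rely on `Ord`.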
I think we can just implement `Variant`, aka `Json`, for now, because I'm going to implement a statically typed `Map<T>` so that `JsonObject` is semantically equal to `Map<Variant>`.
@b41sh Can we replace `serde_json::Value` with `common_expression::values::Scalar`?
> I think we can just implement `Variant`, aka `Json`, for now. Because I'm going to implement a static typed `Map<T>` so that `JsonObject` is semantically equal to `Map<Variant>`.
I agree with you.
> @b41sh Can we replace `serde_json::Value` with `common_expression::values::Scalar`?
`common_expression::values::Scalar` can indeed represent any type of data; it is a superset of `serde_json::Value`. But in this case, we would need to add additional methods to parse the raw JSON text into `Scalar`, and also to encode `Scalar` into a format suitable for storing in an arrow column. This would make `Scalar` too complex, so I think it's more appropriate to define a different data type.
FYI: Static typed map is added in https://github.com/datafuselabs/databend/pull/6838
@b41sh `Scalar` can be converted to and from `serde_json::Value`, so the serialization/deserialization will not be a big problem.
@b41sh @jiaoew1991 I've updated the tasks, PTAL
> @b41sh @jiaoew1991 I've updated the tasks, PTAL
@andylokandy Got it, it looks more concise and clear
Hmmm, it seems that `AnyType` cannot be reused because they have different column types. We may have to add a new `VariantType`.
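To illustrate the column-type difference: the task list describes `Arc<[Scalar]>` as the column of the generic type, while `Variant` columns are meant to be stored as serialized JSON strings in arrow2's `BinaryArray<i64>`. Below is a minimal sketch of such a binary-style layout using plain `Vec`s. The `VariantColumn` struct and its methods are hypothetical names, not the arrow2 API.

```rust
// Hypothetical column layout mirroring a binary array with i64 offsets:
// all serialized JSON strings live in one contiguous byte buffer.
#[derive(Debug)]
struct VariantColumn {
    // Row `i` occupies `data[offsets[i]..offsets[i + 1]]`.
    offsets: Vec<i64>,
    data: Vec<u8>,
}

impl VariantColumn {
    // An empty column still holds the initial offset 0.
    fn new() -> Self {
        VariantColumn {
            offsets: vec![0],
            data: Vec::new(),
        }
    }

    // Append one already-serialized JSON string as a new row.
    fn push(&mut self, json: &str) {
        self.data.extend_from_slice(json.as_bytes());
        self.offsets.push(self.data.len() as i64);
    }

    // Number of rows in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    // Borrow the serialized JSON text of a row, or `None` if out of range.
    fn get(&self, row: usize) -> Option<&str> {
        if row >= self.len() {
            return None;
        }
        let start = self.offsets[row] as usize;
        let end = self.offsets[row + 1] as usize;
        std::str::from_utf8(&self.data[start..end]).ok()
    }
}
```

Reading a row hands back the serialized text, which would still need to be parsed into a value, whereas a `Scalar` slice holds ready-made values. That gap is one reason the two column types cannot simply be shared.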