Receive ARRAY, VARIANT, OBJECT datatypes in Arrow format
Please answer these questions before submitting your pull requests. Thanks!
-
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
-
Fill out the following pre-review checklist:
- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am modifying authorization mechanisms
- [ ] I am adding new credentials
- [ ] I am modifying OCSP code
- [ ] I am adding a new dependency
-
Please describe how your code solves the related issue.
Extending the ArrowIterator cpp files with a new VariantConverter object that relies on the previously created SnowflakeJSONDecoder class to handle undefined values.
@sfc-gh-heshah Any chance this PR is still viable? It would be really great to properly support arrays (including undefined values).
@sfc-gh-heshah Any idea if support for nested data with arrow will ever be added?
VARIANT data can also be for example Avro – not just JSON.
The code presented here seems to targetting VARIANT data which is JSON-encoded; or is Snowflake implicitly converting other semi-structured formats to JSON? I wouldn't think so.
The code presented here seems to targetting VARIANT data which is JSON-encoded; or is Snowflake implicitly converting other semi-structured formats to JSON? I wouldn't think so.
I'm not sure I follow. Snowflake encodes things in its internal format and doesn't preserve the source format.
@cpcloud in this changeset, I read code such as:
def _VARIANT_to_python(self, _) -> Callable[[str], Any]:
"""Use a custom json decoder (accepts undefined) to load row value"""
return partial(json.loads, cls=SnowflakeJSONDecoder)
Like, we see a VARIANT value, let's decode it using JSON. But is that the only "wire format" we can get? I'm thinking both in terms of performance but also fidelity – JSON doesn't have the same data type richness as the other supported semi-structured formats.
The fact that the arrow interop code here is pulling out a string and then immediately decoding it as JSON suggests that variants are always coming back as JSON.
I'd love to be wrong about this, other wire formats would be most welcome from the ibis perspective.
I guess there's this option too https://docs.snowflake.com/en/user-guide/python-connector-example#label-python-connector-bypass-data-conversion which means we'd have to implement the entire conversion ourselves.
Actually that method is broken and just gives back data in Python format.