fix: decimal scalar error message
Leaving me in suspense
This PR has morphed into a general fix-up of Decimal array handling as part of a push to get working round-trip of TPC-DS data from Parquet -> Arrow -> Vortex.
Some FLUPs to accomplish
- [x] Make ToArrowCanonical avoid re-casting the decimal values type, instead should canonicalize directly into an i128/i256 builder
Capturing some Slack commentary
Removing the type parameter was mostly about consistency, since our decimal type doesn't specify an exact PType for the buffer holding the values.
But we still have to pick one decimal type for the builder? And for exporting to Arrow, we could even create an i128 builder and then use append_to_builder if we wanted to avoid the copy in the middle.
Is the problem that the downcasting becomes hard? If so, we could always have DecimalBuilder(TypedDecimalBuilder<D>) such that you can downcast in two steps.
I think we can add the type parameter back. The important thing is that we preserve the ability to extend from a slice of some other T: NativeDecimalType.
I think we also should be using checked casting rather than AsPrimitive. This might actually be easier with the other i256 type from your PR...
Agree with all that, and yeah the other PR has an i256 that implements NumCast, so we get back an Option<T> to perform checked casts.
Do you need the other PR to make that change easier? I already do that "append anything to anything" thing here: https://github.com/vortex-data/vortex/pull/3373/files#diff-e3c7c38c80720dc1fd02f75e9f7d273b03ddb1fb0d7658b7e3f83b7f57c71893R25-R66
You can merge that PR into this one if it's easier.
Following up from our slack convo, decided not to go with i256 due to questions around byte representation stability and FFI.
I've added new ToPrimitive and BigCast traits into our vortex-scalar here which gives us the ability to do checked casts between i256 and all other primitive types. I've also implemented
Also I ended up keeping the untyped DecimalBuilder but I did clean it up a bit with the help of a macro so it's less visually cluttersome
Hmm, I still think the typed builder would be better. The internal enum adds another level of dispatch to most of the builder functions.
The issue I couldn't find an easy way around was how to implement append_scalar from ArrayBuilderExt generically, b/c you don't have enough information written down anywhere to infer the right T for .downcast_mut::<DecimalBuilder<T>>
FWIW I did at one point measure the difference between Box<dyn T> and enum dispatch overhead and the dynamic dispatch overhead is strictly worse, by a factor of at least 2-3x.
That's what I was saying about Dow casting first to DecimalBuilder, then you have a values type to downcast further to a TypedDecimalBuilder.
Could you maybe share more what you're thinking? Is the idea being to get some long-lived handle to the typed thing to amortize the downcasting overhead? AFAICT we already pretty much always use it behind a Box<dyn ArrayBuilder> anyway so there is a cost for every method call. ChunkedArray canonicalize amortizes this cost a bit b/c it extends_from_array
Yah, played with it, it's just as annoying to hold a typed thing internally.