vortex icon indicating copy to clipboard operation
vortex copied to clipboard

fix: decimal scalar error message

Open a10y opened this issue 7 months ago • 1 comments

Leaving me in suspense

image

a10y avatar May 21 '25 15:05 a10y

This PR has morphed into a general fix-up of Decimal array handling as part of a push to get working round-trip of TPC-DS data from Parquet -> Arrow -> Vortex.

Some FLUPs to accomplish

  • [x] Make ToArrowCanonical avoid re-casting the decimal values type, instead should canonicalize directly into an i128/i256 builder

Capturing some Slack commentary

image image image

a10y avatar May 22 '25 16:05 a10y

Removing the type parameter was mostly about consistency, since our decimal type doesn't specify an exact PType for the buffer holding the values.

a10y avatar May 28 '25 13:05 a10y

But we still have to pick one decimal type for the builder? And for exporting to Arrow, we could even create an i128 builder and then use append_to_builder if we wanted to avoid the copy in the middle.

Is the problem that the downcasting becomes hard? If so, we could always have DecimalBuilder(TypedDecimalBuilder<D>) such that you can downcast in two steps.

gatesn avatar May 28 '25 14:05 gatesn

I think we can add the type parameter back. The important thing is that we preserve the ability to extend from a slice of some other T: NativeDecimalType.

I think we also should be using checked casting rather than AsPrimitive. This might actually be easier with the other i256 type from your PR...

a10y avatar May 28 '25 15:05 a10y

Agree with all that, and yeah the other PR has an i256 that implements NumCast, so we get back an Option<T> to perform checked casts.

Do you need the other PR to make that change easier? I already do that "append anything to anything" thing here: https://github.com/vortex-data/vortex/pull/3373/files#diff-e3c7c38c80720dc1fd02f75e9f7d273b03ddb1fb0d7658b7e3f83b7f57c71893R25-R66

You can merge that PR into this one if it's easier.

gatesn avatar May 28 '25 16:05 gatesn

Following up from our slack convo, decided not to go with i256 due to questions around byte representation stability and FFI.

I've added new ToPrimitive and BigCast traits into our vortex-scalar here which gives us the ability to do checked casts between i256 and all other primitive types. I've also implemented

Also I ended up keeping the untyped DecimalBuilder but I did clean it up a bit with the help of a macro so it's less visually cluttersome

a10y avatar May 28 '25 19:05 a10y

Hmm, I still think the typed builder would be better. The internal enum adds another level of dispatch to most of the builder functions.

gatesn avatar May 28 '25 19:05 gatesn

The issue I couldn't find an easy way around was how to implement append_scalar from ArrayBuilderExt generically, b/c you don't have enough information written down anywhere to infer the right T for .downcast_mut::<DecimalBuilder<T>>

FWIW I did at one point measure the difference between Box<dyn T> and enum dispatch overhead and the dynamic dispatch overhead is strictly worse, by a factor of at least 2-3x.

a10y avatar May 28 '25 19:05 a10y

That's what I was saying about Dow casting first to DecimalBuilder, then you have a values type to downcast further to a TypedDecimalBuilder.

gatesn avatar May 28 '25 19:05 gatesn

Could you maybe share more what you're thinking? Is the idea being to get some long-lived handle to the typed thing to amortize the downcasting overhead? AFAICT we already pretty much always use it behind a Box<dyn ArrayBuilder> anyway so there is a cost for every method call. ChunkedArray canonicalize amortizes this cost a bit b/c it extends_from_array

a10y avatar May 28 '25 20:05 a10y

Yah, played with it, it's just as annoying to hold a typed thing internally.

gatesn avatar May 28 '25 20:05 gatesn