
Improve error message when sparkey hits array-size limits

Open kellen opened this issue 3 months ago • 0 comments

When using saveAsSparkey, if any shard is larger than about 2 GB, you get a coder exception with an error like:

Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Required array length 2147483639 + 15534 is too large

which is hard to interpret.

See if we can preemptively capture serialized sizes so that we can issue a better error message, such as "Increase number of sparkey shards".
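As a rough illustration of the idea, here is a minimal sketch of a size guard that tracks serialized bytes per shard and fails early with a readable message. `ShardSizeGuard` and its methods are hypothetical names, not part of scio or Sparkey, and the sketch assumes the failing allocation is a single byte array holding the whole serialized shard. The limit used is `Integer.MAX_VALUE - 8`, which equals the 2147483639 shown in the error above.

```java
// Hypothetical helper, not an existing scio or Sparkey class.
final class ShardSizeGuard {
    // Integer.MAX_VALUE - 8 = 2147483639, the array length reported in the OutOfMemoryError.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Running total of serialized bytes recorded for this shard.
    private long bytesSoFar = 0L;

    /** Records the serialized size of one more entry, failing early if the shard would not fit. */
    void add(long serializedBytes) {
        if (serializedBytes < 0) {
            throw new IllegalArgumentException("serializedBytes must be non-negative");
        }
        long required = bytesSoFar + serializedBytes;
        if (required > MAX_ARRAY_LENGTH) {
            throw new IllegalStateException(
                "Sparkey shard would need " + required + " bytes, above the JVM array limit of "
                    + MAX_ARRAY_LENGTH + ". Increase number of sparkey shards.");
        }
        bytesSoFar = required;
    }

    long bytesSoFar() {
        return bytesSoFar;
    }
}
```

Where such a check would actually hook into the write path (per record, per bundle, or after encoding) is left open here; the sketch only shows the bookkeeping and the message.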

kellen · Mar 13 '24 14:03