
Panics are silently ignored in parallel execution (spawn_execution)

Open ovr opened this issue 1 year ago • 1 comments

Describe the bug

DataFusion uses tokio::spawn in the spawn_execution function, with a channel under the hood to transfer data from the spawned tasks back to the output stream. Right now it doesn't handle panics in those tasks, which leads to confusing behavior: tokio prints the panic in the logs, but the output is simply empty.

To catch panics, some places in the code would need to be wrapped in AssertUnwindSafe, with the panic then caught in the spawn_execution function. Any other ideas?
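A minimal sketch of the AssertUnwindSafe idea, using only the standard library: std::thread and std::sync::mpsc stand in for the tokio task and channel, and ExecError, panic_message and spawn_and_collect are hypothetical names for illustration, not DataFusion APIs. The worker catches its own panic and forwards it through the same channel as an error, so the consumer sees a failure instead of an empty output.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for DataFusionError.
#[derive(Debug, PartialEq)]
enum ExecError {
    Panic(String),
}

/// Turn a panic payload into a readable message.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

/// Run `work` on another thread and collect everything it sends.
/// If `work` panics, the panic is caught and sent as the last item.
fn spawn_and_collect<F>(work: F) -> Vec<Result<i32, ExecError>>
where
    F: FnOnce(&mpsc::Sender<Result<i32, ExecError>>) + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // AssertUnwindSafe: we promise not to reuse state the panic may have broken.
        let result = panic::catch_unwind(AssertUnwindSafe(|| work(&tx)));
        if let Err(payload) = result {
            // Ignore send failures: the receiver may already be gone.
            let _ = tx.send(Err(ExecError::Panic(panic_message(payload))));
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    let out: Vec<_> = rx.into_iter().collect();
    handle.join().expect("panics are caught inside the worker");
    out
}
```

With this shape, batches produced before the panic are still delivered, followed by a single Err item. Whether the real spawn_execution can be wrapped this way depends on what the spawned future captures, which this sketch does not model.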

To Reproduce

  1. Introduce a new UDF that panics
  2. Run a query that uses sort, so that execution is parallelized (goes through spawn_execution)

Expected behavior

It should probably return a DataFusionError.

cc @alamb @andygrove

ovr avatar Aug 11 '22 14:08 ovr

I think returning a DataFusionError sounds like a very reasonable thing to do

When a tokio::task panics, awaiting its JoinHandle returns a JoinError -- perhaps somewhere datafusion does a tokio::task::spawn but then doesn't check the result for an error 🤔
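To illustrate the "check the join result" idea without pulling in tokio, here is a rough standard-library analog: std::thread::JoinHandle::join returns Err with the panic payload, much like awaiting a tokio JoinHandle yields a JoinError. TaskError and run_checked are hypothetical names for this sketch, not DataFusion or tokio APIs.

```rust
use std::thread;

/// Hypothetical stand-in for DataFusionError.
#[derive(Debug, PartialEq)]
enum TaskError {
    Panic(String),
}

/// Spawn `task`, wait for it, and convert a panic into an error
/// instead of dropping the join result on the floor.
fn run_checked<T, F>(task: F) -> Result<T, TaskError>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(task).join().map_err(|payload| {
        // Panic payloads are usually &str or String.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown panic".to_string());
        TaskError::Panic(msg)
    })
}
```

The point is only that the handle's result gets inspected; if the handle is dropped or its result ignored, the panic shows up in the logs and nowhere else.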

alamb avatar Aug 11 '22 15:08 alamb