pylivy

No clean way to return empty dataframe

Open ishmandoo opened this issue 4 years ago • 2 comments

Right now the only way to return an empty dataset or null result is to construct an empty Spark DataFrame, which is kind of clunky to do.

Might it make sense to change session.read to work on a variable set to None and interpret it as an empty dataframe?

ishmandoo avatar Jan 27 '21 16:01 ishmandoo

Hi Ben, thanks for the suggestion!

Could you help me to understand your use case a little better? It's not clear to me in what situation you'd have a variable in a (presumably PySpark) session set to None and wish that to be interpreted as an empty DataFrame when you attempt to download it. In this suggestion, the differentiation between None (no dataframe at all) and an empty result would be lost, which seems valuable to keep.

acroz avatar Jan 27 '21 19:01 acroz

My understanding is that trying to read a variable whose value is None will result in an error. For my application, I sometimes want to return a null result that will be interpreted as an empty dataframe. Right now I'm building an empty Spark DataFrame to return, like spark.createDataFrame([], T.StructType([])). I was hoping to avoid having to do that.
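A minimal sketch of how that workaround could be wrapped on the client side, so the None check is opt-in and the difference between None and an empty result stays visible. The helper names ensure_dataframe_code and read_or_empty are hypothetical and not part of pylivy. The sketch assumes the remote PySpark session exposes a spark variable, and that session.run executes a code string remotely and session.read downloads a named dataframe.

```python
def ensure_dataframe_code(name: str) -> str:
    """Build PySpark source that rebinds `name` to an empty DataFrame if it is None.

    Hypothetical helper: the returned code is meant to run in the remote
    session, where `spark` is assumed to be defined.
    """
    if not name.isidentifier():
        raise ValueError(f"not a valid variable name: {name!r}")
    return "\n".join(
        [
            "from pyspark.sql import types as T",
            f"if {name} is None:",
            # Same empty-frame construction as quoted in the comment above.
            f"    {name} = spark.createDataFrame([], T.StructType([]))",
        ]
    )


def read_or_empty(session, name: str):
    """Read `name` from the session, treating a remote None as an empty frame.

    `session` is assumed to offer `run(code)` and `read(name)`.
    """
    session.run(ensure_dataframe_code(name))
    return session.read(name)
```

With a live session this would be called as read_or_empty(session, "result") in place of session.read("result"). Plain session.read is left unchanged, so code that wants an error on None still gets one.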

ishmandoo avatar Jan 27 '21 20:01 ishmandoo