pylivy

download_sql does not return more than 1000 rows

Open: muracstech opened this issue 4 years ago • 4 comments

Is there a way to download 100K rows using download_sql?

muracstech, Jan 29 '21 00:01

I am having the same issue, and I would like to download as many rows as needed.

lcnletgo, Jun 30 '21 12:06

I faced the same problem and found that Livy restricts the result to 1000 records. The Spark explain plan shows a global limit of 1000, and I am trying to find out how to raise it.

anupray, Dec 02 '21 08:12

Would anyone be interested in a feature that redirects downloads through S3/HDFS? (@acroz, I'm not sure whether this is within the scope of this project.)

Users could pass the following additional parameters to the session constructor:

  1. a prefix/directory for temporary storage (s3://, hdfs://, file://, etc.)
  2. a fetcher function that takes a URI string and returns a dataframe.

The download method could take an optional flag that overrides the default behavior: instead of writing out rows, the dataframe is saved to temporary storage at a generated URI, uri = "TMP_DIR/DF_NAME.parquet", and the method returns fetcher(uri).

padraic-mcatee, Jan 05 '22 21:01
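A minimal sketch of the redirect proposal above, written in Python. Everything in it is hypothetical and not part of pylivy: RedirectDownloadConfig, temp_uri and redirect_download are illustrative names, run_remote stands in for whatever executes code in the Livy session, and the remote code assumes a PySpark session where the dataframe is bound to a Python variable named df_name.

```python
from dataclasses import dataclass
from typing import Any, Callable
from urllib.parse import urlparse


@dataclass
class RedirectDownloadConfig:
    """Hypothetical extra session parameters from the proposal (not a pylivy API)."""

    tmp_dir: str                   # e.g. "s3://bucket/tmp", "hdfs:///tmp", "file:///tmp"
    fetcher: Callable[[str], Any]  # given a URI string, returns a dataframe

    def __post_init__(self) -> None:
        # Require an explicit scheme so the remote Spark job and the local
        # fetcher agree on which storage system the URI points to.
        if not urlparse(self.tmp_dir).scheme:
            raise ValueError(f"tmp_dir needs a URI scheme, got {self.tmp_dir!r}")

    def temp_uri(self, df_name: str) -> str:
        # Follows the proposal's TMP_DIR/DF_NAME.parquet layout.
        return f"{self.tmp_dir.rstrip('/')}/{df_name}.parquet"


def redirect_download(
    config: RedirectDownloadConfig,
    run_remote: Callable[[str], Any],
    df_name: str,
) -> Any:
    """Write the remote dataframe to temp storage, then fetch it locally."""
    uri = config.temp_uri(df_name)
    # Assumes a PySpark session: Spark writes every row to storage, so the
    # rows never pass through Livy's statement output.
    run_remote(f"{df_name}.write.mode('overwrite').parquet({uri!r})")
    return config.fetcher(uri)
```

With pylivy, run_remote might be session.run on a PySpark session, and fetcher might be pandas.read_parquet (which can read s3:// URIs when s3fs is installed). Because the URI follows the fixed TMP_DIR/DF_NAME.parquet layout, two concurrent downloads of the same name would overwrite each other; adding a unique suffix would avoid that.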

Passing livy.rsc.sql.num-rows in spark_conf when creating the session controls the number of output rows (here, raising the limit to 2000): LivySession.create(livy_url, kind=SessionKind.SQL, spark_conf={'livy.rsc.sql.num-rows': '2000'})

penggongkui, Mar 21 '22 06:03
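For reference, livy.rsc.sql.num-rows is Livy's setting for how many rows a SQL statement returns, and its default of 1000 matches the limit reported earlier in this thread. Below is a small Python helper built on that last comment, assuming pylivy is installed and that Livy honors the key when it is passed through spark_conf, as reported. row_limit_conf and download_sql_with_limit are hypothetical names, not pylivy functions.

```python
from typing import Dict, Optional

# Livy setting that caps how many rows a SQL statement returns (default 1000).
ROW_LIMIT_KEY = "livy.rsc.sql.num-rows"


def row_limit_conf(max_rows: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` with the Livy SQL row limit set to `max_rows`."""
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    conf = dict(base or {})  # copy so the caller's dict is not mutated
    # Values are passed as strings, matching the '2000' in the comment.
    conf[ROW_LIMIT_KEY] = str(max_rows)
    return conf


def download_sql_with_limit(livy_url: str, query: str, max_rows: int):
    """Open a SQL session with a raised row limit and download the query result.

    pylivy is imported here so row_limit_conf can be used without it.
    """
    from livy import LivySession
    from livy.models import SessionKind

    with LivySession.create(
        livy_url, kind=SessionKind.SQL, spark_conf=row_limit_conf(max_rows)
    ) as session:
        return session.download_sql(query)
```

For example, download_sql_with_limit("http://localhost:8998", "SELECT * FROM my_table", 100_000) would ask for up to 100K rows; the URL and table name are placeholders. A larger limit means the whole result travels through Livy's JSON statement output, so very large results may still be better served by a file-based approach.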