pylivy
download_sql does not return more than 1000 rows
Is there a way I can download 100K rows using download_sql?
I am having the same issue, and I would like to download as many rows as needed.
I ran into the same problem and found that Livy restricts the result to 1000 records. The Spark explain plan shows a global limit of 1000, and I am trying to find out how to raise it.
Would anyone be interested in an s3/hdfs redirect download feature (@acroz not sure if this would be within the scope of this project)?
Users could provide the following additional params to the session constructor:
- a prefix/directory for temporary storage (s3://, hdfs://, file://, etc)
- a fetcher function that returns a dataframe given a URI as a string.
The download method could take an optional flag to override the default behavior. Instead of writing out rows, the dataframe would be saved to temp storage at a generated URI, uri = "TMP_DIR/DF_NAME.parquet", and the method would return fetcher(uri).
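To make the proposal concrete, here is a rough sketch of how the redirect could work. None of this exists in pylivy: redirect_download, build_temp_uri, and the run_code parameter are hypothetical names, and the sketch assumes a PySpark session in which the dataframe is already defined under df_name.

```python
from typing import Any, Callable


def build_temp_uri(tmp_dir: str, df_name: str) -> str:
    """Build TMP_DIR/DF_NAME.parquet, dropping trailing slashes from tmp_dir."""
    return f"{tmp_dir.rstrip('/')}/{df_name}.parquet"


def redirect_download(
    run_code: Callable[[str], Any],   # hypothetical: submits code to the remote session
    tmp_dir: str,                     # prefix for temporary storage (s3://, hdfs://, ...)
    df_name: str,                     # name of the dataframe in the remote session
    fetcher: Callable[[str], Any],    # user-supplied: URI string -> local dataframe
) -> Any:
    """Save the remote dataframe to temp storage, then load it via fetcher."""
    uri = build_temp_uri(tmp_dir, df_name)
    # Ask the remote PySpark session to write the dataframe out as Parquet.
    run_code(f"{df_name}.write.mode('overwrite').parquet({uri!r})")
    # Read the result back on the client side, bypassing the row limit.
    return fetcher(uri)
```

With pylivy, run_code could plausibly be session.run and fetcher something like pandas.read_parquet, provided the client can reach the storage location; both choices are assumptions on my side, not tested against a live cluster.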
LivySession.create(livy_url, kind=SessionKind.SQL, spark_conf={'livy.rsc.sql.num-rows': '2000'})
With this spark_conf you can control the number of rows returned.