
[Python] FlightServerBase doesn't support injecting gRPC options

Open sundy-li opened this issue 1 year ago • 6 comments

Describe the enhancement requested

I'm using FlightServerBase as a server that acts as a UDFServer.

But I could not find any documentation or any place to inject gRPC options in FlightServerBase.

https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightServerBase.html

Component(s)

Python

sundy-li avatar Feb 27 '24 03:02 sundy-li

Hi @sundy-li, were you thinking about options like we have in the tests?

https://github.com/apache/arrow/blob/cd06982fddcc0b4327cade6e5429f903dd77fd1a/python/pyarrow/tests/test_flight.py#L2036-L2039

If so, I think we could improve this with:

  1. Adding examples to https://arrow.apache.org/docs/python/generated/pyarrow.flight.connect.html#pyarrow-flight-connect
  2. Adding a Python cookbook entry for this

Would you be interested in sending a PR in for either/both?

amoeba avatar Mar 01 '24 00:03 amoeba

@amoeba

Thanks for the reply. But I am not looking to set gRPC options on the client side. Let me explain the issue more directly.

I am using Arrow Flight as the server in databend-udf, which is Python-based.

I want the server to handle long-running requests (such as time.sleep(300)). Currently the client (which is Rust-based) gets this error after 240 s:

Decode record batch error: Tonic(Status { code: Unavailable, message: "Too many pings", source: None })

I searched the internet, and people suggested adding gRPC options on the server side rather than the client side.

So I want to know how to add gRPC options in FlightServerBase (such as setting GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA to zero).
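
For reference, the C macro named above corresponds to a string key in gRPC's channel arguments, and gRPC options are usually written as (key, value) pairs. The sketch below only builds such a list in plain Python. The `keepalive_related_options` helper and the values are hypothetical, and how (or whether) a list like this can be handed to FlightServerBase is exactly the open question here.

```python
# String keys behind some gRPC keepalive-related C macros.
GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA = "grpc.http2.max_pings_without_data"
GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS = (
    "grpc.http2.min_ping_interval_without_data_ms"
)
GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS = "grpc.keepalive_permit_without_calls"


def keepalive_related_options(min_ping_interval_ms):
    """Build a hypothetical list of (key, value) gRPC options.

    The values are illustrative assumptions, not recommended settings.
    """
    return [
        # 0 means no limit on pings sent without data.
        (GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, 0),
        # Minimum interval the receiving side tolerates between pings.
        (GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS, min_ping_interval_ms),
        # 1 allows keepalive pings even when no call is in flight.
        (GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1),
    ]


if __name__ == "__main__":
    for key, value in keepalive_related_options(10_000):
        print(key, value)
```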

sundy-li avatar Mar 01 '24 03:03 sundy-li

Hi @sundy-li, sorry about that. It doesn't look like we expose those options to PyArrow at the moment but it seems useful to expose them. Would you be interested in submitting a PR?

amoeba avatar Mar 04 '24 00:03 amoeba

> Hi @sundy-li, sorry about that. It doesn't look like we expose those options to PyArrow at the moment but it seems useful to expose them. Would you be interested in submitting a PR?

I'm afraid not. I am new to this repo, and I found that this feature would involve a lot of C++ and .pyx code. I don't think it's an easy task.

sundy-li avatar Mar 04 '24 07:03 sundy-li

No worries @sundy-li. Filing issues like you've done is a great way to contribute and if you ever want to take a crack at a PR, there's lots of good options tagged as good-first-issue.

amoeba avatar Mar 05 '24 20:03 amoeba

I'm running into a similar use case when trying to configure the "generic_options" on the server side, but in my case it's about GRPC_ARG_MAX_SEND_MESSAGE_LENGTH. If this is still not possible for a Python-based Flight server, I'm curious why making max_chunksize for the batch stream larger than 4MB (the default gRPC max size) when sending data on the server side doesn't cause any errors. For reference, my code looks as follows:

import pyarrow as arrow  # alias assumed from the snippet
from pyarrow import flight

reader = arrow.ipc.RecordBatchReader.from_batches(
    data.schema, data.to_batches(max_chunksize=8 * 1024 * 1024)
)
return flight.RecordBatchStream(reader)

On the client side, I use reader.read_chunk() and find that each chunk has the same length as the chunk that was sent (8MB). Is that because some hidden mechanism in the C++ layer automatically splits the sent data into appropriately sized pieces?

shikibu-z avatar Oct 18 '24 18:10 shikibu-z

This issue has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this issue will be closed in 14 days. If this improvement is still desired but has no current owner, please add the 'Status: needs champion' label.

github-actions[bot] avatar Nov 18 '25 11:11 github-actions[bot]