Implement retry processing when TooManyRequestsException and ThrottlingException occur

Open laughingman7743 opened this issue 7 years ago • 5 comments

https://github.com/jd/tenacity laughingman7743/PyAthena#8

laughingman7743 avatar Jan 25 '17 13:01 laughingman7743

Hi, we are facing a similar problem. Our code is: cur.execute(sql, params), where sql is a simple SELECT. It runs inside a Lambda function that we invoke several times in parallel.

We get this error:

```
  File "/var/task/pyathenajdbc/util.py", line 35, in _wrapper
    return wrapped(*args, **kwargs)
  File "/var/task/pyathenajdbc/util.py", line 25, in _wrapper
    return wrapped(*args, **kwargs)
  File "/var/task/pyathenajdbc/cursor.py", line 128, in execute
    raise_from(DatabaseError(unwrap_exception(e)), e)
  File "/var/task/future/utils/__init__.py", line 400, in raise_from
    exec(execstr, myglobals, mylocals)
  File "<string>", line 1, in <module>
pyathenajdbc.error.DatabaseError: com.simba.athena.amazonaws.services.athena.model.AmazonAthenaException: Rate exceeded (Service: AmazonAthena; Status Code: 400; Error Code: ThrottlingException; Request ID: xxxxxx)
```


Is there an elegant solution to this problem? Thanks.

otmezger avatar Jun 12 '19 15:06 otmezger

I think that it is not good to change the following options of the JDBC driver.

https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.7/docs/Simba+Athena+JDBC+Driver+Install+and+Configuration+Guide.pdf

| Default Value | Data Type | Required |
|---------------|-----------|----------|
| 100           | Integer   | No       |

Description

The maximum amount of time, in milliseconds, that the driver waits between attempts when polling the Athena server for query results. You cannot specify an interval that is less than 5ms. The driver polls the server 5ms after query execution begins, and exponentially increases the polling interval to the amount of time specified by this property. For example, if MaxQueryExecutionPollingInterval is set to 2000, the driver polls the server at these intervals: 5ms after query execution has begun, 100ms after the first poll, and then 2000ms after the second poll. The driver then continues to poll the server every 2000ms until the query results are returned.

I may try to implement a retry if I have time, but please don't count on it. If possible, I would like you to use PyAthena (https://github.com/laughingman7743/PyAthena) instead.
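Until a retry exists in the library, callers can wrap the call themselves. The following is only a minimal sketch of exponential backoff with jitter, not PyAthenaJDBC's own behavior. It assumes the throttling error can be recognized by the error code in the exception message, as in the traceback above. The names is_throttling_error and execute_with_backoff and their parameters are made up for this example.

```python
import random
import time


def is_throttling_error(exc):
    """Return True if the message names one of the throttling error codes."""
    # Assumption: the error code appears in the exception text, as it does
    # in the DatabaseError message quoted in this issue.
    message = str(exc)
    return "ThrottlingException" in message or "TooManyRequestsException" in message


def execute_with_backoff(execute, *args, max_attempts=5, base_delay=1.0,
                         max_delay=30.0, sleep=time.sleep, **kwargs):
    """Call execute(*args, **kwargs), retrying throttling errors with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return execute(*args, **kwargs)
        except Exception as exc:
            # Re-raise anything that is not throttling, and give up after
            # the final attempt.
            if not is_throttling_error(exc) or attempt == max_attempts:
                raise
            # Capped exponential delay with full jitter, so that parallel
            # callers do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

With an existing cursor this would be called as execute_with_backoff(cur.execute, sql, params). The jitter matters for the Lambda case described above, where many invocations hit the rate limit at the same time.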

laughingman7743 avatar Jun 13 '19 14:06 laughingman7743

Is PyAthena an alternative to PyAthenaJDBC?

We are currently developing backoff for PyAthenaJDBC and can share the results. Should we stop and switch to the other package?

otmezger avatar Jun 13 '19 14:06 otmezger

PyAthenaJDBC is a wrapper library for the JDBC driver, so it requires a Java environment. PyAthena is implemented in pure Python and does not need one.

The JDBC driver streams results, but I do not know how fast that is. I cannot see a benefit to using JDBC from Python, and fetching results seems to be faster with PyAthena's PandasCursor. Use PyAthenaJDBC if you find it beneficial to stream results through the JDBC driver. Otherwise you may want to use PyAthena.
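For reference, a query through PandasCursor can be sketched roughly as follows. The helper name fetch_dataframe is made up for this example, and the import path of PandasCursor depends on the PyAthena version (older releases used pyathena.pandas_cursor), so check it against the installed release.

```python
def fetch_dataframe(sql, s3_staging_dir, region_name):
    """Run sql on Athena and return the result as a pandas DataFrame."""
    # Imported lazily so this module still loads where PyAthena is missing.
    from pyathena import connect
    # Assumption: a recent PyAthena; older releases used pyathena.pandas_cursor.
    from pyathena.pandas.cursor import PandasCursor

    cursor = connect(
        s3_staging_dir=s3_staging_dir,  # S3 location for query results
        region_name=region_name,
        cursor_class=PandasCursor,
    ).cursor()
    return cursor.execute(sql).as_pandas()
```

A call would look like fetch_dataframe("SELECT 1", "s3://YOUR_BUCKET/path/", "us-west-2"), where the bucket and region are placeholders. It needs AWS credentials with access to Athena and the staging bucket.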

laughingman7743 avatar Jun 13 '19 15:06 laughingman7743

I created this library because when Athena was first released, only a JDBC client was provided, and I wanted to use it from Python. Now it is easy to connect to Athena from Python, and I created PyAthena as a pure Python library. So I would like you to use PyAthena unless you see a benefit in JDBC.

laughingman7743 avatar Jun 13 '19 15:06 laughingman7743