openverse-api

`ReadTimeout` raised from general exception catching

Open krysal opened this issue 3 years ago • 2 comments

Sentry link

https://sentry.io/share/issue/89863f561e9146fba39bb084e6310a68/

Description

This exception accounts for most of the error events we receive from the API, with 3.2k occurrences in the last 14 days at the time of writing.

Origin in source code: https://github.com/WordPress/openverse-api/blob/9f8b15773df511b0900e2a0adf7785382abee450/api/catalog/api/views/media_views.py#L206

Reproduction

Visit https://api.openverse.engineering/v1/images/b312f615-6c9e-46e4-8738-d97e7836a8e1/thumb/

krysal avatar Sep 15 '22 16:09 krysal

This error is now up to 27k in the last 30 days and 1.9k in the last 24 hours.

The current timeout of 10s might be insufficient for some fairly large images to load. Increasing it might reduce the number of error events, but bringing the count down to zero seems infeasible unless we reliably have smaller alternatives for all large images in the catalog.

dhruvkb avatar Oct 18 '22 06:10 dhruvkb

It would be really nice to have some data on this to see which providers contribute the most to these timeouts. Because the URLs stored in the Sentry event are our own thumbnail links, I'm not sure that's something we'd be able to extract from Sentry. Do you think we could add some logic to store a mapping of TLD -> count in Redis? We could increment it each time the `ReadTimeout` is hit, then take a look and see which providers are timing out the most. We might be able to address them upstream in a similar manner to SMK.
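
A rough sketch of that counting idea, not the actual implementation: it keys on the full hostname of the upstream image URL rather than only the TLD, which is an assumption about what would be most useful. `InMemoryCounter` is a hypothetical stand-in that mimics only the `incr(name, amount)` call of a Redis client, and `record_timeout`, the `thumbnail_timeout:` key prefix, and the example URLs are all made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


class InMemoryCounter:
    """Hypothetical stand-in for a Redis client; mimics only `incr`."""

    def __init__(self):
        self.counts = Counter()

    def incr(self, name, amount=1):
        # Like Redis INCR/INCRBY: create the key if missing, return new value.
        self.counts[name] += amount
        return self.counts[name]


def record_timeout(client, upstream_url, prefix="thumbnail_timeout:"):
    """Increment the timeout counter for the host serving `upstream_url`."""
    host = urlparse(upstream_url).hostname or "unknown"
    return client.incr(f"{prefix}{host}")


# Would be called from the handler that catches the ReadTimeout.
client = InMemoryCounter()
record_timeout(client, "https://images.example.org/a/large.jpg")
record_timeout(client, "https://images.example.org/b/huge.jpg")
record_timeout(client, "https://media.example.com/c.png")

# Hosts with the most timeouts first.
top = client.counts.most_common(1)
```

With a real Redis client in place of the stand-in, the same per-host keys could be scanned later to rank providers by timeout count.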

AetherUnbound avatar Oct 18 '22 18:10 AetherUnbound

Unassigning myself because #982 has been merged and there isn't much to do but wait and check the cache to see which domains are causing the most timeouts.

dhruvkb avatar Oct 26 '22 13:10 dhruvkb