quickwit
Add logs to identify the cause of publish/stage errors
@trinity-1686a, I think you added some logs on a branch. Is there something to backport to main?
The commit is bd72e1f9dbe59fc03f48824099252f65a1ceab63. I don't think it's in a state to be merged: it warns on transient errors, including those that would be retried automatically (which was the case for the error we were tracking down).
@trinity-1686a, is this something we want to keep? Can you take care of it?
I don't think we need to do anything. Normal errors already seem to be logged. AFAICT, the errors that caught someone's eye were just transient errors (rate limiting) that incremented a metric counter and were then retried automatically and silently by the client, somewhere inside a tower layer. We have since made the metastore retry some of these errors on its own (#5211), so the client counter shouldn't increase for that specific transient error anymore.
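To illustrate why warning on every transient error is noisy, here is a minimal sketch of the pattern described above, assuming a simple synchronous retry loop rather than a tower layer. The names `MetastoreError`, `RetryStats`, and `retry_transient` are hypothetical and are not Quickwit's or tower's actual API. Transient failures (rate limiting) only bump a counter and are retried with exponential backoff; a warning is emitted only when the operation finally gives up or hits a non-transient error.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type separating transient from permanent failures.
#[derive(Debug, Clone, PartialEq)]
enum MetastoreError {
    /// Rate limiting: worth retrying.
    RateLimited,
    /// Anything else: retrying would not help.
    Internal(String),
}

impl MetastoreError {
    fn is_transient(&self) -> bool {
        matches!(self, MetastoreError::RateLimited)
    }
}

/// Stand-in for a metric counter of transient failures.
#[derive(Default)]
struct RetryStats {
    transient_failures: u32,
}

/// Calls `op` up to `max_attempts` times while it fails with a transient error.
/// Intermediate transient failures are counted but not logged; only the final
/// failure (or a non-transient one) is logged and returned to the caller.
fn retry_transient<T, F>(
    max_attempts: u32,
    base_delay: Duration,
    stats: &mut RetryStats,
    mut op: F,
) -> Result<T, MetastoreError>
where
    F: FnMut() -> Result<T, MetastoreError>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_transient() && attempt < max_attempts => {
                // Count it (like the client metric) and retry silently.
                stats.transient_failures += 1;
                // Exponential backoff: base, 2*base, 4*base, ...
                sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
            Err(err) => {
                eprintln!("warn: giving up after {attempt} attempt(s): {err:?}");
                return Err(err);
            }
        }
    }
}
```

With this shape, a request that is rate-limited twice and then succeeds increments the counter twice but produces no warning, which matches the behavior that caught someone's eye: the counter moved even though nothing actually failed.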
OK, closing then.