
Add logs to identify the cause of publish/stage errors

Open fulmicoton opened this issue 1 year ago • 2 comments

fulmicoton, Jun 21 '24

@trinity-1686a, I think you added some logs on a branch. Is there something to backport on main?

fmassot, Sep 25 '24

The commit is bd72e1f9dbe59fc03f48824099252f65a1ceab63. I don't think it's in a state to be merged: it warns on transient errors, including those that would be retried automatically (which was the case for the error we were tracking down).

trinity-1686a, Sep 25 '24

@trinity-1686a, is this something we want to keep? Can you take care of it?

fulmicoton, Dec 19 '24

I don't think we need to do anything. Normal errors already seem to be logged. AFAICT, the errors that caught someone's eye were just transient errors (rate limiting) that increased a metric counter and were then automatically and silently retried by the client, somewhere inside a tower layer. We have since made the metastore retry some of these errors on its own (#5211), so the client counter shouldn't increase for that specific transient error anymore.

trinity-1686a, Dec 19 '24
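The distinction discussed in this thread, between transient errors that are retried silently and errors that actually reach the caller, can be illustrated with a small sketch. This is not Quickwit's code (which does its retrying in tower layers and, since #5211, inside the metastore): `MetastoreError`, `retry_with_backoff` and the `transient_retries` counter are hypothetical names assumed for illustration. Transient failures are only counted, and a warning is printed only for the error that is finally returned, so retried rate-limit errors do not produce noisy warnings.

```rust
use std::time::Duration;

/// Hypothetical error type (an assumption, not Quickwit's real metastore error).
#[derive(Debug, PartialEq)]
enum MetastoreError {
    /// Transient error, e.g. the backend rate-limited the request.
    RateLimited,
    /// Permanent error that retrying cannot fix.
    Internal(String),
}

impl MetastoreError {
    fn is_transient(&self) -> bool {
        matches!(self, MetastoreError::RateLimited)
    }
}

/// Retries `op` on transient errors, for at most `max_attempts` attempts.
/// Intermediate transient failures only bump `transient_retries`; a warning
/// is printed only for the error that is finally returned to the caller.
fn retry_with_backoff<T, F>(
    mut op: F,
    max_attempts: usize,
    base_delay: Duration,
    transient_retries: &mut u64,
) -> Result<T, MetastoreError>
where
    F: FnMut() -> Result<T, MetastoreError>,
{
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_transient() && attempt < max_attempts => {
                *transient_retries += 1;
                // Exponential backoff: base, 2*base, 4*base, ...
                std::thread::sleep(base_delay * 2u32.pow((attempt - 1) as u32));
                attempt += 1;
            }
            Err(err) => {
                eprintln!("WARN: giving up after {attempt} attempt(s): {err:?}");
                return Err(err);
            }
        }
    }
}

fn main() {
    let mut retries = 0;

    // Fails twice with a rate-limit error, then succeeds: no warning is printed.
    let mut calls = 0;
    let result = retry_with_backoff(
        || {
            calls += 1;
            if calls < 3 { Err(MetastoreError::RateLimited) } else { Ok("published") }
        },
        5,
        Duration::from_millis(1),
        &mut retries,
    );
    assert_eq!(result, Ok("published"));
    assert_eq!(retries, 2);

    // A permanent error is not retried and is reported once.
    let mut attempts = 0;
    let failed: Result<(), _> = retry_with_backoff(
        || {
            attempts += 1;
            Err(MetastoreError::Internal("bad request".into()))
        },
        5,
        Duration::from_millis(1),
        &mut retries,
    );
    assert!(failed.is_err());
    assert_eq!(attempts, 1);
    println!("result={result:?} transient_retries={retries}");
}
```

Counting retries separately from the final outcome is what keeps the signal clean: a metric on `transient_retries` shows backend pressure, while the warning log only fires when a publish or stage request actually fails.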

OK, closing then.

fulmicoton, Dec 19 '24