Patrick Vinograd

17 comments by Patrick Vinograd

I believe we are also experiencing this issue. We have not tried the workaround of setting minRunners > 0. In our case the behavior on the GitHub UI side is that...

In other words, this _could_ be a problem with the GH Actions service, not with the ARC project.

The issue of subsequent workflows getting queued due to the failed runner connects to #3953 and #3821. Just trying to link up related issues in the hopes that maintainers see...

I was nodding along with the original filer, but some more digging, including looking at the conversation in https://github.com/actions/actions-runner-controller/issues/3619, clued me in to the fact that the number of "current runners"...

Correct, it's an external_account credential file, from an AWS EKS workload, using the [recommended configuration](https://cloud.google.com/iam/docs/workload-identity-federation-with-kubernetes#eks). We don't set any token lifetime explicitly, which I understand to mean it is using...

I will open a support ticket and link to this issue. Thanks for the quick follow-up/feedback!

I was able to add time-of-request logging to all our errored Vertex API requests, and late Friday/early Saturday a batch of asynchronous jobs again tried to use an expired token....

@quartzmo That logic jumped out at me as well. I agree that we could be experiencing silent failures during the attempt to proactively refresh the token. But the errors we...

It's also a little hard to follow how many separate tokens/refreshes are in play - it seems like internally it's pulling the EKS service account token from the projected volume...

Open to trying that, but I'm not clear on how I would supply that option. We're using `aiplatform.NewPredictionClient(ctx, option.WithCredentialsFile(clientConfigPath))`, i.e. not relying on ADC. And I don't see a...