The PostCommit Python Arm job is flaky

Open github-actions[bot] opened this issue 1 year ago • 15 comments

The PostCommit Python Arm job is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Arm.yml?query=is%3Afailure+branch%3Amaster to see the logs.

github-actions[bot] avatar Mar 27 '24 03:03 github-actions[bot]

@tvalentyn do we have a good owner for this?

chamikaramj avatar Apr 23 '24 00:04 chamikaramj

I actually can't find a single green run since this test suite was created (back in September)

ahmedabu98 avatar Apr 24 '24 16:04 ahmedabu98

You may be right, thanks for the correction, @ahmedabu98

2024-04-24T12:03:53.0963029Z Please verify that you have permissions to write to the parent directory..
2024-04-24T12:03:53.0964903Z The configuration directory may not be writable. To learn more, see https://cloud.google.com/sdk/docs/configurations#creating_a_configuration
2024-04-24T12:03:53.0968080Z ERROR: (gcloud.auth.docker-helper) Could not create directory [/var/lib/kubelet/pods/573a1844-124b-4e12-bb0f-0325d0f3c3aa/volumes/kubernetes.io~empty-dir/gcloud]: Permission denied.
2024-04-24T12:03:53.0969612Z 
2024-04-24T12:03:53.0970063Z Please verify that you have permissions to write to the parent directory.
2024-04-24T12:03:53.3953756Z #29 pushing layers 1.4s done
2024-04-24T12:03:53.3956208Z #29 ERROR: failed to push us.gcr.io/apache-beam-testing/github-actions/beam_python3.8_sdk:2.57.0-SNAPSHOT: error getting credentials - err: exit status 1, out: ``
2024-04-24T12:03:53.8953735Z ------
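
For anyone trying to reproduce this outside CI: the log above says gcloud could not create its configuration directory. Below is a minimal sketch for checking whether that directory is writable. It assumes the directory comes from the `CLOUDSDK_CONFIG` environment variable and otherwise falls back to `~/.config/gcloud`. The helper name `config_dir_is_writable` is hypothetical and not part of the Beam workflow.

```python
import os
import tempfile


def config_dir_is_writable(path):
    """Return True if `path` can be created (if missing) and written to."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating a temporary file proves we can actually write here.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers PermissionError, NotADirectoryError, read-only filesystems, etc.
        return False


if __name__ == "__main__":
    # Assumption: gcloud reads its config directory from CLOUDSDK_CONFIG,
    # defaulting to ~/.config/gcloud on Linux.
    config_dir = os.environ.get(
        "CLOUDSDK_CONFIG", os.path.expanduser("~/.config/gcloud")
    )
    print(config_dir, "writable:", config_dir_is_writable(config_dir))
```

If this reports the directory as not writable on the runner, pointing `CLOUDSDK_CONFIG` at a writable location would be one thing to try; whether that is the right fix for this workflow is not confirmed here.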

cc: @damccorm - do you remember whether this suite never worked, or whether the above error is an artifact of the GHA migration?

We can reclassify this as part of the ARM backlog work.

tvalentyn avatar Apr 24 '24 17:04 tvalentyn

This was working last month - https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Arm.yml?query=is%3Asuccess+branch%3Amaster+event%3Aschedule

damccorm avatar Apr 24 '24 17:04 damccorm

Looks like it went flaky and then permared around that time

damccorm avatar Apr 24 '24 17:04 damccorm

Ahh my apologies, I was looking at it through an is:failure filter

ahmedabu98 avatar Apr 24 '24 17:04 ahmedabu98

So by removing https://github.com/apache/beam/blob/master/.github/workflows/beam_PostCommit_Python_Arm.yml#L113

I get the test to move along, but it's still failing on my fork due to some permission issue with the Healthcare API. The OAuth scope is wrong or something: https://github.com/volatilemolotov/beam/actions/runs/8820257015/job/24213449686#step:13:13113

volatilemolotov avatar Apr 25 '24 14:04 volatilemolotov

@volatilemolotov could you put up a PR to make that change? Definitely seems like it is getting further.

@svetakvsundhar do you know what scope is missing? Given the normal postcommit python isn't failing, it might just be an issue with your service account specifically?

damccorm avatar Apr 25 '24 15:04 damccorm

Sure, here it is https://github.com/apache/beam/pull/31102

volatilemolotov avatar Apr 25 '24 15:04 volatilemolotov

Thanks - merged, let's see what the result on master is

damccorm avatar Apr 25 '24 15:04 damccorm

> @svetakvsundhar do you know what scope is missing? Given the normal postcommit python isn't failing, it might just be an issue with your service account specifically?

+1, it could be a service-account-specific issue. I'd want to see a couple more runs of this to confirm it's actually an issue. If so, one option might be to add ["https://www.googleapis.com/auth/cloud-platform"] as a scope manually in the test.
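
A minimal sketch of what adding that scope manually could look like, assuming the test obtains Application Default Credentials through the `google-auth` library. The helper names `with_cloud_platform_scope` and `load_test_credentials` are hypothetical and are not taken from the Beam test suite.

```python
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def with_cloud_platform_scope(scopes=None):
    """Return a new list of scopes that includes cloud-platform exactly once."""
    merged = list(scopes or [])
    if CLOUD_PLATFORM_SCOPE not in merged:
        merged.append(CLOUD_PLATFORM_SCOPE)
    return merged


def load_test_credentials(scopes=None):
    """Load Application Default Credentials with cloud-platform requested."""
    # Imported lazily so the scope helper above works without google-auth installed.
    import google.auth

    credentials, project_id = google.auth.default(
        scopes=with_cloud_platform_scope(scopes)
    )
    return credentials, project_id
```

Whether the requested scope is honored can depend on the credential type (for example, how the service account on the runner is provisioned), so this may not help if the issue is specific to the service account.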

svetakvsundhar avatar Apr 25 '24 16:04 svetakvsundhar

https://github.com/apache/beam/actions/runs/8840477636

it works now

volatilemolotov avatar Apr 26 '24 09:04 volatilemolotov

Great, thanks @volatilemolotov

Looks like we're still flaky - https://github.com/apache/beam/actions/runs/8843342204/job/24283441647 - but that's an improvement and it looks like a test flake instead of infra

damccorm avatar Apr 26 '24 11:04 damccorm

Permared now

kennknowles avatar Apr 29 '24 13:04 kennknowles

I think that's wrong - https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Arm.yml?query=branch%3Amaster+event%3Aschedule

damccorm avatar Apr 29 '24 13:04 damccorm

Reopening since the workflow is still flaky

github-actions[bot] avatar Sep 22 '24 18:09 github-actions[bot]

Fixed by https://github.com/apache/beam/pull/32530

damccorm avatar Sep 23 '24 11:09 damccorm