Empty error message from OCH GraphQL client
Description
Sometimes on our long-running cluster, we get an empty error message from the GraphQL client we use:
# 1: graphql:
We need to investigate whether there is a bug in our code (OCH client) or in our dependency (retry-go, machinebox/graphql client, etc.).
- This may not be limited to the Local OCH client. Investigate all GraphQL clients (Public OCH, Local OCH, Engine) for potential causes.
Expected behavior
We get an error message that describes the root cause of the GraphQL request failure.
Actual behavior
Example run: https://github.com/capactio/capact/runs/2762064610?check_suite_focus=true
# 1: graphql:
Jun 7 08:37:29.539: while deleting TypeInstance with ID a0ecd4c3-c55a-497a-8c68-05b5e6242cb8: while executing mutation to delete TypeInstance: All attempts fail:
# 1: graphql:
• Failure [305.075 seconds]
Action
/capact.io/capact/test/e2e/action_test.go:37
  Action execution
  /capact.io/capact/test/e2e/action_test.go:57
    should pick proper Implementation and inject TypeInstance based on cluster policy [It]
    /capact.io/capact/test/e2e/action_test.go:58
    Timed out after 300.000s.
    Expected : FAILED to be an element of <[]interface *** | len:1, cap:1>: [["SUCCEEDED"]]
    /capact.io/capact/test/e2e/action_test.go:343
We need to raise the priority of this task, because the empty error message makes debugging difficult. We hit the same problem in our Engine:
2021-06-29T08:40:59.930Z DEBUG controller-runtime.manager.events [email protected]/zapr.go:69 Warning {"object": {"kind":"Action","namespace":"capact-system","name":"capact-upgrade-fdhwl","uid":"089ef74c-cb7f-46b6-b1a3-b028c87041f7","apiVersion":"core.capact.io/v1alpha1","resourceVersion":"19263936"}, "reason": "Delete runner action", "message": "while unlocking TypeInstances: while executing mutation to unlock TypeInstances: All attempts fail:\n#1: graphql: "}
The strange thing is that I was able to unlock the given TypeInstance using the GraphQL console exposed on the Gateway.
EDIT: The cause was that the response had no body, even though the status code was 422, so the client had no message to report.
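
One possible way to avoid an empty message in this situation is to check the HTTP status before trying to interpret the response body. The following is only a minimal sketch using plain `net/http`, not the machinebox/graphql client and not the actual OCH client code. The names `HTTPStatusError`, `postQuery`, `demoEmptyBody`, and `statusFromError` are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// HTTPStatusError is a hypothetical error type that keeps the HTTP status
// code, so the caller sees it even when the response body is empty.
type HTTPStatusError struct {
	StatusCode int
	Body       string
}

func (e *HTTPStatusError) Error() string {
	if e.Body == "" {
		return fmt.Sprintf("server returned HTTP %d with an empty response body", e.StatusCode)
	}
	return fmt.Sprintf("server returned HTTP %d: %s", e.StatusCode, e.Body)
}

// postQuery sends a GraphQL query as JSON and returns the raw response body.
// Any non-2xx status is turned into an HTTPStatusError before decoding.
func postQuery(client *http.Client, url, query string) ([]byte, error) {
	payload, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return nil, fmt.Errorf("while encoding request: %w", err)
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("while sending request: %w", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("while reading response body: %w", err)
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, &HTTPStatusError{
			StatusCode: resp.StatusCode,
			Body:       strings.TrimSpace(string(body)),
		}
	}
	return body, nil
}

// demoEmptyBody reproduces the reported case against a local test server:
// status 422 and no response body.
func demoEmptyBody() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusUnprocessableEntity)
	}))
	defer srv.Close()

	_, err := postQuery(srv.Client(), srv.URL, "{ __typename }")
	return err
}

// statusFromError extracts the status code if err wraps an HTTPStatusError.
func statusFromError(err error) (int, bool) {
	var statusErr *HTTPStatusError
	if errors.As(err, &statusErr) {
		return statusErr.StatusCode, true
	}
	return 0, false
}
```

Calling `demoEmptyBody()` from a `main` function and printing the result should show the status code in the error text instead of an empty message. Note that a GraphQL server may return a non-2xx status together with a valid `errors` payload; in that case this sketch surfaces the raw body, which a real client would probably want to decode.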