apify-client-js

Investigate client behaviour in a case of target pod/node restart

Open mtrunkat opened this issue 3 years ago • 6 comments

From this discussion https://apifier.slack.com/archives/C013WC26144/p1653552365035479, it seems that sometimes there is a series of network errors, which leads to a suspicion that the client might be retrying requests to the same pod even though it's dead.

2022-05-16T00:38:56.894Z WARN  ApifyClient: API request failed 4 times. Max attempts: 9.
2022-05-16T00:38:56.897Z Cause:Error: aborted
2022-05-16T00:38:56.899Z     at connResetException (node:internal/errors:692:14)
2022-05-16T00:38:56.901Z     at Socket.socketCloseListener (node:_http_client:414:19)

mtrunkat avatar May 30 '22 08:05 mtrunkat
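
For context on the "Max attempts: 9" part of the warning: the retry budget is configurable when constructing the client. A minimal sketch, assuming the documented `maxRetries` and `minDelayBetweenRetriesMillis` options of `apify-client`:

```js
// Sketch: construct the client with an explicit retry budget so the
// "API request failed N times. Max attempts: M." warnings are predictable.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: process.env.APIFY_TOKEN,
    maxRetries: 8,                      // retry budget behind the "Max attempts" message
    minDelayBetweenRetriesMillis: 500,  // base delay for the exponential backoff
});
```

The open question is not how many times the client retries, but whether every retry can end up on the same dead keep-alive socket.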

I think it might be because of keep-alive connections and HTTPS tunneling. How does the client learn that the pod is down and that it should retry elsewhere?

mnmkng avatar May 30 '22 14:05 mnmkng
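
To illustrate the suspected mechanism (this is not what apify-client does internally, just a sketch of the failure mode, with an example URL): with a keep-alive agent, a request that reuses a pooled socket whose backend pod has died fails with an 'aborted'/ECONNRESET-style error, and only a retry on a fresh TCP connection gives the load balancer a chance to route it to a healthy pod.

```js
// Sketch: reuse of a pooled keep-alive socket whose peer is gone fails;
// destroying the agent's pooled sockets before retrying forces a new
// TCP connection, which the load balancer can route to a healthy pod.
const https = require('node:https');

const agent = new https.Agent({ keepAlive: true });

function getStatus(url) {
    return new Promise((resolve, reject) => {
        https.get(url, { agent }, (res) => {
            res.resume(); // drain the body so the socket can return to the keep-alive pool
            res.on('end', () => resolve(res.statusCode));
        }).on('error', reject);
    });
}

async function getWithFreshConnectionOnError(url) {
    try {
        return await getStatus(url);
    } catch (err) {
        console.warn('Request on a pooled socket failed:', err.code || err.message);
        agent.destroy(); // drop pooled sockets so the retry opens a new TCP connection
        return getStatus(url);
    }
}

getWithFreshConnectionOnError('https://api.apify.com/v2/users/me')
    .then((status) => console.log('HTTP status:', status));
```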

Note: We could test this on multistaging by starting two API pods, starting an actor that uses the API in a loop, and then killing one of the two pods. We could also make a testing version of the client with some more debug logging to help us figure it out.

fnesveda avatar Jun 01 '22 12:06 fnesveda
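
A rough sketch of such a test actor, assuming hypothetical `APIFY_TOKEN` and `RUN_ID` environment variables: it calls the get-run endpoint from a single client instance every 0.5 s and timestamps every failure, so the moment one of the pods is killed should stand out in the log.

```js
// Sketch of the proposed test: call the API in a tight loop from one
// ApifyClient instance while one of the two API pods is killed.
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function main() {
    const runId = process.env.RUN_ID; // placeholder for an existing run ID
    for (let i = 0; ; i++) {
        const startedAt = new Date().toISOString();
        try {
            const run = await client.run(runId).get();
            console.log(`${startedAt} #${i} OK ${run.status}`);
        } catch (err) {
            // When a pod is killed, any request that hits it (or a stale
            // keep-alive socket to it) should show up here.
            console.error(`${startedAt} #${i} FAILED ${err.message}`);
        }
        await sleep(500);
    }
}

main();
```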

2 pod multistaging here https://github.com/apify/apify-core/pull/6934

jirimoravcik avatar Jun 26 '22 19:06 jirimoravcik

It looks like keep-alive doesn't work: it does not propagate through the application load balancer, and the requests are distributed between pods. Below is the list of pods that served each API call; I was calling the get-run API endpoint from the same Apify client instance every 0.5 s. Because I have just 2 pods and the ALB uses a round-robin scheme, the pods alternated on every request.

0: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
1: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
2: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
3: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
4: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
5: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
6: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
7: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
8: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
9: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
10: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
11: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
12: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
13: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
14: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
15: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
16: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
17: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
18: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
19: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
20: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
21: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
22: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
23: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
24: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
25: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
26: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
27: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
28: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
29: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
30: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
31: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
32: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"
33: "apify-api-dummymultistagingbranchfortesting-67745785c5-gf6m6"
34: "apify-api-dummymultistagingbranchfortesting-67745785c5-4btl4"

If you restart one node, the client simply switches to the new one.

drobnikj avatar Jul 11 '22 16:07 drobnikj
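
One way to double-check this finding from a plain Node.js script (the URL is a placeholder for the endpoint under test): `request.reusedSocket` reports whether a request went out over a pooled socket or over a brand-new TCP connection, and the `Connection` response header shows what the ALB negotiated.

```js
// Sketch: probe whether keep-alive connections are actually reused through
// the load balancer. With working keep-alive, reusedSocket should be true
// for every request after the first one.
const https = require('node:https');

const agent = new https.Agent({ keepAlive: true, maxSockets: 1 });

function probe(url) {
    return new Promise((resolve, reject) => {
        const req = https.get(url, { agent }, (res) => {
            res.resume();
            res.on('end', () => resolve({
                reusedSocket: req.reusedSocket,           // false => new TCP connection
                connectionHeader: res.headers.connection, // what the server/ALB answered
            }));
        });
        req.on('error', reject);
    });
}

async function main() {
    for (let i = 0; i < 5; i++) {
        console.log(i, await probe('https://api.apify.com/v2/users/me'));
        await new Promise((resolve) => setTimeout(resolve, 500));
    }
}

main();
```

If every probe after the first one reports `reusedSocket: false`, the connection really is being torn down between requests, which matches the round-robin pod alternation above.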

If we want to support keep-alive connections, we probably need some changes on the ALB or elsewhere in platform networking. I'm not sure whether the fact that it doesn't work right now affects users, but it probably hasn't worked since we started using the ALB. cc @dragonraid @mnmkng

drobnikj avatar Jul 11 '22 16:07 drobnikj

I'm moving this to the icebox; we can follow up once the issue appears again. It looks like a network error or some other transient error, but it's hard to say two months later: we don't have any logs, and the issue hasn't appeared in the same actor again since this report.

drobnikj avatar Jul 11 '22 17:07 drobnikj