grpc-go icon indicating copy to clipboard operation
grpc-go copied to clipboard

Flaky test: Test/End2End

Open easwars opened this issue 3 years ago • 1 comments
trafficstars

https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true

--- FAIL: Test (18.49s)
    --- FAIL: Test/End2End (10.88s)
        tlogger.go:116: INFO server.go:588 [core] [Server #11] Server created  (t=+11.41141ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:115 [core] [Channel #12] Channel created  (t=+12.608811ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #12] original dial target is: "localhost:38711"  (t=+12.744512ms)
        tlogger.go:116: INFO server.go:776 [core] [Server #11 ListenSocket #13] ListenSocket created  (t=+12.858812ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #12] parsed dial target is: {Scheme:localhost Authority: Endpoint:38711 URL:{Scheme:localhost Opaque:38711 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+13.068712ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #12] fallback to scheme "passthrough"  (t=+13.168612ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #12] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:38711 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:38711 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+13.337312ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #12] Channel authority set to "localhost:38711"  (t=+13.447812ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #12] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:38711",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+13.748212ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #12] Channel switches to new LB policy "pick_first"  (t=+14.118513ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #12 SubChannel #14] Subchannel created  (t=+14.328413ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #12 SubChannel #14] Subchannel Connectivity change to CONNECTING  (t=+14.576713ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #12] Channel Connectivity change to CONNECTING  (t=+14.849113ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #12 SubChannel #14] Subchannel picks a new address "localhost:38711" to connect  (t=+14.978314ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #12 SubChannel #14] Subchannel Connectivity change to READY  (t=+212.763852ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #12] Channel Connectivity change to READY  (t=+212.960453ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:110 [core] [Channel #17] Channel created  (t=+216.178458ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #17] original dial target is: "localhost:38711"  (t=+216.[298](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:299)958ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #17] parsed dial target is: {Scheme:localhost Authority: Endpoint:38711 URL:{Scheme:localhost Opaque:38711 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+216.522859ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #17] fallback to scheme "passthrough"  (t=+216.621159ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #17] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:38711 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:38711 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+216.757759ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #17] Channel authority set to "localhost:38711"  (t=+216.862159ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #17] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:38711",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+217.12036ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #17] Channel switches to new LB policy "pick_first"  (t=+217.32436ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #17 SubChannel #18] Subchannel created  (t=+217.51156ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #17 SubChannel #18] Subchannel Connectivity change to CONNECTING  (t=+217.742461ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #17 SubChannel #18] Subchannel picks a new address "localhost:38711" to connect  (t=+217.962561ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #17] Channel Connectivity change to CONNECTING  (t=+218.079461ms)
        tlogger.go:116: INFO picker_wrapper.go:161 [core] blockingPicker: the picked transport is not ready, loop back to repick  (t=+218.261061ms)
        tlogger.go:116: WARNING server.go:891 [core] [Server #11] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:54524\") failed: x509: certificate signed by unknown authority"  (t=+429.323308ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #17 SubChannel #18] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+429.655909ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #17] Channel Connectivity change to TRANSIENT_FAILURE  (t=+429.867409ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:115 [core] [Channel #20] Channel created  (t=+430.123109ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #20] original dial target is: "localhost:38711"  (t=+430.21901ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #20] parsed dial target is: {Scheme:localhost Authority: Endpoint:38711 URL:{Scheme:localhost Opaque:38711 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+430.37291ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #20] fallback to scheme "passthrough"  (t=+430.45741ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #20] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:38711 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:38711 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+430.59431ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #20] Channel authority set to "localhost:38711"  (t=+430.70621ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #20] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:38711",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+430.956811ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #20] Channel switches to new LB policy "pick_first"  (t=+431.232111ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #20 SubChannel #21] Subchannel created  (t=+431.414812ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #20 SubChannel #21] Subchannel Connectivity change to CONNECTING  (t=+431.680412ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #20 SubChannel #21] Subchannel picks a new address "localhost:38711" to connect  (t=+431.889112ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #20] Channel Connectivity change to CONNECTING  (t=+432.055013ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #20 SubChannel #21] Subchannel Connectivity change to READY  (t=+629.333937ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #20] Channel Connectivity change to READY  (t=+629.524437ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #20] Channel Connectivity change to SHUTDOWN  (t=+631.38724ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #20 SubChannel #21] Subchannel Connectivity change to SHUTDOWN  (t=+631.54984ms)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #20 SubChannel #21] Subchannel deleted  (t=+631.65854ms)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #20] Channel deleted  (t=+631.73994ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #17] Channel Connectivity change to SHUTDOWN  (t=+631.900441ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #17 SubChannel #18] Subchannel Connectivity change to SHUTDOWN  (t=+632.012341ms)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #17 SubChannel #18] Subchannel deleted  (t=+632.125741ms)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #17] Channel deleted  (t=+632.208741ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #12] Channel Connectivity change to SHUTDOWN  (t=+632.369542ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #12 SubChannel #14] Subchannel Connectivity change to SHUTDOWN  (t=+632.497242ms)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #12 SubChannel #14] Subchannel deleted  (t=+632.612742ms)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #12] Channel deleted  (t=+632.718642ms)
        tlogger.go:116: INFO server.go:727 [core] [Server #11 ListenSocket #13] ListenSocket deleted  (t=+632.997043ms)
        tlogger.go:116: INFO server.go:588 [core] [Server #24] Server created  (t=+633.514543ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:115 [core] [Channel #25] Channel created  (t=+634.255745ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #25] original dial target is: "localhost:33425"  (t=+634.374945ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #25] parsed dial target is: {Scheme:localhost Authority: Endpoint:33425 URL:{Scheme:localhost Opaque:33425 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+634.534145ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #25] fallback to scheme "passthrough"  (t=+634.699845ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #25] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:33425 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:33425 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+634.900046ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #25] Channel authority set to "localhost:33425"  (t=+635.022746ms)
        tlogger.go:116: INFO server.go:776 [core] [Server #24 ListenSocket #26] ListenSocket created  (t=+635.206446ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #25] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:33425",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+635.377346ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #25] Channel switches to new LB policy "pick_first"  (t=+635.637647ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #25 SubChannel #27] Subchannel created  (t=+635.841947ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #25 SubChannel #27] Subchannel Connectivity change to CONNECTING  (t=+636.037148ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #25 SubChannel #27] Subchannel picks a new address "localhost:33425" to connect  (t=+636.225248ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #25] Channel Connectivity change to CONNECTING  (t=+636.382848ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #25 SubChannel #27] Subchannel Connectivity change to READY  (t=+839.360081ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #25] Channel Connectivity change to READY  (t=+839.561582ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:110 [core] [Channel #30] Channel created  (t=+842.046686ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #30] original dial target is: "localhost:33425"  (t=+842.233186ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #30] parsed dial target is: {Scheme:localhost Authority: Endpoint:33425 URL:{Scheme:localhost Opaque:33425 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+842.384486ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #30] fallback to scheme "passthrough"  (t=+842.491187ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #30] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:33425 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:33425 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+851.262801ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #30] Channel authority set to "localhost:33425"  (t=+854.950207ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #30] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:33425",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+855.231708ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #30] Channel switches to new LB policy "pick_first"  (t=+855.439808ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #30 SubChannel #31] Subchannel created  (t=+855.631108ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #30 SubChannel #31] Subchannel Connectivity change to CONNECTING  (t=+855.887909ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #30] Channel Connectivity change to SHUTDOWN  (t=+856.068509ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #30 SubChannel #31] Subchannel picks a new address "localhost:33425" to connect  (t=+856.309809ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #30 SubChannel #31] Subchannel Connectivity change to SHUTDOWN  (t=+856.51101ms)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #30 SubChannel #31] Subchannel deleted  (t=+856.67911ms)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #30] Channel deleted  (t=+856.77361ms)
        tlogger.go:116: WARNING clientconn.go:1255 [core] [Channel #30 SubChannel #31] grpc: addrConn.createTransport failed to connect to {
              "Addr": "localhost:33425",
              "ServerName": "localhost:33425",
              "Attributes": null,
              "BalancerAttributes": null,
              "Type": 0,
              "Metadata": null
            }. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup localhost: operation was canceled"  (t=+861.076017ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #25] Channel Connectivity change to SHUTDOWN  (t=+861.564118ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #25 SubChannel #27] Subchannel Connectivity change to SHUTDOWN  (t=+861.822018ms)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #25 SubChannel #27] Subchannel deleted  (t=+861.939419ms)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #25] Channel deleted  (t=+862.040619ms)
        tlogger.go:116: INFO server.go:727 [core] [Server #24 ListenSocket #26] ListenSocket deleted  (t=+862.259319ms)
        --- FAIL: Test/End2End/test_the_reloading_feature_for_server_identity_callback_and_client_trust_callback (0.23s)
            advancedtls_integration_test.go:408: client failed to connect to localhost:33425. Error: context deadline exceeded
        tlogger.go:116: INFO server.go:588 [core] [Server #32] Server created  (t=+863.584821ms)
        tlogger.go:116: INFO advancedtls_integration_test.go:115 [core] [Channel #33] Channel created  (t=+864.214922ms)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #33] original dial target is: "localhost:40765"  (t=+864.352722ms)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #33] parsed dial target is: {Scheme:localhost Authority: Endpoint:40765 URL:{Scheme:localhost Opaque:40765 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+864.495523ms)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #33] fallback to scheme "passthrough"  (t=+864.600123ms)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #33] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:40765 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:40765 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+864.739523ms)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #33] Channel authority set to "localhost:40765"  (t=+864.871123ms)
        tlogger.go:116: INFO server.go:776 [core] [Server #32 ListenSocket #34] ListenSocket created  (t=+865.159424ms)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #33] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:40765",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+865.330724ms)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #33] Channel switches to new LB policy "pick_first"  (t=+865.574024ms)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #33 SubChannel #35] Subchannel created  (t=+865.737425ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to CONNECTING  (t=+865.944325ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to CONNECTING  (t=+866.138625ms)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #33 SubChannel #35] Subchannel picks a new address "localhost:40765" to connect  (t=+866.285726ms)
        tlogger.go:116: WARNING server.go:891 [core] [Server #32] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:43686\") failed: remote error: tls: bad certificate"  (t=+971.941599ms)
        tlogger.go:116: WARNING clientconn.go:1255 [core] [Channel #33 SubChannel #35] grpc: addrConn.createTransport failed to connect to {
              "Addr": "localhost:40765",
              "ServerName": "localhost:40765",
              "Attributes": null,
              "BalancerAttributes": null,
              "Type": 0,
              "Metadata": null
            }. Err: connection error: desc = "transport: authentication handshake failed: custom authz check fails"  (t=+972.136099ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+972.3059ms)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to TRANSIENT_FAILURE  (t=+972.4987ms)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to IDLE  (t=+1.973989745s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to IDLE  (t=+1.974311345s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to CONNECTING  (t=+1.974565746s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to CONNECTING  (t=+1.974838946s)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #33 SubChannel #35] Subchannel picks a new address "localhost:40765" to connect  (t=+1.975067546s)
        tlogger.go:116: WARNING server.go:891 [core] [Server #32] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:43688\") failed: remote error: tls: bad certificate"  (t=+2.087152131s)
        tlogger.go:116: WARNING clientconn.go:1255 [core] [Channel #33 SubChannel #35] grpc: addrConn.createTransport failed to connect to {
              "Addr": "localhost:40765",
              "ServerName": "localhost:40765",
              "Attributes": null,
              "BalancerAttributes": null,
              "Type": 0,
              "Metadata": null
            }. Err: connection error: desc = "transport: authentication handshake failed: custom authz check fails"  (t=+2.087606231s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+2.08778[303](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:304)2s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to TRANSIENT_FAILURE  (t=+2.088030932s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to IDLE  (t=+3.834015899s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to IDLE  (t=+3.8342154s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to CONNECTING  (t=+3.8344365s)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #33 SubChannel #35] Subchannel picks a new address "localhost:40765" to connect  (t=+3.8346556s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to CONNECTING  (t=+3.834863501s)
        tlogger.go:116: WARNING server.go:891 [core] [Server #32] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:43690\") failed: remote error: tls: bad certificate"  (t=+3.939347372s)
        tlogger.go:116: WARNING clientconn.go:1255 [core] [Channel #33 SubChannel #35] grpc: addrConn.createTransport failed to connect to {
              "Addr": "localhost:40765",
              "ServerName": "localhost:40765",
              "Attributes": null,
              "BalancerAttributes": null,
              "Type": 0,
              "Metadata": null
            }. Err: connection error: desc = "transport: authentication handshake failed: custom authz check fails"  (t=+3.939523472s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+3.939669973s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to TRANSIENT_FAILURE  (t=+3.939825073s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #33] Channel Connectivity change to SHUTDOWN  (t=+5.8647[306](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:307)34s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #33 SubChannel #35] Subchannel Connectivity change to SHUTDOWN  (t=+5.864983234s)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #33 SubChannel #35] Subchannel deleted  (t=+5.865128535s)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #33] Channel deleted  (t=+5.865220935s)
        tlogger.go:116: INFO server.go:727 [core] [Server #32 ListenSocket #34] ListenSocket deleted  (t=+5.865612635s)
        --- FAIL: Test/End2End/test_client_custom_verification (5.00s)
            advancedtls_integration_test.go:391: client failed to connect to localhost:40765. Error: context deadline exceeded
        tlogger.go:116: INFO server.go:588 [core] [Server #36] Server created  (t=+5.866262237s)
        tlogger.go:116: INFO advancedtls_integration_test.go:115 [core] [Channel #37] Channel created  (t=+5.867013638s)
        tlogger.go:116: INFO server.go:776 [core] [Server #36 ListenSocket #38] ListenSocket created  (t=+5.8671[307](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:308)38s)
        tlogger.go:116: INFO clientconn.go:1555 [core] [Channel #37] original dial target is: "localhost:36509"  (t=+5.867318738s)
        tlogger.go:116: INFO clientconn.go:1562 [core] [Channel #37] parsed dial target is: {Scheme:localhost Authority: Endpoint:36509 URL:{Scheme:localhost Opaque:36509 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+5.867480739s)
        tlogger.go:116: INFO clientconn.go:1575 [core] [Channel #37] fallback to scheme "passthrough"  (t=+5.867575239s)
        tlogger.go:116: INFO clientconn.go:1583 [core] [Channel #37] parsed dial target is: {Scheme:passthrough Authority: Endpoint:localhost:36509 URL:{Scheme:passthrough Opaque: User: Host: Path:/localhost:36509 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}  (t=+5.867704839s)
        tlogger.go:116: INFO clientconn.go:259 [core] [Channel #37] Channel authority set to "localhost:36509"  (t=+5.867808339s)
        tlogger.go:116: INFO resolver_conn_wrapper.go:175 [core] [Channel #37] Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "localhost:36509",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Type": 0,
                  "Metadata": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+5.868065839s)
        tlogger.go:116: INFO balancer_conn_wrappers.go:271 [core] [Channel #37] Channel switches to new LB policy "pick_first"  (t=+5.86833404s)
        tlogger.go:116: INFO clientconn.go:721 [core] [Channel #37 SubChannel #39] Subchannel created  (t=+5.86851344s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to CONNECTING  (t=+5.868691641s)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #37 SubChannel #39] Subchannel picks a new address "localhost:36509" to connect  (t=+5.868879741s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to CONNECTING  (t=+5.869067541s)
        tlogger.go:116: WARNING server.go:891 [core] [Server #36] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:49986\") failed: custom authz check fails"  (t=+6.094997[312](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:313)s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+6.095160312s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to TRANSIENT_FAILURE  (t=+6.095332813s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to IDLE  (t=+7.096564957s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to IDLE  (t=+7.096765757s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to CONNECTING  (t=+7.096982758s)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #37 SubChannel #39] Subchannel picks a new address "localhost:36509" to connect  (t=+7.097183358s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to CONNECTING  (t=+7.097372558s)
        tlogger.go:116: WARNING server.go:891 [core] [Server #36] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:49988\") failed: custom authz check fails"  (t=+7.[316](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:317)789119s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+7.[317](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:318)002219s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to TRANSIENT_FAILURE  (t=+7.317168919s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to IDLE  (t=+8.60271402s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to IDLE  (t=+8.602935721s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to CONNECTING  (t=+8.60[321](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:322)9121s)
        tlogger.go:116: INFO clientconn.go:1189 [core] [Channel #37 SubChannel #39] Subchannel picks a new address "localhost:36509" to connect  (t=+8.603411821s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to CONNECTING  (t=+8.603628822s)
        tlogger.go:116: WARNING server.go:891 [core] [Server #36] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:49990\") failed: custom authz check fails"  (t=+8.803121245s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to TRANSIENT_FAILURE  (t=+8.80[336](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:337)3745s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to TRANSIENT_FAILURE  (t=+8.803522746s)
        tlogger.go:116: INFO clientconn.go:414 [core] [Channel #37] Channel Connectivity change to SHUTDOWN  (t=+10.868666192s)
        tlogger.go:116: INFO clientconn.go:1078 [core] [Channel #37 SubChannel #39] Subchannel Connectivity change to SHUTDOWN  (t=+10.868895993s)
        tlogger.go:116: INFO clientconn.go:1433 [core] [Channel #37 SubChannel #39] Subchannel deleted  (t=+10.869041893s)
        tlogger.go:116: INFO clientconn.go:1033 [core] [Channel #37] Channel deleted  (t=+10.8691[348](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:349)93s)
        tlogger.go:116: INFO server.go:727 [core] [Server #36 ListenSocket #38] ListenSocket deleted  (t=+10.869458494s)
        --- FAIL: Test/End2End/TestServerCustomVerification (5.00s)
            advancedtls_integration_test.go:391: client failed to connect to localhost:[365](https://github.com/grpc/grpc-go/runs/6496904034?check_suite_focus=true#step:8:366)09. Error: context deadline exceeded

easwars avatar May 18 '22 21:05 easwars

https://github.com/grpc/grpc-go/runs/7140971630?check_suite_focus=true

easwars avatar Jul 06 '22 17:07 easwars

354 out of 10000 flakes when I repro'd this on google3: http://sponge2/99d10d2b-604a-4d19-9416-dbb96a2502f3

arvindbr8 avatar Nov 28 '22 19:11 arvindbr8

Hi @ZhenLian,

We're trying to deflake all our tests, and this one seems to be a pretty significant source of flakiness. Could you please take a look or reassign if appropriate? Thank you!

dfawley avatar Nov 29 '22 22:11 dfawley

Sure, I am on it now.

@arvindbr8 would you mind letting me know the command you reproduced this on g3, so that I can test on my own? Thank you so much in advance!

ZhenLian avatar Nov 30 '22 01:11 ZhenLian

I can successfully reproduce the error internally and mitigate the failures by extending the refreshing interval like https://github.com/grpc/grpc-go/pull/5277 , but it can not completely solve it... I guess the issue here is still, the behavior we are testing against is the file monitoring behavior itself. We can replace it with some fake functions to make sure the changes are picked up in the next round, but that was not exactly what we wanted to test against - it's best to make sure the monitoring behavior was working properly. I will delegate the call to @erm-g on how we want to proceed next. If we just want to mitigate the problem, it is simple; but if we want to do something more than that, it might add some complexity here...

ZhenLian avatar Dec 01 '22 00:12 ZhenLian

Can we adjust the test to not fail when the flaky behavior is seen? Or does that defeat the purpose of the test? Is it possible to retry until it succeeds such that it passes quickly when it's fast and takes just a small amount of time if it's a little slow, and doesn't fail?

The failure rate is high enough here that I'd appreciate if you could prioritize some kind of fix here.

dfawley avatar Dec 06 '22 18:12 dfawley

I would be OOO for a couple of weeks, but I totally understand that this is a critical thing that highly affects the testing infrastructure. @erm-g Can you please take a look and see what could be the possible solutions here? I think both ways(not failing and retrying) pointed by Doug would worth trying. Thank you in advance!

ZhenLian avatar Dec 08 '22 22:12 ZhenLian

@erm-g, friendly ping. Let us know how we can help

arvindbr8 avatar Feb 28 '23 17:02 arvindbr8

I told @erm-g that I'd take a look at this. One way I can think of to solve the flakiness fundamentally is to fake the Ticker in the watcher. We could expose a Ticker field in pemfile.Options{} so that the watcher will use it if not nil or fall back to the default time.NewTicker(). Certainly, having such a public field means users could also use it to define the watching behavior. Would this be acceptable? @arvindbr8

Also, how could we reproduce this locally? Running advancedtls_integration_test.go takes about 4 seconds and 10k runs (or even 1k) is not quite scalable on a local machine.

rockspore avatar Mar 06 '23 21:03 rockspore

One way I can think of to solve the flakiness fundamentally is to fake the Ticker in the watcher. We could expose a Ticker field in pemfile.Options{} so that the watcher will use it if not nil or fall back to the default time.NewTicker(). Certainly, having such a public field means users could also use it to define the watching behavior. Would this be acceptable?

Does it need to be public? If your tests are in the same package, then it doesn't have to be. But even if they aren't, you can still do what you want without exporting anything new. Typically the way to do something like this would look like:

package foo

import "...../foo/internal"

struct Foo {
	bar int // hidden field tests should be able to set
}

internal.setFooBar = func(f *Foo, v int) { f.bar = v }

---

package internal

// (Cannot import "..../foo".)

var setFooBar interface{}

---

package foo_test

// imports foo and foo/internal

var setFooBar = internal.setFooBar.(func(*foo.Foo, int))

func TestFooBar(t *testing.T) {
	f := &foo.Foo{}
	setFooBar(f, 7)
	// f.bar == 7 now
}

dfawley avatar Mar 08 '23 23:03 dfawley

The test is in a different pkg so it's tricky.

Nevertheless, my assessment was wrong. The flakiness isn't from the real clock. #6093 should provide the fix with very little change. :)

rockspore avatar Mar 08 '23 23:03 rockspore

Closing this since #6093 has been merged.

rockspore avatar Mar 09 '23 20:03 rockspore