wasmcloud-otp icon indicating copy to clipboard operation
wasmcloud-otp copied to clipboard

[BUG] Timeouts for Genserver calls to wasmex don't reflect host RPC_TIMEOUT setting

Open stevelr opened this issue 2 years ago • 3 comments

If an actor's rpc call to a provider takes longer than 5 seconds, the actor OTP process is terminated and we get a proc_lib.crash_report (log below). The timeout should be increased if the environment value WASMCLOUD_RPC_TIMEOUT_MS is greater than 5000 ms .

I thought I heard in conversation that there is a default hard-coded 5000 ms limit for all genserver calls. If there's a separate limit for calls into wasmex, that should also be adjusted.

{"host_id":"NADULV4OGGOZ3LBZLT2CSDR2KIWFAFWRVI4ZUAOW6M627QIV2AVH4NZC","lattice_id":"default","level":"error","mfa":":proc_lib.crash_report/4","pid":"#PID<0.2031.0>","time":"2022-05-11T01:3  9:  21.292068Z","msg":"reports are unused"}^M
{
    "level":"error",
    "log_error":"%FunctionClauseError {
        args: nil,
        arity: 1,
        clauses: nil,
        function: :encode_meta,
        kind: nil,
        module: HostCore.StructuredLogger.FormatterJson
    }",
    "meta":"%{
        domain:   [  :otp, :sasl],
        error_logger: %{
            tag: :error_report,
            type: :supervisor_report
        },
        gl: #PID<0.1822.0>,
        logger_formatter: %{
            title: \"SUPERVISOR REPORT\"
        },
        pid: #PID<0.1847.0>,
        report_cb: &:logger.format_otp_report/1,
        time: 1652233161292545
    }",
    "msg":"{
        :report,
        %{
            label: {
                :supervisor,
                :child_terminated
            },
            report: [
                supervisor: {
                    :local,
                    HostCore.Actors.ActorSupervisor
                },
                errorContext: :child_terminated,
                reason: {
                    :timeout,
                    {
                        GenServer,
                        :call,
                        [
                            #PID<0.2033.0>,
                            {
                                :call_function,
                                \"__guest_call\",
                                [24, 41376]
                            },
                            5000
                        ]
                    }
                },
                offender: [
                    pid: #PID<0.2031.0>,
                    id: :undefined,
                    mfargs: {
                        HostCore.Actors.ActorModule,
                        :start_link,
                        [
                            {
                                %{
                                    __struct__: HostCore.WasmCloud.Native.Claims,
                                    call_alias: nil,
                                    caps: [\"wasmcloud:mlinference\", \"wasmcloud:httpserver\", \"wasmcloud:builtin  :logging\"],
                                    expires_human: \"never\",
                                    issuer: \"ACGQLMIPCGV2JTBBG4AGUDYSEFSBOR3YFZPDTMGCEGMNSWPYYDKWUDEY\",
                                    name: \"fingerprint-api\",
                                    not_before_human: \"immediately\",
                                    public_key: \"MAOIUSTCRPDZBPQ3EZHSGX2EIWZGT445QPU7J7EU3AMI7AYH6Z2ROJEH\",
                                    revision: 1652231790,
                                    tags: [],
                                    version: \"0.1.1\"
                                },
                                <<0, 97, 115, 109, 1, 0, 0, 0, 1, 178, 3, 58, 96, 1, 127, 0, 96, 3, 127, 127,   127, 0, 96, 1,   127, 1, 127, 96, 2, 127, 125, 1, 125, ...>>,
                                \"127.0.0.1:5000/fingerprint_api:0.1.1\"
                            }
                        ]
                    },
                    restart_type: :transient,
                    shutdown: 5000,
                    child_type: :worker
                ]
            ]
        }
    }"
}

stevelr avatar May 11 '22 17:05 stevelr

Not sure if this qualifies as funny but the function clause error is actually coming from our structured logger attempting to parse and format the timeout error from the actor module.

autodidaddict avatar May 11 '22 17:05 autodidaddict

This can now be done with an update to wasmex https://github.com/tessi/wasmex/releases/tag/v0.7.1

brooksmtownsend avatar May 25 '22 19:05 brooksmtownsend

Reopening since we should still update the host to send a timeout, which wasmex now supports

connorsmith256 avatar Jun 27 '22 17:06 connorsmith256

Not applicable anymore

brooksmtownsend avatar May 10 '23 16:05 brooksmtownsend