wasmcloud-otp
wasmcloud-otp copied to clipboard
[BUG] Timeouts for Genserver calls to wasmex don't reflect host RPC_TIMEOUT setting
If an actor's rpc call to a provider takes longer than 5 seconds, the actor OTP process is terminated and we get a proc_lib.crash_report
(log below). The timeout should be increased if the environment value WASMCLOUD_RPC_TIMEOUT_MS is greater than 5000 ms .
I thought I heard in conversation that there is a default hard-coded 5000 ms limit for all genserver calls. If there's a separate limit for calls into wasmex, that should also be adjusted.
{"host_id":"NADULV4OGGOZ3LBZLT2CSDR2KIWFAFWRVI4ZUAOW6M627QIV2AVH4NZC","lattice_id":"default","level":"error","mfa":":proc_lib.crash_report/4","pid":"#PID<0.2031.0>","time":"2022-05-11T01:3 9: 21.292068Z","msg":"reports are unused"}^M
{
"level":"error",
"log_error":"%FunctionClauseError {
args: nil,
arity: 1,
clauses: nil,
function: :encode_meta,
kind: nil,
module: HostCore.StructuredLogger.FormatterJson
}",
"meta":"%{
domain: [ :otp, :sasl],
error_logger: %{
tag: :error_report,
type: :supervisor_report
},
gl: #PID<0.1822.0>,
logger_formatter: %{
title: \"SUPERVISOR REPORT\"
},
pid: #PID<0.1847.0>,
report_cb: &:logger.format_otp_report/1,
time: 1652233161292545
}",
"msg":"{
:report,
%{
label: {
:supervisor,
:child_terminated
},
report: [
supervisor: {
:local,
HostCore.Actors.ActorSupervisor
},
errorContext: :child_terminated,
reason: {
:timeout,
{
GenServer,
:call,
[
#PID<0.2033.0>,
{
:call_function,
\"__guest_call\",
[24, 41376]
},
5000
]
}
},
offender: [
pid: #PID<0.2031.0>,
id: :undefined,
mfargs: {
HostCore.Actors.ActorModule,
:start_link,
[
{
%{
__struct__: HostCore.WasmCloud.Native.Claims,
call_alias: nil,
caps: [\"wasmcloud:mlinference\", \"wasmcloud:httpserver\", \"wasmcloud:builtin :logging\"],
expires_human: \"never\",
issuer: \"ACGQLMIPCGV2JTBBG4AGUDYSEFSBOR3YFZPDTMGCEGMNSWPYYDKWUDEY\",
name: \"fingerprint-api\",
not_before_human: \"immediately\",
public_key: \"MAOIUSTCRPDZBPQ3EZHSGX2EIWZGT445QPU7J7EU3AMI7AYH6Z2ROJEH\",
revision: 1652231790,
tags: [],
version: \"0.1.1\"
},
<<0, 97, 115, 109, 1, 0, 0, 0, 1, 178, 3, 58, 96, 1, 127, 0, 96, 3, 127, 127, 127, 0, 96, 1, 127, 1, 127, 96, 2, 127, 125, 1, 125, ...>>,
\"127.0.0.1:5000/fingerprint_api:0.1.1\"
}
]
},
restart_type: :transient,
shutdown: 5000,
child_type: :worker
]
]
}
}"
}
Not sure if this qualifies as funny
but the function clause error is actually coming from our structured logger attempting to parse and format the timeout error from the actor module.
This can now be done with an update to wasmex https://github.com/tessi/wasmex/releases/tag/v0.7.1
Reopening since we should still update the host to send a timeout, which wasmex now supports
Not applicable anymore