flame icon indicating copy to clipboard operation
flame copied to clipboard

Fly FLAME failing with timeout

Open Doerge opened this issue 10 months ago • 0 comments

Hi,

I attempted a simple Flame example in an existing app, but I'm running into an issue where the FLAME instance is killed immediately, but the caller sees a timeout.

Versions:

elixir=1.16.2
erlang-otp=25.3.2.10
{:flame, "~> 0.1.12"}

I tried to follow the doc's and setup:

# application.ex
        {FLAME.Pool,
         name: MyApp.FFMpegRunner,
         min: 0,
         max: 2,
         max_concurrency: 1,
         idle_shutdown_after: 30_000,
         timeout: :timer.minutes(30)},
# runtime.exs
  pool_size =
    if FLAME.Parent.get() do
      1
    else
      String.to_integer(System.get_env("POOL_SIZE") || "10")
    end

  config :loccal, MyApp.Repo,
    url: database_url,
    pool_size: pool_size,
    socket_options: maybe_ipv6

  config :flame, :backend, FLAME.FlyBackend

  config :flame, FLAME.FlyBackend,
    token: System.fetch_env!("FLAME_FLY_API_TOKEN"),
    cpus: 2,
    memory_mb: 8 * 1024,
    boot_timeout: 120_000

Then run a simple example in a remote Fly shell:

iex(myapp@908055ec2195e8)1> FLAME.call(MyApp.FFMpegRunner, fn -> {:ok, "Foo"} end)
** (exit) exited in: FLAME.Pool.call(MyApp.FFMpegRunner, #Function<43.3316493/0 in :erl_eval.expr/6>, [])
    ** (EXIT) time out
    (flame 0.1.12) lib/flame/pool.ex:268: FLAME.Pool.exit!/3
    iex:1: (file)
iex(myapp@908055ec2195e8)1>

I tried setting the timeouts really high, because I see that in the logs below. It appears the FLAME instance is spawned, and attempts to run, but is then shut down.

One suspicious thing is that the endpoint in the logs in the 9185997ade1258 instance is https://example.com, while my app is the real domain in my shell:

iex(myapp@908055ec2195e8)3> Application.get_env(:loccal, LoccalWeb.Endpoint)[:url]
[host: "myapp.com", port: 443, scheme: "https"]

(I disabled selectively starting applications in the Flame instance, for rule out having made a mistake there)

Any idea of what I could be missing here?

Logs:

# Logging into a remote shell here, and executing the simple FLAME call
2024-04-23T15:00:20Z app[908055ec2195e8] cdg [info]2024/04/23 15:00:20 New SSH Session - XXXXXX
2024-04-23T15:00:37Z app[908055ec2195e8] cdg [info] WARN Reaped child process with pid: 560 and signal: SIGUSR1, core dumped? false
2024-04-23T15:02:22Z runner[9185997ade1258] cdg [info]Pulling container image registry.fly.io/my-app-8571:deployment-01HW5Q7P881CZB6TQ747W188PK
2024-04-23T15:02:23Z runner[9185997ade1258] cdg [info]Successfully prepared image registry.fly.io/my-app-8571:deployment-01HW5Q7P881CZB6TQ747W188PK (1.058413437s)
2024-04-23T15:02:26Z runner[9185997ade1258] cdg [info]Configuring firecracker
2024-04-23T15:02:26Z app[9185997ade1258] cdg [info][    0.230064] PCI: Fatal: No config space access function found
2024-04-23T15:02:27Z app[9185997ade1258] cdg [info] INFO Starting init (commit: 65db7f7)...
2024-04-23T15:02:27Z app[9185997ade1258] cdg [info] INFO Preparing to run: `/app/bin/server` as nobody
2024-04-23T15:02:27Z app[9185997ade1258] cdg [info] INFO [fly api proxy] listening at /.fly/api
2024-04-23T15:02:27Z app[9185997ade1258] cdg [info]2024/04/23 15:02:27 listening on [fdaa:0:31ed:a7b:ae02:ad7e:20b7:2]:22 (DNS: [fdaa::3]:53)
2024-04-23T15:02:27Z runner[9185997ade1258] cdg [info]Machine created and started in 5.019s
2024-04-23T15:02:33Z app[9185997ade1258] cdg [info]15:02:33.201 [info] Running MyAppWeb.Endpoint with cowboy 2.12.0 at :::4000 (http)
2024-04-23T15:02:33Z app[9185997ade1258] cdg [info]15:02:33.204 [info] Access MyAppWeb.Endpoint at https://example.com
2024-04-23T15:02:33Z app[9185997ade1258] cdg [info] WARN Reaped child process with pid: 382 and signal: SIGUSR1, core dumped? false
2024-04-23T15:02:38Z app[9185997ade1258] cdg [info]15:02:38.178 [info] Tzdata has updated the release from 2021e to 2024a
2024-04-23T15:02:53Z app[9185997ade1258] cdg [info] WARN Reaped child process with pid: 384 and signal: SIGUSR1, core dumped? false
2024-04-23T15:02:53Z app[9185997ade1258] cdg [info] WARN Reaped child process with pid: 385 and signal: SIGUSR1, core dumped? false
2024-04-23T15:03:40Z app[9185997ade1258] cdg [info] INFO Main child exited normally with code: 0
2024-04-23T15:03:40Z app[9185997ade1258] cdg [info] INFO Starting clean up.
2024-04-23T15:03:40Z app[9185997ade1258] cdg [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-04-23T15:03:40Z app[9185997ade1258] cdg [info][   73.750104] reboot: Restarting system
2024-04-23T15:03:40Z runner[9185997ade1258] cdg [info]machine restart policy set to 'no', not restarting
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]15:04:21.071 [error] failed to connect to fly machine within 120000ms
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]15:04:21.072 [error] GenServer #PID<0.6714.0> terminating
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]** (stop) time out
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (flame 0.1.12) lib/flame/fly_backend.ex:249: FLAME.FlyBackend.remote_boot/1
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (flame 0.1.12) lib/flame/runner.ex:270: anonymous fn/4 in FLAME.Runner.handle_call/3
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (stdlib 4.3.1.3) gen_server.erl:1149: :gen_server.try_handle_call/4
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (stdlib 4.3.1.3) gen_server.erl:1178: :gen_server.handle_msg/6
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (stdlib 4.3.1.3) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]Last message (from #PID<0.6713.0>): {:remote_boot, nil}
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]15:04:21.073 [error] Task #PID<0.6713.0> started from MyApp.FFMpegRunner terminating
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]** (stop) exited in: GenServer.call(#PID<0.6714.0>, {:remote_boot, nil}, :infinity)
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    ** (EXIT) time out
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (elixir 1.16.2) lib/gen_server.ex:1114: GenServer.call/3
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (flame 0.1.12) lib/flame/pool.ex:565: FLAME.Pool.start_child_runner/2
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (elixir 1.16.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    (elixir 1.16.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]Function: #Function<1.15317371/0 in FLAME.Pool.async_boot_runner/1>
2024-04-23T15:04:21Z app[908055ec2195e8] cdg [info]    Args: []
2024-04-23T15:04:33Z app[908055ec2195e8] cdg [info]15:04:33.720 [info] CONNECTED TO Phoenix.LiveView.Socket in 43µs
2024-04-23T15:04:33Z app[908055ec2195e8] cdg [info]  Transport: :websocket

Doerge avatar Apr 23 '24 15:04 Doerge