Keria is unresponsive for ~5s when booting an agent

Open lenkan opened this issue 9 months ago • 4 comments

Steps to reproduce:

  1. Agent 1: Boot agent
  2. Agent 1: Create aid
  3. Agent 1: Authorize agent end role
  4. Agent 2: Boot agent and, immediately afterwards, try to call http://:3902/oobi/.

The HTTP call for the OOBI takes ~5 seconds to complete because Keria is busy booting agent 2.

I haven't looked at the internals of the boot implementation, but perhaps these types of tasks should be forked to a subprocess. In a multi-tenant setup, when a new agent is booted, it will affect all other clients in the same instance.

Any thoughts?

lenkan, Mar 27 '25 13:03

I haven't looked into it properly, but the boot endpoint returns after like 400-500ms consistently on my local machine, at which point everything is set up.

Would be interesting to see what's causing it to bottleneck.

iFergal, Mar 27 '25 15:03

Ah, yes, I should have mentioned. The setup was:

  • AWS ECS Container running on AWS Fargate with 1vCPU and 2GB memory
  • AWS EFS in same subnet

So surely that has an impact. There is going to be latency from the EFS. But that is IO bound, so it should technically be possible to keep receiving requests in the meantime.
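To illustrate the IO-bound point, here is a minimal sketch, not KERIA code. `slow_boot` is a hypothetical stand-in for a boot that spends its time waiting on slow storage, and the loop in `serve_while_booting` stands in for request handling. Because a blocking wait releases the GIL, a single worker thread is enough for the main loop to keep serving in this toy setup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_boot(delay: float) -> str:
    """Hypothetical stand-in for a boot that waits on slow storage (e.g. EFS)."""
    time.sleep(delay)  # sleeping releases the GIL, much like blocking file IO
    return "booted"


def serve_while_booting(delay: float = 0.5, tick: float = 0.01) -> tuple[str, int]:
    """Run the boot in a worker thread and count how many simulated
    requests the main loop handles before the boot finishes."""
    handled = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_boot, delay)
        while not future.done():
            handled += 1      # pretend to serve one request
            time.sleep(tick)  # pretend handling it takes `tick` seconds
    return future.result(), handled


if __name__ == "__main__":
    result, handled = serve_while_booting()
    print(f"{result}; requests served during boot: {handled}")
```

If the boot turned out to be CPU bound instead, a thread would not help much and a separate process would be the closer analogue.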

> like 400-500ms consistently

That is acceptable I suppose. Although, if the boot call is blocking, then that is an issue anyway.

lenkan, Mar 27 '25 16:03

My thoughts: when a new wallet is booted, KERIA creates the new hab and loads the KERIA config file. If the config file has many iurls (for witnesses and schemas, for example), several async processes are launched to resolve those OOBIs. I can't say that this is the cause of your problem, but we have at least seen step 2, creating an AID with witnesses, fail because the witnesses are still unknown at that point.
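One client-side way to cope with the "witnesses are still unknown" failure is to poll before creating the AID. This is only a sketch: `wait_until` and its `ready` callable are hypothetical, and how you would actually check that the witness OOBIs have been resolved depends on your client and is not shown here.

```python
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll `ready` until it returns True or `timeout` seconds have passed.

    `ready` is a hypothetical check, e.g. "are the witness OOBIs resolved yet?".
    Returns True if the condition was met, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage would look like `if wait_until(witnesses_known): create_aid()`, where both names are placeholders for whatever your client provides.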

rodolfomiranda, Mar 27 '25 18:03

Dev call:

Sam: Probably waiting for the new agent to boot before yielding back any time to the main Doist loop.

Fergal: /boot is a synchronous call that sets up the hab.

Daniel: when using one agent while another agent was booting, the first agent became slow.

Sam: solution may be to add yielding, or possibly multiprocess.
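A toy model of the yielding idea, under heavy assumptions: `run` is a simplified round-robin generator scheduler standing in for a cooperative loop (it is not hio's actual Doist), and the boot "work" is just `time.sleep`. It compares the worst pause another task sees when the boot runs in one turn versus when it yields between steps.

```python
import time
from collections import deque


def run(tasks):
    """Minimal round-robin scheduler: advance each generator one step per turn."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # task yielded, so schedule it again
        except StopIteration:
            pass  # task finished, drop it


def boot_blocking(steps, step_time):
    """Do all boot work in a single turn, never yielding in between."""
    for _ in range(steps):
        time.sleep(step_time)
    yield


def boot_yielding(steps, step_time):
    """Do the same work, but hand control back after every step."""
    for _ in range(steps):
        time.sleep(step_time)
        yield


def other_agent(gaps, turns):
    """Record how long this task waits between consecutive turns."""
    last = time.perf_counter()
    for _ in range(turns):
        yield
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


def worst_gap(boot, steps=5, step_time=0.02):
    """Longest pause the other agent sees while `boot` runs alongside it."""
    gaps = []
    run([other_agent(gaps, steps + 1), boot(steps, step_time)])
    return max(gaps)


if __name__ == "__main__":
    print(f"blocking boot, worst pause: {worst_gap(boot_blocking):.3f}s")
    print(f"yielding boot, worst pause: {worst_gap(boot_yielding):.3f}s")
```

Yielding does not make the boot itself faster. It only bounds how long any other task waits per turn, which is the symptom reported here.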

kentbull, Apr 01 '25 14:04