backend.ai icon indicating copy to clipboard operation
backend.ai copied to clipboard

fix: Replace `roundrobin` flag with `AgentSelectionStrategy.RoundRobin` strategy

Open jopemachine opened this issue 2 years ago • 0 comments

Checklist: (if applicable)

  • [x] Milestone metadata specifying the target backport version
  • [x] Mention to the original issue

Follow up PR of #1405. This PR performs overall revamps related AgentSelectionStrategy things.

Details:

  • Replace the roundrobin flag with AgentSelectionStrategy.RoundRobin strategy
  • Simplify the roundrobin agent selection logic when joined agent list changes.
  • Move agent_selection_resource_priority into AbstractScheduler to avoid passing it all related functions each time.
  • Fix wrong type declaration (Replace KernelInfo with KernelRow of assign_agent_for_kernel)
  • Abstract agent selection logic into AbstractScheduler.

How to test

To test the RoundRobin strategy,

  • To use the RoundRobin policy for agent_selection_strategy, change the agent_selection_strategy in scaling_groups.scheduler_opts to roundrobin.
  • Update the redis address in etcd by ./backend.ai mgr etcd put config/redis/addr <manager_addr>:8111
  • Update the rpc-listen-addr, id of the agent section in each agent's agent.toml.
  • Checks if all agents are properly connected to the manager in the Backend.AI web UI.
  • Create sessions to verify that sessions are evenly distributed among the agents and that the next_index for round_robin state is correctly updated in etcd.

jopemachine avatar Oct 25 '23 07:10 jopemachine