
Low throughput of batch jobs allocations on clients

Open cqueinnec opened this issue 2 years ago • 5 comments

Nomad version

  • Servers: 1.3.1
  • Client: 1.3.4

Operating system and Environment details

  • 3 Servers: CentOS
  • 1 Client: Windows Server 2019
    • 8 cores, 24GHz
    • 32GB of RAM
    • SSD drive

Issue

We've been running Nomad for a few years with small volumes of batch jobs. Now I'm analyzing how Nomad behaves when running large numbers of batch jobs of different natures.

My first experiment is to see how many tasks can be dispatched on a node, and how fast they are processed. For this, I'm running 1 very simple job with count=1000 - my first try of creating 1000 jobs or more didn't go so well due to memory management and state replication, but that's another story. Problem: I'd expect those tasks to be processed in a few seconds, considering the client has more than enough resources to consume tasks concurrently. Instead, they are processed in small batches of 2 to 6 tasks in parallel.

I've tried to fiddle with the GC configuration (client and server side), and I took a look at C2M and other resources on the Internet, but to no avail. I also want to mention that I've read https://github.com/hashicorp/nomad/issues/13933, which is a gold mine of information on how to dispatch huge numbers of batch jobs. I'd love to see an advanced section in the docs explaining how to properly configure servers and clients to achieve high volumes of processing :)
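For reference, this is the kind of client-side GC tuning I experimented with. All of these are real Nomad `client` stanza options; the values below are illustrative only, not recommendations:

```hcl
client {
  gc_interval          = "1m"  # how often the client runs allocation GC (default "1m")
  gc_max_allocs        = 2000  # terminal allocs kept before forced GC (default 50,
                               # well below the 1000 allocations in this test)
  gc_parallel_destroys = 8     # concurrent terminal-alloc cleanups (default 2)

  gc_disk_usage_threshold  = 90  # disk usage % that triggers GC (default 80)
  gc_inode_usage_threshold = 80  # inode usage % that triggers GC (default 70)
}
```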

Any help or guidelines would be appreciated. Thanks!

Reproduction steps

Run below job file, and look at the processing on the UI.

Expected Result

Considering how small the task is (executing an echo "Hello World!" with very small resources), I'd expect allocs to be dispatched by the hundreds.

Actual Result

Allocations are created and processed little by little. I can witness from 2 to 6 allocs being executed concurrently on the client:

[screenshot 2022-08-26_11h47_19: only a handful of allocations running at a time]

To me, the problem doesn't seem to be located on the scheduler, because in the UI I can see that the client has already been determined a few seconds after the job starts:

[screenshot: UI showing the client already determined for pending allocations]

Job file (if appropriate)

Job file, started with Levant (levant.exe deploy -force -force-count .\nomad-job-hello-world--one-group.hcl):

job "hello-world-[[ timeNowUTC ]]-[[ uuidv4 ]]" {
    datacenters = ["dc1"]
    type = "batch"

    constraint {
        attribute = "${node.unique.name}"
        operator  = "="
        value     = "my-nomad-client"
    }

    group "group" {
        count = 1000
        restart {
            attempts = 0
            mode = "fail"
        }
        reschedule {
            attempts  = 0
            unlimited = false
        }

        task "ping" {
            driver = "raw_exec"
            config {
                command = "C:\\Windows\\System32\\cmd.exe"
                args = ["/c echo 'Hello World!'"]
            }

            resources {
                cpu    = 20
                memory = 20
            }
            
            logs {
                max_files     = 1
                max_file_size = 5
            }
        }

        ephemeral_disk {
            size    = 20
        }
    }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

cqueinnec avatar Aug 26 '22 16:08 cqueinnec

Hi @cqueinnec 👋

Given that all allocations were created and placed in the pending state, I would say that the servers are performing as expected, so the client may be the bottleneck.

When you click on one of those pending allocations, and then on the task (ping in your sample job), what task events do you see? There should be a table like this:

[screenshot: example of a task's Recent Events table in the UI]

This could show what those tasks are waiting for.

lgfa29 avatar Sep 02 '22 18:09 lgfa29

Hello,

Thanks for your feedback! Here's a video showing that for a task received 2 minutes ago (and for which the client seems to already have been determined), as long as the status is pending there are no task events for the allocation. Once it starts, the events table shows that the whole process completes in 3 seconds or so.

[video 2022-09-07_06h50_08: pending allocation with no task events until it starts]

It really feels like the client takes some time to grab the task, even though there's plenty of resources left. Let me know if there's a way to get more details. Thanks!

cqueinnec avatar Sep 07 '22 10:09 cqueinnec

Could this be related to the number of available threads on the machine? Or is client execution not tied to OS threads (since it's Go, it might just use goroutines)?

cqueinnec avatar Oct 04 '22 13:10 cqueinnec

@cqueinnec can you provide a still screenshot (or CLI output) of the Task Events, rather than a gif?
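For example, something along these lines, where `<alloc-id>` is a placeholder for one of the allocation IDs from the job:

```shell
# List allocations for the job, then dump one allocation's task events.
nomad job status <job-id>
nomad alloc status -verbose <alloc-id>
```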

tgross avatar Oct 04 '22 13:10 tgross

Here's the Recent Events table for a single task. As mentioned, once the task starts, the events table shows that the whole process completes in just a few (2 or 3) seconds.

[screenshot: Recent Events table for a single task]

cqueinnec avatar Oct 05 '22 23:10 cqueinnec