drill
Decouple Concurrency and Iterations
From #83:
Concurrency and iterations are linked, so there's no way to, say, run a large CSV file a single time with concurrency > 1, other than by using something like GNU Parallel to run multiple copies (since drill is so simple to run, that's not actually a bad option). It would be useful to have a way to expand the with_items directives so those two settings were decoupled, making it legal to say, for example, concurrency: 100 and iterations: 1 when you have a large input file. That way you'd still get things like the stats, which otherwise wouldn't display if you terminated a large job before it finished many millions of requests.
Use case: I have a list of 10k parameters to run against a URL to validate that they return 200. Each item in the list should be run once, but with a concurrency of 20.
concurrency: 20
iterations: 1 # only want this to run once
base: 'http://localhost'

plan:
  - name: Fetch by id
    request:
      url: /{{ item.id }}
    with_items_from_csv: ./items.csv

./items.csv:

id
1
2
3
...
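To make the expansion concrete, here is a minimal sketch (in Python, not drill's actual Rust internals; the function name and CSV literal are invented for illustration) of what with_items_from_csv does conceptually: each CSV row is interpolated into the URL template, producing one request per row.

```python
import csv
import io

ITEMS_CSV = "id\n1\n2\n3\n"   # stands in for the contents of ./items.csv
URL_TEMPLATE = "/{id}"        # stands in for drill's /{{ item.id }}

def expand_urls(csv_text, template):
    """Return one URL per CSV row, mirroring with_items_from_csv expansion."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [template.format(**row) for row in rows]

print(expand_urls(ITEMS_CSV, URL_TEMPLATE))  # ['/1', '/2', '/3']
```

With 10k rows, this expansion yields 10k requests inside a single iteration, which is why the user wants those requests fanned out rather than more iterations.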
The goal of the iterations parameter is to execute the plan N times; all the steps in the plan are executed sequentially within each iteration. The goal of the concurrency parameter, on the other hand, is to run those iterations in parallel, up to M at the same time. So a concurrency value higher than the iterations value doesn't make sense: drill will only run up to the lower of the two values at once.
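This relationship can be sketched as follows (a Python illustration with invented names, not drill's implementation): iterations are the units of work, and concurrency only bounds how many run at once, so effective parallelism is min(concurrency, iterations).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_plan(iterations, concurrency):
    """Run `iterations` work items on a pool of `concurrency` workers,
    returning the peak number of iterations observed in flight."""
    in_flight = 0
    peak = 0
    lock = threading.Lock()

    def one_iteration(i):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... all of the plan's steps would execute sequentially here ...
        with lock:
            in_flight -= 1

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(one_iteration, range(iterations)))
    return peak

# With iterations=1, only one worker is ever busy,
# no matter how high concurrency is set:
print(run_plan(iterations=1, concurrency=100))  # 1
```

This is exactly the reporter's complaint: with iterations: 1, the 10k CSV rows all live inside that single iteration, so concurrency: 20 buys nothing.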
In your case, the Fetch by id step is processed atomically by the executor.
I think I see. So you're saying that a plan item is sent to an executor, each plan item is executed serially, and the with_items_from_csv property is iterated within a single executor.
Is there a way to spawn an independent executor for each item in a CSV file, so a large CSV file could be processed quickly? Or how feasible would it be to make a change to support something like that?
If I'm understanding the code correctly, there are 1 to n iterations, and the concurrency controls how many iterations can run at once.
But within an iteration, each step is evaluated, and in the case of the multi-csv-request a step is created for each row in the CSV file (code here). Each step is executed sequentially, which is controlled here in a simple for loop.
Would it not be possible to do all the 'benchmark executes' in parallel? It looks like it would 'just' require a flag controlling whether the user wanted sequential or parallel processing: sequential would execute as today, and parallel would use something like rayon to run the actions.
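A sketch of the proposed flag (the names here are invented for illustration, not drill's API): per-row execution either stays a plain for-loop, or fans out across a worker pool when the rows are independent.

```python
from concurrent.futures import ThreadPoolExecutor

def run_rows(rows, execute, parallel=False, workers=20):
    """Execute one benchmark action per CSV row."""
    if not parallel:
        # Current behaviour: a simple sequential for-loop over the rows.
        return [execute(row) for row in rows]
    # Proposed behaviour: rows are independent, so fan them out.
    # pool.map preserves input order, so results match the sequential run.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, rows))

rows = [{"id": str(i)} for i in range(1, 6)]
check = lambda row: ("/" + row["id"], 200)  # stands in for the HTTP call

# Both modes produce the same results, in the same order:
assert run_rows(rows, check) == run_rows(rows, check, parallel=True)
```

In Rust, the parallel branch would be roughly what rayon's par_iter gives you for free; the open question is only whether the rows really are independent.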
Sorry for the late answer. I have been really busy these weeks.
The problem here is that the benchmark is executed sequentially because actions can have dependencies on previous actions, like storing and then using variables.
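A minimal example of why that dependency forces sequential execution (function and variable names are invented for illustration): a later action may consume a variable stored by an earlier one, so the two cannot safely overlap.

```python
def login(ctx):
    # An earlier action stores a variable (drill's assign-style capture).
    ctx["token"] = "abc123"

def fetch_profile(ctx):
    # A later action interpolates the variable the previous step stored.
    # Run in parallel with login(), this could fire before the token
    # exists and fail with a KeyError.
    return "GET /profile?token=" + ctx["token"]

ctx = {}
login(ctx)
print(fetch_profile(ctx))  # GET /profile?token=abc123
```

CSV rows expanded from a single step have no such cross-row dependency, which is why parallelizing only the multi-csv-request rows is safe while parallelizing arbitrary plan steps is not.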
On the other hand, one thing we could do is execute all the multi-csv-request rows in parallel when you want that particular action parallelized.