rally
rally copied to clipboard
Prevent unintentional specification of task properties on the operation definition
Since #326 we allow to define operations inline with its enclosing task. The operation defines what API call should be executed and the task defines how it should be executed (e.g. warmup-iterations
, iterations
, target-throughput
). In order to save users some hassle, all of the task properties have default values. Additionally, the operation definition allows to define arbitrary parameters. This leads to a situation where a user can define this task:
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...,
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"target-throughput": 100
}
}
when they should have done this
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...
},
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"target-throughput": 100
}
In the first case, the properties are attributed to the operation and the task will run with defaults (one client, no warmup iteration, one measurement iteration, no target throughput). This is trappy and surprises users.
We have several ways to address this:
Change track file format
We can change the track file format so task properties need to be defined in their own block, e.g.:
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...
},
"task": {
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"target-throughput": 100
}
}
We'd also not define default values and require users to specify properties explicitly.
Make task properties mandatory
A more light-weight approach is to be more strict, remove the default values and require users to specify explicit values. We can prepare the official Rally tracks in advance to conform to that requirement and so this would not affect the majority of the users.
Detect problematic situations
Another option is to detect that the user has specified task-related properties on the operation (but none on the task) and warn the user about it. The problem with this approach is that we can never be sure that a user has intentionally passed the task-related properties. We could declare them as reserved names though. I am only mentioning this possibility for completeness but I think this approach is trappy in itself.
I am favor of option two but this is open for discussion.
We discussed this in our sync and came up with a modification of proposal number one defined above. Instead of introducing a separate structure for task
we'll introduce a separate structure for schedule
.
Here is a summary of the changes:
- A
Task
is only allowed to have a set of well-defined properties. Any other property will be rejected. - The
schedule
property will turn from a string (that defines the name of the scheduler) into an object. All scheduler-related properties will move into that object. Theschedule
object will allow arbitrary properties so custom scheduler implementations can specify properties as they see fit. - For simplicity we will allow to define the property
target-throughput
- which actually belongs to theschedule
also on task level - if and only if there is noschedule
property defined (i.e. we use the default schedule implicitly). - To ensure users don't specify keys on the wrong level, all task property names will be treated as reserved names within the
operation
andschedule
properties. Using such a property name onoperation
orschedule
level will be rejected.
Examples
This task will continue to work as is as there is no schedule
property set.
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...
},
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"target-throughput": 100
}
This task will be rejected because the clients
property is specified on operation level (which is only allowed as a task property):
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
"clients": 2,
...
},
"warmup-iterations": 1000,
"iterations": 10000,
"target-throughput": 100
}
The following task will need to change (target-throughput is defined on task level but a custom schedule is set):
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...
},
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"schedule": "poisson",
"target-throughput": 100
}
Instead the task needs to be specified as:
{
"operation": {
"name": "query-match-car",
"operation-type": "search",
...
},
"clients": 2,
"warmup-iterations": 1000,
"iterations": 10000,
"schedule": {
"name": "poisson",
"target-throughput": 100
}
}