baseball
API: Add validation to SubmitJob API
Before writing to DynamoDB and placing a message on the queue, the SubmitJob API call should validate the parameters of the requested job. A job that fails validation should be rejected before either step happens.
Here is a sample JSON configuration object for a job:
{
  "dataset": "Lahman_Batting",
  "transformations": [
    {
      "type": "columnSelect",
      "columns": [
        "HR",
        "lgID"
      ]
    },
    {
      "type": "rowSelect",
      "column": "yearID",
      "operator": ">=",
      "criteria": "2000"
    },
    {
      "type": "columnDefine",
      "column": "custom",
      "expression": "2*(HR)"
    },
    {
      "type": "rowSum",
      "columns": [
        "playerID",
        "yearID",
        "lgID"
      ]
    }
  ],
  "output": {
    "type": "leaderboard",
    "column": "HR",
    "direction": "desc"
  }
}
Below is a list of required validations.
Dataset:
- Dataset ID should be from the set of allowed datasets (currently just "Lahman_Batting")
Output:
- Output parameter "type" should be from the allowed set of output types (currently just "leaderboard")
- Output parameter "column" should be the name of a single column from the set of selected and/or defined columns as of the end of all transformations
- Output parameter "direction" must be one of "desc" or "asc"
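The dataset and output checks above could be sketched roughly as follows. This is only an illustration, not the code in ConfigValidator.py: the function name `validate_dataset_and_output`, the helper `_is_one_of`, and the `final_columns` argument (the set of column names left after all transformations, computed elsewhere) are hypothetical.

```python
ALLOWED_DATASETS = {"Lahman_Batting"}
ALLOWED_OUTPUT_TYPES = {"leaderboard"}
ALLOWED_DIRECTIONS = {"asc", "desc"}


def _is_one_of(value, allowed):
    # Check the type first so unhashable JSON values (lists, objects) fail cleanly.
    return isinstance(value, str) and value in allowed


def validate_dataset_and_output(config, final_columns):
    """Return a list of error messages; an empty list means the checks passed.

    final_columns is assumed to hold the column names that remain after
    every transformation has been applied.
    """
    errors = []
    if not _is_one_of(config.get("dataset"), ALLOWED_DATASETS):
        errors.append(f"dataset must be one of {sorted(ALLOWED_DATASETS)}")

    output = config.get("output")
    if not isinstance(output, dict):
        return errors + ["output must be an object"]

    if not _is_one_of(output.get("type"), ALLOWED_OUTPUT_TYPES):
        errors.append(f"output type must be one of {sorted(ALLOWED_OUTPUT_TYPES)}")
    if not _is_one_of(output.get("column"), final_columns):
        errors.append("output column must name a single available column")
    if not _is_one_of(output.get("direction"), ALLOWED_DIRECTIONS):
        errors.append('output direction must be "asc" or "desc"')
    return errors
```

Collecting messages in a list rather than raising on the first failure is a design choice made here so that a UI could report every problem at once; raising an exception per failure would work equally well.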
ColumnSelect and RowSum Transformations:
- Each entry in the "columns" list should be the name of an existing column, with respect to any previously executed transformations.
- After the ColumnSelect transformation, all columns not present in the "columns" list are lost.
- After the RowSum transformation, all string-valued columns not present in the "columns" list are lost.
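One way to track which columns survive a ColumnSelect or RowSum step is to carry a mapping from column name to type through the transformation list. The sketch below assumes a hypothetical `SCHEMA` covering only a few of the Lahman batting columns and a hypothetical function `apply_columns_step`; neither is taken from the repository.

```python
# Hypothetical starting schema: an illustrative subset of the batting columns.
SCHEMA = {"playerID": "string", "yearID": "number", "lgID": "string", "HR": "number"}


def apply_columns_step(step, columns):
    """Validate a columnSelect or rowSum step and return the surviving columns.

    columns maps each currently available column name to "string" or "number".
    Raises ValueError if the step names a column that is not available.
    """
    listed = step.get("columns", [])
    missing = [name for name in listed if name not in columns]
    if missing:
        raise ValueError(f"{step.get('type')} references unknown columns: {missing}")

    if step.get("type") == "columnSelect":
        # Only the listed columns survive.
        return {name: columns[name] for name in listed}
    if step.get("type") == "rowSum":
        # Numeric columns survive; string columns survive only if listed.
        return {
            name: kind
            for name, kind in columns.items()
            if kind == "number" or name in listed
        }
    raise ValueError(f"not a columns-based step: {step.get('type')}")
```

Read literally, these rules would cause the rowSelect step in the sample configuration above to fail, because yearID does not survive a columnSelect that lists only HR and lgID.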
RowSelect Transformation:
- "column" should be the name of an existing column, with respect to any previously executed transformations.
- "operator" should be one of <, >, <=, >=, =, or !=.
- "criteria" should be either a number or string, and not an expression.
- The type of the criteria (number or string) should match the type of the corresponding column chosen.
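The RowSelect rules might be checked along these lines. The function name `validate_row_select` and the `columns` mapping (column name to "string" or "number") are assumptions made for illustration. The sketch treats every JSON string as a string literal, so a value such as "2000" would not match a numeric column; whether numeric strings should be accepted is not specified above.

```python
ALLOWED_OPERATORS = {"<", ">", "<=", ">=", "=", "!="}


def validate_row_select(step, columns):
    """Return a list of error messages for a rowSelect step (empty if valid).

    columns maps each currently available column name to "string" or "number".
    """
    errors = []

    column = step.get("column")
    column_known = isinstance(column, str) and column in columns
    if not column_known:
        errors.append(f"unknown column: {column!r}")

    operator = step.get("operator")
    if not (isinstance(operator, str) and operator in ALLOWED_OPERATORS):
        errors.append(f"operator must be one of {sorted(ALLOWED_OPERATORS)}")

    criteria = step.get("criteria")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(criteria, bool) or not isinstance(criteria, (int, float, str)):
        errors.append("criteria must be a number or a string")
    elif column_known:
        kind = "string" if isinstance(criteria, str) else "number"
        if kind != columns[column]:
            errors.append(f"criteria is a {kind} but column {column!r} is a {columns[column]}")
    return errors
```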
ColumnDefine Transformation:
- "column" should be a unique name for the new column being defined, and should not conflict with the name of any existing column, with respect to any previously executed transformations
- "expression" should be a valid mathematical expression using only scalar values (strings or numbers) or the names of existing columns, with respect to any previously executed transformations.
- "expression" may use the following numerical operators: +, -, *, /, ^
- After the ColumnDefine transformation, a new column with the given name is added.
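As a rough illustration of the ColumnDefine rules, the sketch below walks a parsed expression and rejects anything outside the allowed operators, scalars, and known column names. It deliberately swaps in Python's standard-library `ast` module to stay dependency-free; it is not a pyparsing grammar and is not the approach taken in ExpressionValidator.py. The function names are hypothetical, and accepting unary minus is an assumption. A known limitation: column names that are not valid Python identifiers (for example, names starting with a digit) would be rejected by this approach.

```python
import ast

# In Python's grammar "^" parses as BitXor; here it is only checked, never evaluated.
ALLOWED_BINARY_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.BitXor)


def _check_node(node, columns):
    if isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_BINARY_OPS):
        _check_node(node.left, columns)
        _check_node(node.right, columns)
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        # Assumption: a leading minus sign, as in "-HR", is acceptable.
        _check_node(node.operand, columns)
    elif isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise ValueError(f"unsupported scalar: {value!r}")
    elif isinstance(node, ast.Name):
        if node.id not in columns:
            raise ValueError(f"unknown column in expression: {node.id}")
    else:
        raise ValueError(f"unsupported syntax: {type(node).__name__}")


def validate_column_define(step, columns):
    """Return the set of column names after the step, or raise ValueError."""
    name = step.get("column")
    if not isinstance(name, str) or not name:
        raise ValueError("column must be a non-empty string")
    if name in columns:
        raise ValueError(f"column {name!r} already exists")

    expression = step.get("expression")
    if not isinstance(expression, str):
        raise ValueError("expression must be a string")
    try:
        tree = ast.parse(expression.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"expression is not valid: {exc}") from exc

    _check_node(tree.body, columns)
    return set(columns) | {name}
```

Because the tree is only inspected, Python's different precedence for "^" does not affect the outcome; a validator that also evaluates expressions would need its own grammar.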
Checking the column definition expressions is the hardest part of this. I'm using the pyparsing module (http://pyparsing.wikispaces.com/) to write a Python class with the necessary logic.
Check out https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ExpressionValidator.py
The top-level Configuration Validator is here: https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ConfigValidator.py
TODO: Add a list here of possible exceptions thrown by the ConfigValidator for consumption by the UI and Worker