API: Add validation to SubmitJob API

Open bryantrobbins opened this issue 8 years ago • 3 comments

Before writing to DynamoDB and placing a message on the queue, the SubmitJob API call should validate the parameters of the requested job. A job that fails validation should not be written or queued.

Here is a sample JSON configuration object for a job:

{
  "dataset": "Lahman_Batting",
  "transformations": [
    {
      "type": "columnSelect",
      "columns": [
        "HR",
        "lgID"
      ]
    },
    {
      "type": "rowSelect",
      "column": "yearID",
      "operator": ">=",
      "criteria": "2000"
    },
    {
      "type": "columnDefine",
      "column": "custom",
      "expression": "2*(HR)"
    },
    {
      "type": "rowSum",
      "columns": [
        "playerID",
        "yearID",
        "lgID"
      ]
    }
  ],
  "output": {
    "type": "leaderboard",
    "column": "HR",
    "direction": "desc"
  }
}

Below is a list of required validations.

Dataset:

  • Dataset ID should be from the set of allowed datasets (currently just "Lahman_Batting")

Output:

  • Output parameter "type" should be from allowed set of output types (currently just "leaderboard")
  • Output parameter "column" should be the name of a single column from the set of selected and/or defined columns as of the end of all transformations
  • Output parameter "direction" should be one of "desc" or "asc"
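
As a rough illustration of the dataset and output checks, here is a minimal Python sketch. The function name, the constant names, and the choice to return a list of error strings are all assumptions made for this example; it also assumes the set of columns left after all transformations has already been computed and is passed in as final_columns.

```python
# Hypothetical names; the allowed values are the ones listed in this issue.
ALLOWED_DATASETS = {"Lahman_Batting"}
ALLOWED_OUTPUT_TYPES = {"leaderboard"}
ALLOWED_DIRECTIONS = {"asc", "desc"}


def _one_of(value, allowed):
    """True only for string values that appear in the allowed set."""
    return isinstance(value, str) and value in allowed


def validate_dataset_and_output(config, final_columns):
    """Return a list of error strings; an empty list means the checks passed.

    final_columns is the set of column names that exist after the last
    transformation (assumed to be computed by the caller).
    """
    errors = []
    if not _one_of(config.get("dataset"), ALLOWED_DATASETS):
        errors.append("unknown dataset: %r" % (config.get("dataset"),))

    output = config.get("output")
    if not isinstance(output, dict):
        errors.append('"output" must be an object')
        return errors

    if not _one_of(output.get("type"), ALLOWED_OUTPUT_TYPES):
        errors.append("unknown output type: %r" % (output.get("type"),))
    if not _one_of(output.get("column"), final_columns):
        errors.append("output column not available: %r" % (output.get("column"),))
    if not _one_of(output.get("direction"), ALLOWED_DIRECTIONS):
        errors.append('"direction" must be "asc" or "desc"')
    return errors
```

Collecting every error instead of stopping at the first one is a design choice for the sketch; it lets a caller report all problems with a job in one response.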

ColumnSelect and RowSum Transformation:

  • Each entry in the "columns" list should be the name of an existing column, with respect to any previously executed transformations.
  • After the ColumnSelect transformation, all columns not present in the "columns" list are lost.
  • After the RowSum transformation, all string-valued columns not present in the "columns" list are lost.
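
One way to handle these rules is to carry a schema forward through the transformations and let each step return the schema that remains. The sketch below assumes a schema represented as a dict mapping column name to "string" or "number"; that representation and the function name are hypothetical.

```python
def apply_column_list(transformation, schema):
    """Validate a columnSelect or rowSum step and return the resulting schema.

    schema maps column name -> "string" or "number" (assumed representation).
    Raises ValueError when the step is invalid.
    """
    columns = transformation.get("columns")
    if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
        raise ValueError('"columns" must be a list of column names')

    missing = [c for c in columns if c not in schema]
    if missing:
        raise ValueError("unknown columns: %s" % ", ".join(missing))

    kind = transformation.get("type")
    if kind == "columnSelect":
        # Only the listed columns survive.
        return {name: schema[name] for name in columns}
    if kind == "rowSum":
        # String-valued columns survive only if listed; numeric columns stay.
        return {
            name: col_type
            for name, col_type in schema.items()
            if col_type == "number" or name in columns
        }
    raise ValueError("not a columnSelect or rowSum step: %r" % (kind,))
```

Returning a new dict rather than mutating the input keeps the schema from before each step available, which is useful when reporting which transformation dropped a column.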

RowSelect Transformation:

  • "column" should be the name of an existing column, with respect to any previously executed transformations.
  • "operator" should be one of <, >, <=, >=, =, or !=.
  • "criteria" should be either a number or string, and not an expression.
  • The type of the criteria (number or string) should match the type of the corresponding column chosen.
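
A minimal sketch of the RowSelect checks follows. It again assumes a hypothetical schema dict mapping column name to "string" or "number". Because the sample job passes "criteria" as the JSON string "2000", the sketch also assumes that a string which parses with float() counts as a number when the column is numeric; that interpretation is a guess, not something this issue specifies.

```python
ALLOWED_OPERATORS = {"<", ">", "<=", ">=", "=", "!="}


def _is_number(value):
    """True for ints, floats, and strings that float() can parse (not bools)."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
        except ValueError:
            return False
        return True
    return False


def validate_row_select(transformation, schema):
    """Return a list of error strings for a rowSelect step."""
    errors = []

    column = transformation.get("column")
    column_type = schema.get(column) if isinstance(column, str) else None
    if column_type is None:
        errors.append("unknown column: %r" % (column,))

    operator = transformation.get("operator")
    if not isinstance(operator, str) or operator not in ALLOWED_OPERATORS:
        errors.append("unsupported operator: %r" % (operator,))

    criteria = transformation.get("criteria")
    if isinstance(criteria, bool) or not isinstance(criteria, (int, float, str)):
        errors.append('"criteria" must be a number or a string')
    elif column_type == "number" and not _is_number(criteria):
        # Rejects expressions such as "2*1000" as well as plain text.
        errors.append("criteria is not a number: %r" % (criteria,))
    elif column_type == "string" and not isinstance(criteria, str):
        errors.append("criteria is not a string: %r" % (criteria,))
    return errors
```

For a string-valued column the sketch accepts any string, since a string literal and an expression cannot be told apart without further rules.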

ColumnDefine Transformation:

  • "column" should be a unique name for the new column being defined, and should not conflict with the name of any existing column, with respect to any previously executed transformations
  • "expression" should be a valid mathematical expression using only scalar values (strings or numbers) or the names of existing columns, with respect to any previously executed transformations.
  • "expression" may use the following numerical operators: +, -, *, /, ^
  • After the ColumnDefine transformation, a new column with the given name is added.
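
To make the expression rules concrete, here is a small sketch that uses Python's standard-library ast module (Python 3.8+) as a stand-in parser. This is an assumption chosen to keep the example self-contained, and the function name is hypothetical. Nothing is evaluated: the parsed tree is only walked to confirm that it contains allowed operators, scalar values, and known column names. Accepting a leading sign such as -1 is an extra convenience not listed in the rules above.

```python
import ast

# Python parses "^" as bitwise XOR. It is accepted here only as the token for
# exponentiation, which is safe because the expression is never evaluated.
ALLOWED_BINARY_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div, ast.BitXor)


def validate_column_define(transformation, columns):
    """Return a list of error strings for a columnDefine step.

    columns is the set of column names that exist before this step.
    """
    errors = []
    name = transformation.get("column")
    if not isinstance(name, str) or not name:
        errors.append('"column" must be a non-empty string')
    elif name in columns:
        errors.append("column already exists: %s" % name)

    expression = transformation.get("expression")
    if not isinstance(expression, str):
        return errors + ['"expression" must be a string']
    try:
        tree = ast.parse(expression, mode="eval")
    except (SyntaxError, ValueError):
        return errors + ["expression is not well formed: %s" % expression]

    def check(node):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ALLOWED_BINARY_OPS):
            check(node.left)
            check(node.right)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
            check(node.operand)  # leading sign, e.g. -1 (assumed extension)
        elif isinstance(node, ast.Name):
            if node.id not in columns:
                errors.append("unknown column in expression: %s" % node.id)
        elif (isinstance(node, ast.Constant)
              and isinstance(node.value, (int, float, str))
              and not isinstance(node.value, bool)):
            pass  # scalar number or string
        else:
            errors.append("unsupported syntax in expression")

    check(tree.body)
    return errors
```

When the returned list is empty, the caller would add the new column name to the set of known columns before validating the next transformation. Python's grammar is broader than the one described here and gives "^" a different precedence, so a dedicated parser is a better fit if the expression is also going to be evaluated.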

bryantrobbins avatar Jan 08 '17 04:01 bryantrobbins

Checking the column definition expressions is the hardest part of this. I'm using the pyparsing module (http://pyparsing.wikispaces.com/) to write a Python class with the necessary logic.

Check out https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ExpressionValidator.py

bryantrobbins avatar Jan 14 '17 15:01 bryantrobbins

The top-level Configuration Validator is here: https://github.com/bryantrobbins/baseball/blob/master/shared/btr3baseball/ConfigValidator.py

bryantrobbins avatar Feb 01 '17 04:02 bryantrobbins

TODO: Add a list here of the possible exceptions thrown by the ConfigValidator, for consumption by the UI and Worker.

bryantrobbins avatar Feb 01 '17 05:02 bryantrobbins