functionless icon indicating copy to clipboard operation
functionless copied to clipboard

Step Functions: Optimization - remove arbitrary variable assignment.

Open thantos opened this issue 3 years ago • 0 comments

Standard Step Functions price model is based on transitions. In general, to assign a variable in SFN costs at least one state transition to generate the state to do so.

For both Standard and Express machines, the memory space is limited to 256kb. Variable assignment duplicate the memory footprint of whatever they contain and step functions does not have a concept of reference variables.

Both of these situations mean limiting variable creation is preferred.

In some common cases, arbitrary assignment are generated by the transpiler or the user.

Functionless's ASL and ASLGraph transpiler generates theoretically unnecessary variable assignments.

Lets look at an example case.

const x = await task();
const { payload, result } = await task();

The transpiler uses greedy evaluation in order to simplify transpilation logic.

The variable stmt/decl will be evaluated starting with the variable name on the left and then the integration (task) on the right. The task will generate some states, which are up to the author of the integration. For simplicity, the integration just needs to output some value as JsonPath, LiteralValue, or a condition statement (simplified to a boolean json path).

Lets say the integration just generates a task, like invoking a lambda.

graph LR;
subgraph task
lambda --> |return|tempHeap
end
tempHeap --> |assign|`x`

Shown above, the task can encapsulate the logic to generate the variable.

But we can see that this logic may be able to simplified to

graph LR;
subgraph task
lambda --> |return|`x`
end

In a more complex case, lets look at const { payload, result } = await task();

graph LR;
subgraph task
lambda --> |return|tempHeap
end
tempHeap --> |`tempHeap.payload=`|`payload`
`payload` --> |`tempHeap.result=`|`result`

In this case, there isn't much value in simplification, the result must be bound to multiple variables anywhere and the temp heap is the vehicle to do so.

Users also create unnecessary assignments

[1,2,3].map((x) => {
   const item = x;
   return item;
});
graph LR;
temp["tempArray = [1,2,3]"]
check["isPresent(tempArray[0])"]
tail["tempArray = tempArray[1:]"]
assign["x = tempArray[0]"]
assignItem["item = x"]
return["tempArray[0]"]
result["resultArray=tempArray[0]"]
temp --> check
check --> assign
check --> |not present|exit
assign --> assignItem
assignItem --> |`return item`|return
return --> result
result --> tail
tail --> check

Here we can see that at least the assignment of x to item is not needed and maybe even the assignment of the tempArray[0] to it self can be seen as having no effect.

Algorithm

Find variables which are eventually assigned to other variables without mutation. Replace said variable assignment with the property they are eventually assigned to.

var =1> heap =2> prop

var => prop if var cannot be assigned to between 1 and 2 if heap is only assigned in 1

var =1> heap =2> prop var =1> heap =3> prop2

var => prop if var cannot be assigned to between 1 and 2 if heap is only assigned to in 1 var => prop2 if var cannot be assigned to between 1 and 3 if heap is only assigned to in 1 canBeAssigned(node, start, end): all nodes between between N(start) and N(end) do not have a ResultPath starting with node

Note: a more naive approach would only simplify variables that are used once, instead of considering the range in which they can be mutated.

Assignment

ASL Fields:

  • ResultPath - Pass, Task, Map
  • Parameters - nested field and property assignment

Use

  • InputPath (ex: { InputPath: $.v1, ResultPath: $.v2, Type: "Pass" } assigns from $.v1 to $.v2.
  • Parameters - field names with the suffix .$ can contain variables.
  • Intrinsic functions, States.Format, Array, StringToJson, JsonToString can contain variables. Must be in Parameters
  • ItemsPath - Map state only
  • Conditions - Choices - references variables
  • Return - the special return case can assign all variables to the top state in order to return from the machine { Type: "Pass", ResultPath: "$", End: true }

Edge Cases

  • If the target variable is used between the starting assignment and it's eventual assignment, we cannot simplify as the meaning will change. X =1> temp =3> Y can be simplified to X =1> Y unless Y is used between 1 and 3. X =1> temp =2> (if(Y)) =3> Y

thantos avatar Aug 12 '22 13:08 thantos