Semantic checks
This is an umbrella issue for cases where it is possible to write code whose intended semantics are unclear. Each such case should be dealt with either by clearly defining what the code does or by disallowing it. The latter means adding a semantic check to the native API and/or parallelize that detects the problem and prints a helpful error message explaining the situation to the user.
The first example is accessing a stateful with a .bag() call from inside the UDF of an updateWith* call on it. It is not clear whether the user
- wants to get the old state (from before the updateWithMany call), or
- wants to see the effect of (maybe some of the) other UDF invocations within the same updateWith* call.
This confusing situation should be avoided by disallowing such calls:
- the native API could set a flag on the stateful for the duration of an updateWith* call, check this flag in .bag(), and throw an exception if it is set;
- parallelize could detect this at compile time (e.g. in ComprehensionAnalysis.comprehend, when creating an updateWith* combinator).
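The flag-based runtime check from the first option could look roughly like the sketch below. It is only an illustration: StatefulSketch and its simplified updateWith are hypothetical stand-ins, not the actual stateful API, and the plain boolean flag assumes the UDFs run on a single thread.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a stateful collection; NOT the real API.
final class StatefulSketch[K, A](initial: Seq[A], key: A => K) {
  private val state = mutable.LinkedHashMap(initial.map(a => key(a) -> a): _*)

  // Set for the duration of an updateWith call (assumes single-threaded UDFs).
  private var updating = false

  /** Returns a snapshot of the current state; rejected while an update runs. */
  def bag(): Seq[A] = {
    if (updating) throw new IllegalStateException(
      "bag() was called from inside the UDF of an updateWith* call on the same " +
      "stateful. It is undefined whether this sees the old state or the effect " +
      "of other UDF invocations. Call bag() before the update instead.")
    state.values.toList
  }

  /** Applies `udf` to every element; a returned Some(a) replaces that element. */
  def updateWith(udf: A => Option[A]): Unit = {
    updating = true
    try {
      state.keys.toList.foreach { k =>
        udf(state(k)).foreach(updated => state.update(k, updated))
      }
    } finally {
      updating = false // always reset, even if the UDF throws
    }
  }
}
```

Resetting the flag in a finally block keeps the stateful usable after a rejected call. A real implementation would also have to decide how the flag behaves when UDFs are executed in parallel or on remote workers.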
We can also check for methods/functions that take and/or return DataBags and issue warnings. We should also think about how to solve this problem in the future (e.g. inlining, macro annotations).