differential-datalog RFC: DDlog debugger

Open ryzhyk opened this issue 4 years ago • 0 comments

Some initial thoughts

How will the debugging tool work? (i.e., high-level workflow?)

The complete functionality will go something like this:

1. Compile the DDlog program with debugging hooks enabled, e.g., by providing the -g CLI switch to DDlog. This will cause DDlog to inject the additional Inspect operators.
2. The compiled program can run without a debugger, in which case it behaves exactly like a normal DDlog program, except that it is slightly slower (but hopefully still fast enough to be used even in production).
3. It can also run with the debugger enabled. There are several options here. We may want to support one or more of them:
   - The debugger runs as a standalone process. The injected Inspect operators send information about changes to this process via some form of IPC. The DDlog program can connect to the debugger either on startup or during runtime, but in the latter case the debugger will only observe new derivations.
   - The debugger runs in the same process as DDlog.
   - Postmortem debugging: debugging information is simply dumped into a file and later analyzed by the debugger.
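The three delivery options above could share a single interface on the DDlog side. The following Rust sketch is only an illustration under that assumption: the `DebugEvent` fields, the `EventSink` trait, and the tab-separated line format are hypothetical names and choices, not part of DDlog.

```rust
use std::io::{self, Write};

/// One debugger event. Field names are illustrative, not DDlog's actual format.
#[derive(Debug, Clone, PartialEq)]
pub struct DebugEvent {
    pub timestamp: u64,
    pub weight: i64,
    pub operator_id: u32,
    pub payload: String,
}

/// Destination for events produced by the injected Inspect hooks (hypothetical).
pub trait EventSink {
    fn emit(&mut self, event: &DebugEvent) -> io::Result<()>;
}

/// Program runs without a debugger: events are dropped.
pub struct NullSink;

impl EventSink for NullSink {
    fn emit(&mut self, _event: &DebugEvent) -> io::Result<()> {
        Ok(())
    }
}

/// Debugger in the same process: events are kept in memory.
#[derive(Default)]
pub struct InMemorySink {
    pub events: Vec<DebugEvent>,
}

impl EventSink for InMemorySink {
    fn emit(&mut self, event: &DebugEvent) -> io::Result<()> {
        self.events.push(event.clone());
        Ok(())
    }
}

/// Standalone or postmortem debugger: events are written one per line to any
/// writer, e.g., a socket for IPC or a file for a later analysis.
pub struct WriterSink<W: Write> {
    pub out: W,
}

impl<W: Write> EventSink for WriterSink<W> {
    fn emit(&mut self, e: &DebugEvent) -> io::Result<()> {
        writeln!(
            self.out,
            "{}\t{}\t{}\t{}",
            e.timestamp, e.weight, e.operator_id, e.payload
        )
    }
}
```

With this shape, switching between the modes would only change which sink the hooks are given; the instrumented program itself stays the same.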

Does source-to-source transformation mean we generate another .dl file from a source .dl file with Inspect operators injected? Or is it more like an implicit transformation (i.e., the DDlog compiler will have logic to inject the inspect DD operator into the Rust program)? The generated Rust program will have inspect operators inserted that our debugging tool will use.

There will be a function with a signature similar to:

injectDebuggingHooks :: DatalogProgram -> DatalogProgram

The program it outputs will contain the injected Inspect operators and will be passed to Compile.hs to generate the Rust code. It can also be pretty-printed into a .dl file for testing purposes (so we can manually check that the correct debugging hooks were injected).
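To make the shape of such a pass concrete, here is a toy model of it. The real pass would live in the Haskell compiler and operate on the full DDlog AST; the sketch below is written in Rust only for illustration, and its `Term`, `Rule`, `inject_debugging_hooks`, and `pretty` definitions are all hypothetical. It binds each body literal to a fresh variable and places an Inspect hook after every join, i.e., after every literal except the first.

```rust
/// Toy rule body terms; the real DDlog AST is far richer.
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    /// A body literal such as `R1(a, b, z, _)`, optionally bound to a variable.
    Literal { binding: Option<String>, atom: String },
    /// An injected debugging hook, identified by an operator ID.
    Inspect { operator_id: u32 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub head: String,
    pub body: Vec<Term>,
}

/// Returns a copy of `rule` where every literal is bound to `r1`, `r2`, ...
/// (replacing any existing binding, a simplification of this sketch) and an
/// Inspect hook follows each literal except the first.
/// Operator IDs are allocated from `next_id`.
pub fn inject_debugging_hooks(rule: &Rule, next_id: &mut u32) -> Rule {
    let mut body = Vec::new();
    let mut literals_seen = 0;
    for term in &rule.body {
        match term {
            Term::Literal { atom, .. } => {
                literals_seen += 1;
                body.push(Term::Literal {
                    binding: Some(format!("r{}", literals_seen)),
                    atom: atom.clone(),
                });
                if literals_seen > 1 {
                    body.push(Term::Inspect { operator_id: *next_id });
                    *next_id += 1;
                }
            }
            other => body.push(other.clone()),
        }
    }
    Rule { head: rule.head.clone(), body }
}

/// Pretty-prints a rule on one line so the injected hooks can be checked by eye.
pub fn pretty(rule: &Rule) -> String {
    let terms: Vec<String> = rule
        .body
        .iter()
        .map(|t| match t {
            Term::Literal { binding: Some(v), atom } => format!("{} in {}", v, atom),
            Term::Literal { binding: None, atom } => atom.clone(),
            Term::Inspect { operator_id } => {
                format!("Inspect dbg_event(..., {}, ...)", operator_id)
            }
        })
        .collect();
    format!("{} :- {}.", rule.head, terms.join(", "))
}
```

The sketch deliberately leaves out the hard part, namely computing which variables each hook must pass to the debugger.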

What will the inspect operator contain (i.e., what is the expression?)

Something like

Inspect dbg_event(ddlog_timestamp, ddlog_weight, 12345, (x,y,z))

where 12345 identifies the location in the program where the Inspect operator was injected, and (x,y,z) is a tuple containing all variables needed to replay the rule activation. The main question is exactly where to inject the Inspect operators and what variables to send to the debugger. Consider this rule:

R0(a, b, c, d) :-
  R1(a, b, z, _),
  R2(c, q, z),
  R3(d, q).

We could instrument it like this:

R0(a, b, c, d) :-
  r1 in R1(a, b, z, _),
  r2 in R2(c, q, z),
  Inspect  dbg_event(..., (r1, r2, (a, b, c, q))),
  r3 in R3(d, q),
  Inspect  dbg_event(..., ((a, b, c, q), r3, R0{a, b, c, d})).

The first thing I did here was add the r1, r2, and r3 variables, which bind to the complete record from each relation in the rule, not just to individual fields (a, b, ...). This is needed so that the debugger can identify the exact record that triggered the derivation and trace its origin all the way back to input facts.

The values passed to each dbg_event call are the inputs to the join operator preceding the call, followed by the record the operator outputs. For example, the first join in the above rule takes complete records from R1 and R2, and the last value passed to dbg_event, (a, b, c, q), is the record output by that join. By sending these values to the debugger we give it enough info to reverse engineer the operator activation. Consider the second join above. It takes the tuple of variables (a, b, c, q) output by the previous join and the record from R3 and outputs an R0 record R0{a, b, c, d}.

The goal is to send enough info to the debugger so that it can trace fact derivations without fully understanding the semantics of DDlog. All the debugger sees is events of the form "Operator X derived fact F3 with weight W3 from facts F1 and F2 at time T".

Aggregations are trickier, as inputs to the aggregate operator are normally not available after the operator has been evaluated. E.g., if we aggregate using group_max, then only the max value is observable after the aggregation. One solution is to automatically modify all Aggregate operators to also output their entire input, e.g., we can rewrite the following rule:

R0(a, c) :-
  R1(a, b),
  var c = Aggregate((a), group_max(b)).

into:

R0(a, c) :-
  r1 in R1(a, b),
  (var inputs, var c) = Aggregate((a), __dbg_group_max(r1, b)),
  Inspect dbg_event(..., (inputs, R0{a, c})).

// Auto-generated aggregation function that uses the original
// aggregation function to compute the aggregate, but also
// returns the set of all inputs.
function __dbg_group_max(g: Group<'K, ('I, 'V)>): (Vec<'I>, 'V) {
    (var original_group, var inputs) = dbg_split_group(g);
    (inputs, group_max(original_group))
}
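
The wrapper can be modeled outside DDlog to check that it preserves the original aggregate while exposing the inputs. The Rust sketch below is an analogy, not generated code: a group is represented as a slice of (input record, value) pairs, `group_max` is a stand-in for the DDlog function of the same name, and the signature chosen for `dbg_split_group` is an assumption.

```rust
/// Splits a debug-augmented group of (input record, value) pairs into the
/// original group of values and the list of input records.
pub fn dbg_split_group<I: Clone, V: Clone>(group: &[(I, V)]) -> (Vec<V>, Vec<I>) {
    let values = group.iter().map(|(_, v)| v.clone()).collect();
    let inputs = group.iter().map(|(i, _)| i.clone()).collect();
    (values, inputs)
}

/// Stand-in for DDlog's group_max; assumes the group is non-empty.
pub fn group_max<V: Ord + Clone>(group: &[V]) -> V {
    group.iter().max().cloned().expect("groups are non-empty")
}

/// Computes the aggregate with the original function, but also returns
/// all input records that contributed to the group.
pub fn dbg_group_max<I: Clone, V: Ord + Clone>(group: &[(I, V)]) -> (Vec<I>, V) {
    let (original_group, inputs) = dbg_split_group(group);
    (inputs, group_max(&original_group))
}
```

For a group built from R1(1, 10), R1(1, 30), and R1(1, 20), this returns all three records together with the maximum 30, which is what the Inspect hook after the aggregate needs.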

Maybe related to the above questions: there will be a DDlog program (this will be compiled down into DD). So let's say we have this new DDlog program with Inspect operators inserted in the rules. Running this program will be like running any other DDlog program (i.e., we can feed it a record dump we collected). But how will the debugging tool interact with this running program? I assume this debugging tool will be a new, separate Rust program?

Yes, I think so. Its core functionality is to keep track of fact derivations and allow the user to trace output deltas back to input deltas. We need to think about the exact data structures it should maintain to make this possible. We probably want to start with a CLI debugger, but eventually we will want a GUI as well.
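
As a starting point for that discussion, one possible data structure is an index from each derived fact to the events that produced it. The Rust sketch below assumes facts are identified by their printed form and ignores weights and timestamps entirely, so it cannot yet handle retractions. `DebugEvent`, `DerivationStore`, and their methods are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// "Operator X derived fact F3 with weight W3 from facts F1 and F2 at time T".
#[derive(Debug, Clone)]
pub struct DebugEvent {
    pub timestamp: u64,
    pub weight: i64,
    pub operator_id: u32,
    pub inputs: Vec<String>,
    pub output: String,
}

/// Maps each derived fact to the events that produced it.
#[derive(Default)]
pub struct DerivationStore {
    by_output: HashMap<String, Vec<DebugEvent>>,
}

impl DerivationStore {
    /// Records one event received from an Inspect hook.
    pub fn record(&mut self, event: DebugEvent) {
        self.by_output
            .entry(event.output.clone())
            .or_default()
            .push(event);
    }

    /// Returns the facts with no recorded derivation (treated as input facts)
    /// that `fact` transitively depends on, sorted and without duplicates.
    pub fn trace_to_inputs(&self, fact: &str) -> Vec<String> {
        let mut leaves = Vec::new();
        let mut visited = HashSet::new();
        let mut stack = vec![fact.to_string()];
        while let Some(f) = stack.pop() {
            if !visited.insert(f.clone()) {
                continue; // already explored
            }
            match self.by_output.get(&f) {
                Some(events) => {
                    for e in events {
                        stack.extend(e.inputs.iter().cloned());
                    }
                }
                None => leaves.push(f),
            }
        }
        leaves.sort();
        leaves
    }
}
```

Feeding it the two events of the instrumented R0 rule and tracing the R0 record back yields the contributing R1, R2, and R3 records. A real debugger would additionally need to key intermediate tuples by operator and to account for weights and timestamps.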

May 27 '20 21:05 ryzhyk
