differential-datalog
RFC: DDlog debugger
Some initial thoughts
- How will the debugging tool work? (i.e., high-level workflow?) The complete functionality will go something like this:
- Compile the DDlog program with debugging hooks enabled, e.g., by providing a -g CLI switch to DDlog. This will cause DDlog to inject the additional Inspect operators.
- The compiled program can run without a debugger, in which case it behaves exactly like a normal DDlog program, except for being slightly slower (but hopefully still fast enough to be used even in production).
- It can also run with a debugger enabled. There are several options here; we may want to support one or more of them:
- The debugger runs as a standalone process. The injected inspect operators send information about changes to this process via some form of IPC. The DDlog program can connect to the debugger either on startup or during runtime, but in the latter case the debugger will only observe new derivations.
- The debugger runs in the same process as DDlog.
- Postmortem debugging: Debugging information is simply dumped into a file and later analyzed by the debugger.
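One way to keep the three options above interchangeable is to hide the transport behind a small interface that the injected operators call. The following is only a sketch under assumptions: `EventSink`, `NullSink`, `InProcessSink` and `WriterSink` are hypothetical names, and events are treated as already-encoded text lines.

```rust
use std::io::{self, Write};

/// Destination for debug events (hypothetical interface).
/// Each deployment option is one implementation.
pub trait EventSink {
    fn send(&mut self, event: &str) -> io::Result<()>;
}

/// No debugger attached: events are dropped.
pub struct NullSink;

impl EventSink for NullSink {
    fn send(&mut self, _event: &str) -> io::Result<()> {
        Ok(())
    }
}

/// Debugger in the same process: events are kept in memory.
#[derive(Default)]
pub struct InProcessSink {
    pub events: Vec<String>,
}

impl EventSink for InProcessSink {
    fn send(&mut self, event: &str) -> io::Result<()> {
        self.events.push(event.to_string());
        Ok(())
    }
}

/// Writes one line per event to any `Write`: a file for postmortem
/// debugging, or a socket connected to a standalone debugger process.
pub struct WriterSink<W: Write> {
    pub out: W,
}

impl<W: Write> EventSink for WriterSink<W> {
    fn send(&mut self, event: &str) -> io::Result<()> {
        writeln!(self.out, "{}", event)
    }
}
```

With this shape, the choice between IPC, in-process and postmortem debugging becomes a runtime decision about which sink to construct, and the instrumented program itself stays the same.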
- Does source-to-source transformation mean we generate another .dl file from a source .dl file with Inspect operators injected? Or is it more like an implicit transformation (i.e., the DDlog compiler will have logic to inject the inspect DD operator into the Rust program)? The generated Rust program will have inspect operators inserted that our debugging tool will use.
There will be a function with a signature similar to:
injectDebuggingHooks :: DatalogProgram -> DatalogProgram
The program it outputs will contain the injected inspect operators and will be passed to Compile.hs to generate the Rust code. It can also be pretty-printed into a .dl file for testing purposes (so we can manually check that the correct debugging hooks were injected).
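To make the shape of such a program-to-program pass concrete, here is a toy version in Rust. It is a sketch under assumptions, not the DDlog compiler's code: `BodyTerm`, `Rule` and `inject_debugging_hooks` are hypothetical, the AST is far simpler than DDlog's, and the placement policy (one Inspect after every join) is just one possible choice.

```rust
/// One element of a rule body in a toy AST (hypothetical; not DDlog's real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum BodyTerm {
    /// A literal such as `R1(a, b, z, _)`.
    Atom(String),
    /// An injected debugging hook; `id` identifies the program location.
    Inspect { id: u32 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub head: String,
    pub body: Vec<BodyTerm>,
}

/// Returns a copy of `rules` with an `Inspect` inserted after every join,
/// i.e. after every atom except the first one in each rule body.
pub fn inject_debugging_hooks(rules: &[Rule]) -> Vec<Rule> {
    let mut next_id = 0u32;
    rules
        .iter()
        .map(|rule| {
            let mut body = Vec::new();
            let mut atoms_seen = 0;
            for term in &rule.body {
                body.push(term.clone());
                if let BodyTerm::Atom(_) = term {
                    atoms_seen += 1;
                    if atoms_seen >= 2 {
                        // The second and later atoms each complete a join.
                        body.push(BodyTerm::Inspect { id: next_id });
                        next_id += 1;
                    }
                }
            }
            Rule { head: rule.head.clone(), body }
        })
        .collect()
}
```

Because the input is left untouched and the output is an ordinary program value, the result can be either compiled or pretty-printed for manual inspection.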
- What will the inspect operator contain (i.e., what is the expression?)
Something like
Inspect dbg_event(ddlog_timestamp, ddlog_weight, 12345, (x,y,z))
where 12345 identifies the location in the program where the Inspect operator was injected, and (x,y,z) is a tuple containing all variables needed to replay the rule activation.
The main question is exactly where to inject Inspect operators and what variables to send to the debugger. Consider this rule:
R0(a, b, c, d) :-
    R1(a, b, z, _),
    R2(c, q, z),
    R3(d, q).
We could instrument it like this:
R0(a, b, c, d) :-
    r1 in R1(a, b, z, _),
    r2 in R2(c, q, z),
    Inspect dbg_event(..., (r1, r2, (a, b, c, q))),
    r3 in R3(d, q),
    Inspect dbg_event(..., ((a, b, c, q), r3, R0{a, b, c, d})).
The first thing I did here is add the r1, r2, r3 variables that bind to the complete record from each relation in the rule, not just individual fields (a, b, ...). This is needed so that the debugger can identify the exact record that triggered the derivation and can trace its origin all the way back to input facts.
The values passed to each dbg_event call are the inputs to the join operator preceding the call, followed by the operator's output. For example, the first join in the above rule takes complete records from R1 and R2. The last value passed to Inspect, (a,b,c,q), is the record output by the operator. By sending these values to the debugger we give it enough info to reverse engineer the operator activation.
Consider the second join above. It takes the tuple of variables (a, b, c, q) output by the previous join and the record from R3, and outputs an R0 record R0{a,b,c,d}.
The goal is to send enough info to the debugger, so it can trace fact derivations
without fully understanding the semantics of DDlog. All the debugger sees is
events of the form "Operator X derived fact F3 with weight W3 from facts F1 and
F2 at time T".
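As a worked illustration of that event form, the sketch below spells out the two events the instrumented R0 rule would produce for one matching combination of records. Everything here is an assumption made for illustration: the `DebugEvent` layout, the operator ids 1 and 2, the concrete field values, and the choice to carry facts as strings.

```rust
/// What the debugger receives from one operator activation (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
pub struct DebugEvent {
    pub timestamp: u64,
    pub weight: i64,
    pub operator_id: u32,
    pub inputs: Vec<String>,
    pub output: String,
}

impl DebugEvent {
    /// Renders the event in the "Operator X derived fact ..." form.
    pub fn describe(&self) -> String {
        format!(
            "Operator {} derived fact {} with weight {} from facts {} at time {}",
            self.operator_id,
            self.output,
            self.weight,
            self.inputs.join(" and "),
            self.timestamp
        )
    }
}

/// Events emitted for R1(1,2,7,_), R2(3,9,7) and R3(4,9),
/// i.e. a=1, b=2, z=7, c=3, q=9, d=4.
pub fn events_for_r0() -> Vec<DebugEvent> {
    let (r1, r2, r3) = ("R1{1,2,7,0}", "R2{3,9,7}", "R3{4,9}");
    let joined = "(1,2,3,9)"; // (a, b, c, q) produced by the first join
    vec![
        // First Inspect: (r1, r2, (a, b, c, q))
        DebugEvent {
            timestamp: 0,
            weight: 1,
            operator_id: 1,
            inputs: vec![r1.to_string(), r2.to_string()],
            output: joined.to_string(),
        },
        // Second Inspect: ((a, b, c, q), r3, R0{a, b, c, d})
        DebugEvent {
            timestamp: 0,
            weight: 1,
            operator_id: 2,
            inputs: vec![joined.to_string(), r3.to_string()],
            output: "R0{1,2,3,4}".to_string(),
        },
    ]
}
```

Note that the output of the first event reappears as an input of the second; that chaining is what lets a debugger walk backwards from R0 without knowing anything about joins.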
Aggregations are trickier, as inputs to the aggregate operator are normally not available after the operator has been evaluated. E.g., if we aggregate using group_max, then only the max value is observable after the aggregation. One solution is to automatically modify all Aggregate operators to also output their entire input, e.g., we can rewrite the following rule:
R0(a, c) :-
    R1(a, b),
    var c = Aggregate((a), group_max(b)).
as
R0(a, c) :-
    r1 in R1(a, b),
    (var inputs, var c) = Aggregate((a), __dbg_group_max(r1, b)),
    Inspect dbg_event(..., (inputs, R0{a, c})).

// Auto-generated aggregation function that uses the original
// aggregation function to compute the aggregate, but also
// returns the set of all inputs.
function __dbg_group_max(g: Group<'K, ('I, 'V)>): (Vec<'I>, 'V) {
    (var original_group, var inputs) = dbg_split_group(g);
    (inputs, group_max(original_group))
}
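The same wrapping idea can be mirrored in plain Rust to check that it behaves as intended. This is a sketch under assumptions: a group is modelled as a slice of `(input_record, value)` pairs, the functions below are stand-ins rather than DDlog library functions, and the aggregate returns an `Option` only because a Rust slice may be empty.

```rust
/// Splits a group of `(input_record, value)` pairs into the values the
/// original aggregate expects and the input records for the debugger.
pub fn dbg_split_group<I: Clone, V: Clone>(group: &[(I, V)]) -> (Vec<V>, Vec<I>) {
    let values: Vec<V> = group.iter().map(|(_, v)| v.clone()).collect();
    let inputs: Vec<I> = group.iter().map(|(i, _)| i.clone()).collect();
    (values, inputs)
}

/// Stand-in for the original aggregation function.
pub fn group_max<V: Ord + Clone>(values: &[V]) -> Option<V> {
    values.iter().max().cloned()
}

/// Debug wrapper: computes the original aggregate and also returns
/// every input record that contributed to the group.
pub fn dbg_group_max<I: Clone, V: Ord + Clone>(group: &[(I, V)]) -> (Vec<I>, Option<V>) {
    let (original_group, inputs) = dbg_split_group(group);
    (inputs, group_max(&original_group))
}
```

For a group holding R1{1,5}, R1{1,8} and R1{1,3}, the wrapper returns all three records alongside the maximum 8, so the debugger can see which inputs were considered even though only one of them determined the result.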
- Maybe related to the above questions: there will be a DDlog program (which will be compiled down to DD). Let's say we have this new DDlog program with Inspect operators inserted into its rules. Running this program will be like running any other DDlog program (i.e., we can feed it the record dump we collected). But how will the debugging tool interact with this running program? I assume the debugging tool will be a new, separate Rust program?
Yes, I think so. Its core functionality is to keep track of fact derivations and allow the user to trace output deltas back to input deltas. We need to think about the exact data structures it should maintain to make this possible. We probably want to start with a CLI debugger, but eventually we will want a GUI as well.
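As a starting point for that data-structure discussion, here is one minimal candidate: an index from each derived fact to the operator activations that produced it, plus a backward walk to the input facts. It is a sketch under assumptions, not a proposal for the final design: `DerivationStore` is a hypothetical name, facts are plain strings, and weights and timestamps are ignored.

```rust
use std::collections::{BTreeSet, HashMap};

/// Records which facts each derived fact was produced from (hypothetical structure).
#[derive(Default)]
pub struct DerivationStore {
    /// derived fact -> list of (operator id, input facts) that produced it
    derivations: HashMap<String, Vec<(u32, Vec<String>)>>,
}

impl DerivationStore {
    /// Stores one "operator derived `output` from `inputs`" event.
    pub fn record(&mut self, operator_id: u32, inputs: &[&str], output: &str) {
        self.derivations
            .entry(output.to_string())
            .or_default()
            .push((operator_id, inputs.iter().map(|s| s.to_string()).collect()));
    }

    /// Returns the input facts (facts with no recorded derivation)
    /// that `fact` transitively depends on.
    pub fn trace_to_inputs(&self, fact: &str) -> BTreeSet<String> {
        let mut result = BTreeSet::new();
        let mut visited = BTreeSet::new();
        let mut stack = vec![fact.to_string()];
        while let Some(f) = stack.pop() {
            // Skip facts already expanded, so recursive rules cannot loop forever.
            if !visited.insert(f.clone()) {
                continue;
            }
            match self.derivations.get(&f) {
                None => {
                    result.insert(f);
                }
                Some(derivs) => {
                    for (_, inputs) in derivs {
                        stack.extend(inputs.iter().cloned());
                    }
                }
            }
        }
        result
    }
}
```

A real debugger would also need to account for weights (so that retractions cancel earlier derivations) and for timestamps; this sketch only shows the backward traversal that both a CLI and a GUI front end could share.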