IncrementalInference.jl icon indicating copy to clipboard operation
IncrementalInference.jl copied to clipboard

Documenting CSM and inference steps for debugging

Open dehann opened this issue 4 years ago • 19 comments

EDIT, some of these code snippets are out of date.

Console

# save clique subgraphs during inference at critical steps
getSolverParams(fg).dbg = true
# record CSM steps for later analysis
tree, smt, hist = solveTree!(fg, recordcliqs=ls(fg))

# print to screen the CSM steps for clique with frontal :x2
printCliqHistorySummary(tree, :x2)

# where is everything saved???
getLogPath(fg)

clid = getCliq(tree, :x5).index
STEP = 8
fncAS = hist[clid][STEP][3]
dataAS = deepcopy(hist[clid][STEP][4])
fncAS(dataAS)
  • solveTree! will always generate content at getLogPath(fg) location.
  • More or less will be stored there depending getSolverParams(fg).dbg.
  • recordcliqs=[:x1;:x2;...] will save the CSM history to RAM.

Animate CSM Video

Can also make concurrent video of CSM at work:

# new interface
csmAnimate(tree, hist, frames=1000)

run(`ffmpeg -r 10 -i /tmp/caesar/csmCompound/csm_%d.png -c:v libtheora -vf fps=25 -pix_fmt yuv420p -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" -q 10 /tmp/caesar/csmCompound/out.ogv`)

@async run(`totem /tmp/caesar/csmCompound/out.ogv`)

Also see csmAnimate's own documentation.

EDIT

new Caesar.writevideo function directly engages with ffmpeg which is easier:

https://github.com/JuliaRobotics/Caesar.jl/blob/b514d18cbb6cc3b5557f67617ed08918c4260a26/src/images/images.jl#L6-L50


Also see example of debug process in #754

dehann avatar Nov 20 '19 22:11 dehann

Note that when using getSolverParams(fg).dbg=true each of the major stages of the CSM clique inference subgraphs are stored to the getLogPath(fg) destination -- including:

  • fg_build, as copied from main fg object
  • fg_beforeupsolve
  • fg_afterupsolve
  • fg_beforedownsolve
  • fg_afterdownsolve

which shows the belief messages passed up or down in each case added as MsgPriors to the clique subgraphs.

dehann avatar Nov 23 '19 21:11 dehann

Is there a way to debug with a failed solve (where it errors and doesn't return)?

Affie avatar Aug 20 '20 14:08 Affie

Hi @Affie , yeah so the best I have been able to do so far is as follows:

First thing is to find where the logs are going to land:

@show getLogPath(fg)

Then solve with all debug options:

getSolverParams(fg).dbg = true
getSolverParams(fg).async = true

tree, smt, hist = solveTree!(fg, verbose=true, recordcliqs=ls(fg));

Verbose is very useful, but only prints to the console so that you can get an idea of how far each CSM gets and in which functions the error might occur.

With this you will then get process 1 context back while the solver works in the background -- do not start a second solve while the first is still working in the background.

Once the failures occur, there are three things to do:

  • check the logpath for all details, the error stack will be printed out there in text files.
  • check smt on which CSM failed, and you should be able to see the error stack there as well (but not always),
  • for the CSM objects that have not failed but did complete, you can fetch the recordcliq history with hist4 = fetch(smt[4]).

One more thing you can do is to limit the number of iterations for all CSM:

getSolverParams(fg).limititers = 13

which allows you to force early termination of CSM and thereby get the csmc object nearest the failure. It is actually possible to get even closer, one CSM at a time:

fg = generateCanonicaFG_Hexagonal()
@show getLogPath(fg)

tree = wipeBuildNewTree!(fg)
drawTree(tree, show=true)

getSolverParams(fg).dbg = true
getSolverParams(fg).async = true

getSolverParams(fg).limititers = 13
cliqtask = solveCliq!(fg, tree, :x2, verbose=true, recordcliq=true, async=true)

# wait for that task to finish
sleep(10)

getSolverParams(fg).limititers = 9
cliqtask = solveCliq!(fg, tree, :x0, verbose=true, recordcliq=true, async=true)
...

dehann avatar Aug 20 '20 14:08 dehann

Actually there is one more you can do with hist::Dict (automatically fetched for all CSM if async=false), use printCliqHistorySequential to see output like this:

using RoME
fg = generateCanonicalFG_Hexagonal()
tree, smt, hist = solveTree!(fg, recordcliqs=ls(fg));
printCliqHistorySequential(hist)

semicolon is recommended :-)

julia> printCliqHistorySequential(hist)
1.1    10:27:13.868 testCliqCanRecycle 0 | null        P ----         C 2null 3null 4null |S| 
1.2    10:27:14.303 testCliqCanIncremt 0 | null        P ----         C 2null 3null 4null |S| 
1.3    10:27:14.31  isCliqUpSolved     0 | null        P ----         C 2null 3null 4null |S| 
1.4    10:27:14.362 buildCliqSubgraph  0 | null        P ----         C 2null 3null 4null |S| 
1.5    10:27:15.617 isCliqNull         0 | null        P ----         C 2null 3null 4null |S| 
1.6    10:27:15.908 doesCliqNeeddownms 0 | null        P ----         C 2null 3null 4null |S| 
1.7    10:27:15.917 determineCliqNeedD 0 | null        P ----         C 2null 3null 4null |S| 
1.8    10:27:15.955 slowCliqIfChildren 0 | null        P ----         C 2null 3null 4null |S| 
2.1    10:27:15.987 testCliqCanRecycle 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.2    10:27:15.991 testCliqCanIncremt 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.3    10:27:15.993 isCliqUpSolved     0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.4    10:27:15.995 buildCliqSubgraph  0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.5    10:27:15.999 isCliqNull         0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.6    10:27:16.003 doesCliqNeeddownms 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.7    10:27:16.007 checkIfCliqNullBlo 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.8    10:27:16.086 blockCliqSiblingsP 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.9    10:27:16.151 determineCliqNeedD 0 | null        P 1null        C 5null 6null |S| 3null 4null 
2.10   10:27:16.155 slowCliqIfChildren 0 | null        P 1null        C 5null 6null |S| 3null 4null 
3.1    10:27:16.159 testCliqCanRecycle 0 | null        P 1null        C ---- |S| 2null 4null 
3.2    10:27:16.165 testCliqCanIncremt 0 | null        P 1null        C ---- |S| 2null 4null 
3.3    10:27:16.168 isCliqUpSolved     0 | null        P 1null        C ---- |S| 2null 4null 
3.4    10:27:16.17  buildCliqSubgraph  0 | null        P 1null        C ---- |S| 2null 4null 
3.5    10:27:16.337 isCliqNull         0 | null        P 1null        C ---- |S| 2null 4null 
3.6    10:27:16.342 doesCliqNeeddownms 0 | null        P 1null        C ---- |S| 2null 4null 
3.7    10:27:16.347 checkIfCliqNullBlo 0 | null        P 1null        C ---- |S| 2null 4null 
3.8    10:27:16.352 blockCliqSiblingsP 0 | null        P 1null        C ---- |S| 2null 4null 
3.9    10:27:16.358 determineCliqNeedD 0 | null        P 1null        C ---- |S| 2null 4null 
3.10   10:27:16.363 getBetterName7b    0 | null        P 1null        C ---- |S| 2null 4null 
4.1    10:27:16.39  testCliqCanRecycle 0 | null        P 1null        C ---- |S| 2null 3null 
4.2    10:27:16.392 testCliqCanIncremt 0 | null        P 1null        C ---- |S| 2null 3null 
4.3    10:27:16.394 isCliqUpSolved     0 | null        P 1null        C ---- |S| 2null 3null 
4.4    10:27:16.397 buildCliqSubgraph  0 | null        P 1null        C ---- |S| 2null 3null 
4.5    10:27:16.449 isCliqNull         0 | null        P 1null        C ---- |S| 2null 3null 
4.6    10:27:16.454 doesCliqNeeddownms 0 | null        P 1null        C ---- |S| 2null 3null 
4.7    10:27:16.459 checkIfCliqNullBlo 0 | null        P 1null        C ---- |S| 2null 3null 
4.8    10:27:16.465 blockCliqSiblingsP 0 | null        P 1null        C ---- |S| 2null 3null 
4.9    10:27:16.47  determineCliqNeedD 0 | null        P 1null        C ---- |S| 2null 3null 
4.10   10:27:16.476 getBetterName7b    0 | null        P 1null        C ---- |S| 2null 3null 
5.1    10:27:16.481 testCliqCanRecycle 0 | null        P 2null        C ---- |S| 6null 
5.2    10:27:16.483 testCliqCanIncremt 0 | null        P 2null        C ---- |S| 6null 
5.3    10:27:16.486 isCliqUpSolved     0 | null        P 2null        C ---- |S| 6null 
5.4    10:27:16.488 buildCliqSubgraph  0 | null        P 2null        C ---- |S| 6null 
5.5    10:27:16.493 isCliqNull         0 | null        P 2null        C ---- |S| 6null 
5.6    10:27:16.499 doesCliqNeeddownms 0 | null        P 2null        C ---- |S| 6null 
5.7    10:27:16.504 checkIfCliqNullBlo 0 | null        P 2null        C ---- |S| 6null 
5.8    10:27:16.51  blockCliqSiblingsP 0 | null        P 2null        C ---- |S| 6null 
5.9    10:27:16.515 determineCliqNeedD 0 | null        P 2null        C ---- |S| 6null 
5.10   10:27:16.52  getBetterName7b    0 | null        P 2null        C ---- |S| 6null 
6.1    10:27:16.525 testCliqCanRecycle 0 | null        P 2null        C ---- |S| 5null 
6.2    10:27:16.528 testCliqCanIncremt 0 | null        P 2null        C ---- |S| 5null 
6.3    10:27:16.53  isCliqUpSolved     0 | null        P 2null        C ---- |S| 5null 
6.4    10:27:16.532 buildCliqSubgraph  0 | null        P 2null        C ---- |S| 5null 
6.5    10:27:16.537 isCliqNull         0 | null        P 2null        C ---- |S| 5null 
6.6    10:27:16.543 doesCliqNeeddownms 0 | null        P 2null        C ---- |S| 5null 
6.7    10:27:16.548 checkIfCliqNullBlo 0 | null        P 2null        C ---- |S| 5null 
6.8    10:27:16.553 blockCliqSiblingsP 0 | null        P 2null        C ---- |S| 5null 
6.9    10:27:16.559 determineCliqNeedD 0 | null        P 2null        C ---- |S| 5null 
6.10   10:27:16.564 getBetterName7b    0 | null        P 2null        C ---- |S| 5null 
3.11   10:27:16.57  attemptCliqInitUp  0 | null        P 1null        C ---- |S| 2null 4null 
3.12   10:27:16.598 mustInitUpCliq     0 | null        P 1null        C ---- |S| 2null 4null 
3.13   10:27:19.439 doCliqUpSolveIniti 0 | null        P 1null        C ---- |S| 2null 4null 
4.11   10:27:23.505 attemptCliqInitUp  0 | null        P 1null        C ---- |S| 2null 3null 
4.12   10:27:23.511 mustInitUpCliq     0 | null        P 1null        C ---- |S| 2null 3null 
4.13   10:27:23.516 doCliqUpSolveIniti 0 | null        P 1null        C ---- |S| 2null 3null 
5.11   10:27:29.764 attemptCliqInitUp  0 | null        P 2null        C ---- |S| 6null 
5.12   10:27:29.769 mustInitUpCliq     0 | null        P 2null        C ---- |S| 6null 
5.13   10:27:29.774 doCliqUpSolveIniti 0 | null        P 2null        C ---- |S| 6null 
6.11   10:27:35.021 attemptCliqInitUp  0 | null        P 2null        C ---- |S| 5null 
6.12   10:27:35.027 mustInitUpCliq     0 | null        P 2null        C ---- |S| 5null 
6.13   10:27:35.032 doCliqUpSolveIniti 0 | null        P 2null        C ---- |S| 5null 
3.14   10:27:40.464 rmUpLikeliSaveSubF 0 | upsolved    P 1null        C ---- |S| 2null 4null 
3.15   10:27:40.523 finishCliqSolveChe 0 | upsolved    P 1null        C ---- |S| 2null 4null 
4.14   10:27:40.841 rmUpLikeliSaveSubF 0 | upsolved    P 1null        C ---- |S| 2null 3upsolved 
4.15   10:27:40.846 finishCliqSolveChe 0 | upsolved    P 1null        C ---- |S| 2null 3upsolved 
5.14   10:27:40.887 rmUpLikeliSaveSubF 0 | upsolved    P 2null        C ---- |S| 6null 
5.15   10:27:40.892 finishCliqSolveChe 0 | upsolved    P 2null        C ---- |S| 6null 
6.14   10:27:40.899 rmUpLikeliSaveSubF 0 | upsolved    P 2null        C ---- |S| 5upsolved 
6.15   10:27:40.904 finishCliqSolveChe 0 | upsolved    P 2null        C ---- |S| 5upsolved 
3.16   10:27:40.955 determineCliqIfDow 0 | upsolved    P 1null        C ---- |S| 2null 4upsolved 
2.11   10:27:41.187 isCliqNull         0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.12   10:27:41.191 doesCliqNeeddownms 0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.13   10:27:41.194 checkIfCliqNullBlo 0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.14   10:27:41.198 blockCliqSiblingsP 0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.15   10:27:41.202 determineCliqNeedD 0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.16   10:27:41.206 getBetterName7b    0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
4.16   10:27:41.21  determineCliqIfDow 0 | upsolved    P 1null        C ---- |S| 2null 3upsolved 
5.16   10:27:41.215 determineCliqIfDow 0 | upsolved    P 2null        C ---- |S| 6upsolved 
6.16   10:27:41.221 determineCliqIfDow 0 | upsolved    P 2null        C ---- |S| 5upsolved 
2.17   10:27:41.311 attemptCliqInitUp  0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.18   10:27:41.327 mustInitUpCliq     0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.19   10:27:42.015 doCliqUpSolveIniti 0 | null        P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.20   10:27:42.99  rmUpLikeliSaveSubF 0 | upsolved    P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
2.21   10:27:43.061 finishCliqSolveChe 0 | upsolved    P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
1.9    10:27:43.065 isCliqNull         0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.10   10:27:43.069 doesCliqNeeddownms 0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.11   10:27:43.074 determineCliqNeedD 0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.12   10:27:43.086 getBetterName7b    0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
2.22   10:27:43.091 determineCliqIfDow 0 | upsolved    P 1null        C 5upsolved 6upsolved |S| 3upsolved 4upsolved 
1.13   10:27:43.193 attemptCliqInitUp  0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.14   10:27:43.201 mustInitUpCliq     0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.15   10:27:43.211 doCliqUpSolveIniti 0 | null        P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.16   10:27:44.979 rmUpLikeliSaveSubF 0 | upsolved    P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.17   10:27:44.983 finishCliqSolveChe 0 | upsolved    P ----         C 2upsolved 3upsolved 4upsolved |S| 
1.18   10:27:44.999 determineCliqIfDow 0 | upsolved    P ----         C 2upsolved 3upsolved 4upsolved |S| 
3.17   10:27:45.195 doCliqDownSolve    0 | upsolved    P 1downsolved  C ---- |S| 2upsolved 4upsolved 
4.17   10:27:46.427 doCliqDownSolve    0 | upsolved    P 1downsolved  C ---- |S| 2upsolved 3downsolved 
2.23   10:27:47.372 doCliqDownSolve    0 | upsolved    P 1downsolved  C 5upsolved 6upsolved |S| 3downsolved 4downsolved 
1.19   10:27:48.323 exitStateMachine   0 | downsolved  P ----         C 2downsolved 3downsolved 4downsolved |S| 
3.18   10:27:48.35  exitStateMachine   0 | downsolved  P 1downsolved  C ---- |S| 2downsolved 4downsolved 
5.17   10:27:48.355 doCliqDownSolve    0 | upsolved    P 2downsolved  C ---- |S| 6upsolved 
6.17   10:27:49.298 doCliqDownSolve    0 | upsolved    P 2downsolved  C ---- |S| 5downsolved 
4.18   10:27:50.247 exitStateMachine   0 | downsolved  P 1downsolved  C ---- |S| 2downsolved 3downsolved 
2.24   10:27:50.251 exitStateMachine   0 | downsolved  P 1downsolved  C 5downsolved 6downsolved |S| 3downsolved 4downsolved 
5.18   10:27:50.257 exitStateMachine   0 | downsolved  P 2downsolved  C ---- |S| 6downsolved 
6.18   10:27:50.348 exitStateMachine   0 | downsolved  P 2downsolved  C ---- |S| 5downsolved 

dehann avatar Aug 20 '20 14:08 dehann

The most pressing developer tool here is to better manage how CSM terminates on error. The two options I have come up with are either (sure there are more options available):

  • Propagate the error all the way (less desirable)
  • Build a dedicated error state in CSM which records and closes out a CSM on error and then notify other CSMs to stop as well.

see #758

dehann avatar Aug 20 '20 14:08 dehann

reminder that the revamped getSolverParams(fg).useMsgLikelihoods = true is default false at this time and will be made default true in future.

dehann avatar Aug 20 '20 14:08 dehann

Another trick that doesn't come up too often but none the less: getSolverParams(fg).multiproc = false -- is default true. Similarly, you can skip up or down stages by getSolverParams(fg).dodownsolve = false. All the parameter options are:

getSolverParams(fg) |> typeof |> fieldnames

dehann avatar Aug 20 '20 14:08 dehann

Also worth stating again here, the purpose of recordcliqs is to allow repeat steps:

# either from hist4 = hist[4] or
hist4 = fetch(cliqtask4)

# the next function that would have been called at step 11
fnc11! = hist4[11][3]
csmc11_ = deepcopy(hist4[11][4])

# must reset the condition
getSolveCondition(csmc11_.cliq) = Condition()

# now redo that step (likely with debugger)
fnc12_ = fnc11!(csmc11_)

# note the csmc11_ object will now be changed after fnc11!
## The design intends
fnc13_ = fnc12_(csmc11_)

# ...

dehann avatar Aug 20 '20 14:08 dehann

added a new tool limittercliqs::Vector{Pair{Symbol, Int}} that allows early stopping of any specific CSMs per user desire, making it much easier to debug how CSMs work: https://github.com/JuliaRobotics/IncrementalInference.jl/blob/1ad9b34579d9bb6fbbe8ea317ba8d4ff66b241e7/test/testBasicTreeInit.jl#L8

dehann avatar Aug 29 '20 22:08 dehann

Snapshot example of hex init script: https://github.com/JuliaRobotics/Caesar.jl/blob/a486c0728139e485ded88fa7f725295add002a7a/examples/dev/devCascadeInitCSM.jl#L3

dehann avatar Sep 10 '20 14:09 dehann

Commits 426e98f , 1dbe967 , 030d75b add a new swim lane type CSM printout:

julia> IIF.printCSMHistoryLogical(hists)
    |     x0             |     x4             |     l1             |     x1             |     x5             |     x3             
----+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------
1   | 1   testCliqC null | 10  testCliqC null | 21  testCliqC null | 32  testCliqC null | 43  testCliqC null | 54  testCliqC null 
2   | 2   testCliqC null | 11  testCliqC null | 22  testCliqC null | 33  testCliqC null | 44  testCliqC null | 55  testCliqC null 
3   | 3   isCliqUpS null | 12  isCliqUpS null | 23  isCliqUpS null | 34  isCliqUpS null | 45  isCliqUpS null | 56  isCliqUpS null 
4   | 4   buildCliq null | 13  buildCliq null | 24  buildCliq null | 35  buildCliq null | 46  buildCliq null | 57  buildCliq null 
5   | 5   canCliqMa null | 14  canCliqMa null | 25  canCliqMa null | 36  canCliqMa null | 47  canCliqMa null | 58  canCliqMa null 
6   | 6   blockUnti null | 15  blockUnti null | 26  blockUnti null | 37  blockUnti null | 48  blockUnti null | 59  blockUnti null 
7   | 7   trafficRe null | 16  trafficRe null | 27  trafficRe null | 38  trafficRe null | 49  trafficRe null | 60  trafficRe null 
8   | 8   determine null | 17  checkIfCl null | 28  checkIfCl null | 39  checkIfCl null | 50  checkIfCl null | 61  checkIfCl null 
9   | 9   childCliq null | 18  blockCliq null | 29  blockCliq null | 40  blockCliq null | 51  blockCliq null | 62  blockCliq null 
10  | 103 trafficRe null | 19  determine null | 30  determine null | 41  determine null | 52  determine null | 63  determine null 
11  | 104 determine null | 20  childCliq null | 31  towardUpO null | 42  towardUpO null | 53  towardUpO null | 64  towardUpO null 
12  | 105 towardUpO null | 85  trafficRe null | 65  attemptCl null | 68  attemptCl null | 71  attemptCl null | 74  attemptCl null 
...

dehann avatar Sep 11 '20 17:09 dehann

as of FSM v0.2.8 a new injectDelayBefore as well as a watchdog timeout features have been added to help with the debugging process. Note that watchdog timeout is not really meant for mainstream use in CSM just yet, more to help development when CSMs don't terminate by themselves.

dehann avatar Sep 13 '20 03:09 dehann

Just copying a recent example here for the record:

using Caesar, RoME


fg = generateCanonicalFG_Hexagonal(graphinit=false)

getSolverParams(fg).treeinit = true
getSolverParams(fg).graphinit = false
getSolverParams(fg).limititers = 100

getSolverParams(fg).drawtree = true
getSolverParams(fg).showtree = true
getSolverParams(fg).dbg = true
getSolverParams(fg).async = true


# limitcliqs = [:x0=>8;:x4=>12;:l1=>21;:x1=>21;:x5=>50;:x3=>50] # breaks
# limitcliqs = [:x0=>8;:x4=>13;:l1=>21;:x1=>21;:x5=>60;:x3=>60] # 50 # doesnt break, blocks

# injectDelayBefore=[2=>(canCliqMargRecycle_StateMachine=>5); ] # step 8
# injectDelayBefore=[5=>(canCliqMargRecycle_StateMachine=>5); ]
# injectDelayBefore=[6=>(towardUpOrDwnSolve_StateMachine=>10); ]
# injectDelayBefore = nothing


mkpath(getLogPath(fg))
verbosefid = open(joinLogPath(fg, "csmVerbose.log"),"w")
# verbosefid = stdout
tree, smt, hists = solveTree!(fg, recordcliqs=ls(fg), verbose=true, verbosefid=verbosefid, timeout=50 ) #, timeout=40 , injectDelayBefore=injectDelayBefore ) #, limititercliqs=limitcliqs);

flush(verbosefid)
close(verbosefid)
# for .async = true (because .drawTree=true)
smt[7] |> x->schedule(x, InterruptException(), error=true)

open(joinLogPath(fg, "csmLogicalReconstructMax.log"),"w") do io
  IIF.reconstructCSMHistoryLogical(getLogPath(fg), fid=io)
end

# async case
fetchCliqHistoryAll!(smt, hists)

open(joinLogPath(fg, "csmSequ.log"),"w") do fid
  printCliqHistorySequential(hists, nothing, fid)
end

open(joinLogPath(fg, "csmLogi.log"),"w") do fid
  printCSMHistoryLogical(hists, fid)
end

# printCliqHistorySequential(hists)
# printCliqHistorySequential(hists, 1=>10)
# printCliqHistorySequential(hists, [1,4,6]=>11:15)
# printCliqHistorySequential(hists, [1=>9:16; 2=>20:34; 4=>29:34])
# printCliqHistorySequential(hists, [5=>12:21;6=>12:21])

printCSMHistoryLogical(hists)

# also see dbg logs at this path for more info
# @show getLogPath(fg)

using Images
csmAnimateSideBySide(tree, hists, encode=true, nvenc=true, show=true)

dehann avatar Sep 18 '20 20:09 dehann

Enable clique history recording and provide smtasks, then retrieve with fetchCliqHistoryAll! even if an exception is thrown.

smtasks = Task[]
solveTree!(fg; smtasks, verbose=true, timeout=10, recordcliqs=ls(fg));
hists = fetchCliqHistoryAll!(smtasks);

We can automate it further in the future.

Affie avatar Sep 28 '20 07:09 Affie

Will the internal fetchCliqHistoryAll! work in general or should we definitely call it after a solve? See it is commented out: https://github.com/JuliaRobotics/IncrementalInference.jl/blob/8bede364027b62eb1e3dba1a5b2285de583c6050/src/SolverAPI.jl#L533

dehann avatar Sep 29 '20 02:09 dehann

Will the internal fetchCliqHistoryAll! work in general or should we definitely call it after a solve?

For now, it has to be called after the solve with smtasks passed in by reference. I was thinking to still return it if dbg is on. So it would not throw an error but @error and return the normal tree, smtasks, hists

I don't know about the commented out call.

Affie avatar Sep 29 '20 07:09 Affie

Everything logged with logCSM as debug level can be displayed on the console by enabling debug printing on the clique, for example, enable CSM 2 with ENV["JULIA_DEBUG"] = :csm_2

Affie avatar Oct 27 '20 12:10 Affie

Another usefull workflow for debugging CSM freezes is:

getSolverParams(fg).async = true
smtasks = Task[]
solveTree!(fg; smtasks)

Then if it hangs call:

IIF.throwIntExcToAllTasks(smtasks)

Affie avatar Feb 25 '21 13:02 Affie

getSolverParams(fg).dbg=true also saves the csmc and statemachine in task storage. it is saved in: smtasks[i].storage[:csmc] smtasks[i].storage[:statemachine]

Affie avatar Feb 25 '21 16:02 Affie