Support tensorflow debugger
To debug my model, I thought I could connect my program to TensorBoard to decipher this cryptic message:
TensorFlowException TF_INVALID_ARGUMENT "In[0] is not a matrix\n\t [[Node: MatMul_70 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device=\"/job:localhost/replica:0/task:0/device:CPU:0\"](Const_41, Mean_69)]]"
I could not find the equivalent of the Python function:
tf_debug.TensorBoardDebugWrapperSession("machine:7000")
Is it implemented? If not, is it in the pipeline? Fabien
There isn't any support for the tensorflow debugger right now. I'm not sure what work is required to support it.
A short-term workaround might be to use asGraphDef to get the graph as a proto, then write it to a file and load it into tensorboard so that you can more easily inspect the graph to figure out what part of your code that MatMul is coming from.
For the cryptic error messages: We should prioritize https://github.com/tensorflow/haskell/issues/24 so that these look like nice compiler errors that point to the line of code causing an issue.
Actually, instead of asGraphDef, you can use logGraph to write to a tensorboard log file directly:
https://tensorflow.github.io/haskell/haddock/tensorflow-logging-0.2.0.0/TensorFlow-Logging.html#v:logGraph
Just make sure to do that before you try to build the graph, otherwise you'll get the tensorflow runtime exception first.
With logGraph I was able to start TensorBoard. Unfortunately, loading the graph hangs at about 30% with the message: Data: Parsing graph.pbtxt
I made a little progress but I don't understand the following message: TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [784,500] vs. [500,784]\n\t [[Node: Mul_43 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Relu_40, Transpose_42)]]"
Does it mean that the dimensions in the node Mul_43 are incorrect? Thanks for the effort anyway.
Hmm. You may need to make sure the withEventWriter call exits before the error happens, otherwise it may not have flushed the file write yet and so the graph.pbtxt will be incomplete.
TF.withEventWriter "/path/to/logs" $ \eventWriter -> TF.logGraph eventWriter graph
-- Other code that actually runs the graph.
Does it mean that the dimensions in the node Mul_43 are incorrect?
That does seem to be what it is saying, but the dimensions look compatible to me... If you have any code you can share, I can take a look.
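For reference, here is a minimal sketch of the two shape rules that seem to be in play in the messages quoted in this thread. It assumes the usual TensorFlow semantics: element-wise ops such as Mul broadcast their operands, while a non-transposed MatMul wants two rank-2 operands with matching inner dimensions. The names broadcastable and matMulCompatible are hypothetical helpers written for illustration; they are not part of the tensorflow package.

```haskell
-- Hypothetical helpers for illustration only; not part of the tensorflow package.

-- | Element-wise ops (e.g. Mul) broadcast: aligned from the trailing
-- dimension, each pair of sizes must be equal, or one of them must be 1.
-- zipWith stops at the shorter shape, which matches treating missing
-- leading dimensions as 1.
broadcastable :: [Int] -> [Int] -> Bool
broadcastable xs ys = and (zipWith ok (reverse xs) (reverse ys))
  where
    ok a b = a == b || a == 1 || b == 1

-- | A non-transposed MatMul needs two rank-2 operands whose inner
-- dimensions agree. Any other rank is rejected, which is the situation
-- "In[0] is not a matrix" describes for the first operand.
matMulCompatible :: [Int] -> [Int] -> Bool
matMulCompatible [_, k] [k', _] = k == k'
matMulCompatible _ _ = False
```

Under these assumptions, broadcastable [784, 500] [500, 784] is False, so Mul rejects that pair, whereas matMulCompatible [784, 500] [500, 784] is True. That would explain why the shapes look compatible at first glance: they are fine for a matrix product, just not for an element-wise one.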
code.tar.gz I tried to remove all the unnecessary code from the file. The cabal project is built in a sandbox. The error is: TensorFlowException TF_INVALID_ARGUMENT "Incompatible shapes: [500,784] vs. [784,500]\n\t [[Node: Mul_7 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_inputToto/XTiti_0_0, ReadVariableOp_6)]]"
I had to make a few edits to get the code to compile, e.g. I got this error:
.../src/RBM.hs:117:45: error:
    Variable not in scope: h0 :: TFT.Tensor v0 t0
    |
117 |         TFL.scalarSummary (pack "update_w") h0 -- update_w
    |                                             ^^
After renaming h_sampleProbArg to h0 and adding a Main module, I was able to build. I couldn't reproduce the error, though; it ran fine for me.
I switched to the Python version of the code, as it runs flawlessly. Thanks for your support anyway.