
ValueError("Operation {} does not belong to given graph".format(op)) when running get walk ops functions

NicholasMcElroy opened this issue 3 years ago

Hello, I'm currently using your library to do some operations on the graph of a TensorFlow 2 model, and I'm having trouble figuring out the proper way to convert a tensor into either a gde.Node or gde.Tensor object for use with the library's functions. I'm converting my tensors as shown in the attached screenshot; gra is the name of my gde.Graph object, for reference. After converting the tensors this way, running get_backward_walk_ops on my ys_g returns a placeholder operation, and running get_forward_walk_ops on the xs_g raises ValueError("Operation {} does not belong to given graph".format(op)). Looking at the code in the util file, I see this error is raised after checking whether the op has a value for its graph attribute, so I'm guessing that attribute never gets set on my converted objects. How can I make sure that it gets a value when converting?
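For context, the check I'm referring to looks roughly like this (my paraphrase of the util code, not the exact source):

def get_unique_graph(ops):
    # Every op handed to a walk function must report the same graph
    # through its graph attribute; a mismatch (or a missing value)
    # triggers the ValueError quoted in the title of this issue.
    g = None
    for op in ops:
        if g is None:
            g = op.graph
        elif g is not op.graph:
            raise ValueError(
                "Operation {} does not belong to given graph".format(op))
    return g

Any help is appreciated, thank you!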

NicholasMcElroy avatar Jun 23 '21 21:06 NicholasMcElroy

Thanks for reaching out @NicholasMcElroy! Might you have a self-contained piece of Python code that reproduces the problem you are seeing?

frreiss avatar Jun 25 '21 16:06 frreiss

It's a bit complex as this is a function that uses variables from another script, but here's the snippet I'm working on:

def gradients(ys, xs, graph, grad_ys=None, **kwargs):
    # Serialize graph for use within this function
    g = gde.Graph(graph.as_graph_def())
    xs_g = []
    for x in xs:
        xs_g.append(gde.Node(x, x.name, x.op, g=g))
    ys_g = gde.Node(ys, ys.name, ys.op, g=g)
    # Get a list of forward and backward operations
    ops_list = gde.make_list_of_op(g, allow_graph=True)
    back_ops = gde.get_backward_walk_ops(ys_g,
                                         inclusive=True)
    debug_print("back_ops: %s", back_ops)
    fwd_ops = gde.get_forward_walk_ops(xs_g,
                                       inclusive=True,
                                       within_ops=back_ops)
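
The gde.Node(...) calls above are my best guess at the conversion. Something like the following might be closer to what the library expects; this is a sketch only, assuming gde.Graph exposes a get_node_by_name lookup, which I haven't confirmed:

def to_gde_nodes(tf_tensors, g):
    # Look up each tensor's producing op by name in the deserialized
    # graph, so that the returned objects already carry the graph
    # attribute that the walk functions check.
    return [g.get_node_by_name(t.op.name) for t in tf_tensors]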

And here's where the function is called:

tf_g = tf.Graph()
with tf_g.as_default():
        args = parser.parse_args()
        enc = encoder.get_encoder(args.model_name, models_dir=args.models_dir)
        hparams = model.default_hparams()
        with open(os.path.join('models', args.model_name, 'hparams.json')) as f:
            hparams.override_from_dict(json.load(f))

        if args.sample_length > hparams.n_ctx:
            raise ValueError(
                "Can't get samples longer than window size: %s" % hparams.n_ctx)

        with tf.Session() as sess:
            # Fully static shape required to make memory accounting in
            # twremat accurate.
            train_context = tf.placeholder(tf.int32, [args.batch_size, 1024])
            train_context_in = randomize(train_context, hparams, args.noise)
            train_output = model.model(hparams=hparams, X=train_context_in)
            train_loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=train_context[:, 1:], logits=train_output['logits'][:, :-1]))

            if args.val_every > 0:
                val_context = tf.placeholder(tf.int32, [args.val_batch_size, None])
                val_output = model.model(hparams=hparams, X=val_context)
                val_loss = tf.reduce_mean(
                    tf.nn.sparse_softmax_cross_entropy_with_logits(
                        labels=val_context[:, 1:], logits=val_output['logits'][:, :-1]))
                val_loss_summary = tf.summary.scalar('val_loss', val_loss)

            sample_context = tf.placeholder(tf.int32, [args.batch_size, None])
            tf_sample = sample.sample_sequence(
                hparams=hparams,
                length=args.sample_length,
                context=sample_context,
                batch_size=args.batch_size,
                temperature=1.0,
                top_k=args.top_k,
                top_p=args.top_p)

            all_vars = [v for v in tf.trainable_variables() if 'model' in v.name]
            train_vars = [v for v in all_vars if '/h' in v.name] if args.only_train_transformer_layers else all_vars
            opt_grads = gradients(train_loss, train_vars, tf_g)

NicholasMcElroy avatar Jun 25 '21 18:06 NicholasMcElroy

Sorry, I'm still having trouble reproducing this. Could you provide a stack trace so I can see which of the calls from get_forward_walk_ops() to get_unique_graph() is triggering this error?

frreiss avatar Jul 01 '21 18:07 frreiss

I've been messing around with it a bit, so the error I'm getting now is a little different, but here's the current stack trace:

Traceback (most recent call last):
  File "./traintest.py", line 325, in <module>
    main()
  File "./traintest.py", line 146, in main
    opt_grads = tensorgrader.gradients(train_loss, train_vars, tf_g)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 933, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 764, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3050, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3444, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3289, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py", line 999, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/func_graph.py", line 986, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /content/drive/MyDrive/nlp/tensorgrader.py:30 gradients  *
        fwd_ops = gde.get_forward_walk_ops(xs_n,
    /usr/local/lib/python3.7/dist-packages/graph_def_editor/select.py:466 get_forward_walk_ops  *
        for new_t in op.outputs:
    /usr/local/lib/python3.7/dist-packages/graph_def_editor/node.py:170 outputs
        raise ValueError("Outputs of {} have not been set".format(self))

    ValueError: Outputs of Node[<bound method BaseResourceVariable.value of <tf.Variable 'model/h11/attn/c_attn/w:0' shape=(1, 768, 2304) dtype=float32>>|name: "model/h11/attn/c_attn/w"
    op: "VarHandleOp"
    attr {
      key: "_class"
      value {
        list {
          s: "loc:@model/h11/attn/c_attn/w"
        }
      }
    }
    attr {
      key: "allowed_devices"
      value {
        list {
        }
      }
    }
    attr {
      key: "container"
      value {
        s: ""
      }
    }
    attr {
      key: "dtype"
      value {
        type: DT_FLOAT
      }
    }
    attr {
      key: "shape"
      value {
        shape {
          dim {
            size: 1
          }
          dim {
            size: 768
          }
          dim {
            size: 2304
          }
        }
      }
    }
    attr {
      key: "shared_name"
      value {
        s: "model/h11/attn/c_attn/w"
      }
    }
    ] have not been set

NicholasMcElroy avatar Jul 05 '21 20:07 NicholasMcElroy

Sorry for the delay in getting back to this.

The most recent stack trace seems to indicate that there's a problem in the conversion from protocol buffers to Node and Graph objects. I've added some defensive type checking code to the Node class's constructor that will hopefully catch the problem closer to its root cause. The code is currently in this branch: https://github.com/frreiss/graph_def_editor_fred/tree/node-type-check
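
The checks are roughly of this shape; an illustrative sketch, not the exact code in the branch:

import graph_def_editor as gde

def check_node_args(g, name):
    # Illustrative only: fail fast when the constructor is handed
    # something other than a gde.Graph, instead of failing much later
    # inside get_forward_walk_ops().
    if not isinstance(g, gde.Graph):
        raise TypeError("g must be a gde.Graph; got {}".format(type(g)))
    if not isinstance(name, str):
        raise TypeError("name must be a string; got {}".format(type(name)))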

Could you try running your program against the code in that branch and seeing what error results?

frreiss avatar Jul 30 '21 22:07 frreiss