Documentation example error for Train a TensorFlow model with Keras

Open lexipalmer13 opened this issue 2 years ago • 1 comments

System Info

  • transformers version: 4.25.1
  • Platform: macOS-13.1-arm64-arm-64bit
  • Python version: 3.10.8
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): not installed (NA)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: <No>
  • Using distributed or parallel set-up in script?: <No>

Note: I'm using tensorflow-metal since I'm running on an M1 chip

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I tried both versions of the documentation code; they produce the same error.

Version 1:

from datasets import load_dataset

dataset = load_dataset("glue", "cola")

dataset = dataset["train"]

from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
labels = np.array(dataset["label"]) 

from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam

model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
model.compile(optimizer=Adam(3e-5))
tokenized_data = dict(tokenized_data)
model.fit(tokenized_data, labels)

Version 2:

from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"]

from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_dataset(data):
    # Keys of the returned dictionary will be added to the dataset as columns
    return tokenizer(data["sentence"])

dataset = dataset.map(tokenize_dataset)

from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam

# The model has to exist before prepare_tf_dataset can be called on it
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
tf_dataset = model.prepare_tf_dataset(dataset, batch_size=16, shuffle=True, tokenizer=tokenizer)

model.compile(optimizer=Adam(3e-5))
model.fit(tf_dataset)

Expected behavior

Every line works until the final one (model.fit), which produces an error. I would expect the model to be fit.

lexipalmer13 avatar Jan 25 '23 21:01 lexipalmer13

cc @Rocketknight1

sgugger avatar Jan 26 '23 15:01 sgugger

Hi @lexipalmer13 - that code runs fine for me locally, but we did have a lot of compatibility issues with TF 2.11. Version 4.26, which we released two days ago, should fix those issues. Can you try running pip install --upgrade transformers to see if it works for you with the newest version?
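After upgrading, it can help to confirm that the running Python environment (or notebook kernel) actually sees the new version. The following is only a rough sketch using the standard library; the helper names `parse_major_minor` and `is_at_least` are made up for illustration and are not part of transformers.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return the first two numeric components of a version string as ints."""
    numbers = re.findall(r"\d+", version_string)
    return int(numbers[0]), int(numbers[1])


def is_at_least(package, minimum):
    """True if `package` is installed and its (major, minor) is >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Package is not visible from this interpreter at all
        return False
    return parse_major_minor(installed) >= minimum


# Hypothetical check for the release mentioned above (4.26)
print("transformers >= 4.26:", is_at_least("transformers", (4, 26)))
```

If this still reports False in a notebook after the upgrade, restarting the kernel is usually worth trying before anything else.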

Rocketknight1 avatar Jan 27 '23 18:01 Rocketknight1

Hi @Rocketknight1 - thanks so much for getting back to me! It continues to throw the same error even with the updated transformers. I've pasted the error below (again, only model.fit is causing issues; the package imports, model loading, and pre-processing all run). The main issue seems to be this NotFoundError: Graph execution error:


2023-01-27 13:41:30.016811: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2023-01-27 13:41:39.700272: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-01-27 13:41:43.449299: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x127acaa60
2023-01-27 13:41:43.449332: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x127acaa60
....repeats a bunch of times
2023-01-27 13:41:47.628274: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x127acaa60
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
Cell In[19], line 1
----> 1 model.fit(tokenized_data, labels)

File ~/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/miniconda/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

NotFoundError: Graph execution error:

Detected at node 'StatefulPartitionedCall_199' defined at (most recent call last):
    File "/Users/lexipalmer/miniconda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/Users/lexipalmer/miniconda/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/traitlets/config/application.py", line 1041, in launch_instance
      app.start()
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 724, in start
      self.io_loop.start()
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 215, in start
      self.asyncio_loop.run_forever()
    File "/Users/lexipalmer/miniconda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/Users/lexipalmer/miniconda/lib/python3.10/asyncio/base_events.py", line 1899, in _run_once
      handle._run()
    File "/Users/lexipalmer/miniconda/lib/python3.10/asyncio/events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 512, in dispatch_queue
      await self.process_one()
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 501, in process_one
      await dispatch(*args)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 408, in dispatch_shell
      await result
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 731, in execute_request
      reply_content = await reply_content
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 417, in do_execute
      res = shell.run_cell(
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/ipykernel/zmqshell.py", line 540, in run_cell
      return super().run_cell(*args, **kwargs)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2945, in run_cell
      result = self._run_cell(
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3000, in _run_cell
      return runner(coro)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3203, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3382, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3442, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "/var/folders/ny/h_bygvy53h16kd57z4lsmsvh0000gn/T/ipykernel_6697/3344439326.py", line 1, in <module>
      model.fit(tokenized_data, labels)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1572, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/Users/lexipalmer/miniconda/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_199'
could not find registered platform with id: 0x127acaa60
	 [[{{node StatefulPartitionedCall_199}}]] [Op:__inference_train_function_34674]

lexipalmer13 avatar Jan 27 '23 18:01 lexipalmer13

Hi @lexipalmer13, thanks for the error traceback! I believe this error isn't related to transformers after all - the issue is an incompatibility specifically triggered by using XLA on TF 2.11 with Apple's M1 silicon. You can see a thread detailing the issue here.

The underlying cause is that TensorFlow moved to a new optimizer format in TF 2.11. This was the cause of the compatibility issues we experienced with transformers as well. The new optimizer format automatically compiles the update step with XLA, triggering the bug. As a workaround for now, you can replace the line

from tensorflow.keras.optimizers import Adam

with

from tensorflow.keras.optimizers.legacy import Adam

Hopefully this issue will be resolved in TF soon, and you won't need this workaround anymore!
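For scripts that run on more than one machine, the import swap above could be made conditional. This is a minimal sketch, not an official recipe: `needs_legacy_optimizer` and `make_adam` are hypothetical helper names, and restricting the check to exactly TF 2.11 on a macOS arm64 machine is an assumption based only on the setup reported in this thread. Other TF versions are untested here.

```python
import platform


def needs_legacy_optimizer(tf_version, system=None, machine=None):
    """Return True for the setup reported in this thread: TF 2.11 on Apple silicon.

    Assumption: only TF 2.11 on macOS/arm64 is treated as affected.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    on_apple_silicon = system == "Darwin" and machine == "arm64"
    return on_apple_silicon and (major, minor) == (2, 11)


def make_adam(learning_rate=3e-5):
    """Build an Adam optimizer, using the legacy class where the workaround applies."""
    # TensorFlow is imported lazily so the helper above stays usable without it
    import tensorflow as tf

    if needs_legacy_optimizer(tf.__version__):
        from tensorflow.keras.optimizers.legacy import Adam
    else:
        from tensorflow.keras.optimizers import Adam
    return Adam(learning_rate)


# Usage with the model from the reproduction:
# model.compile(optimizer=make_adam(3e-5))
```

Keeping the decision in one small function means the workaround can be deleted in a single place once it is no longer needed.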

Rocketknight1 avatar Jan 30 '23 14:01 Rocketknight1

Hi @Rocketknight1 Yes, that fixed it! Thanks so much for your help!

lexipalmer13 avatar Jan 30 '23 16:01 lexipalmer13