Error processing scalar columns using TensorFlow
datasets==4.0.0
columns_to_return = ['input_ids','attention_mask', 'start_positions', 'end_positions']
train_ds.set_format(type='tf', columns=columns_to_return)
train_ds:
train_ds type: <class 'datasets.arrow_dataset.Dataset'>, shape: (1000, 9)
columns: ['question', 'sentences', 'answer', 'str_idx', 'end_idx', 'input_ids', 'attention_mask', 'start_positions', 'end_positions']
features:{'question': Value('string'), 'sentences': Value('string'), 'answer': Value('string'), 'str_idx': Value('int64'), 'end_idx': Value('int64'), 'input_ids': List(Value('int32')), 'attention_mask': List(Value('int8')), 'start_positions': Value('int64'), 'end_positions': Value('int64')}
train_ds_tensor = train_ds['start_positions'].to_tensor(shape=(-1,1)) hits the following error:
AttributeError: 'Column' object has no attribute 'to_tensor'
tf.reshape(train_ds['start_positions'], shape=[-1,1]) hits the following error:
TypeError: Scalar tensor has no `len()`
Using tf.convert_to_tensor works fine:
import tensorflow as tf
start_pos = tf.convert_to_tensor(train_ds['start_positions'], dtype=tf.int64)
start_pos = tf.reshape(start_pos, [-1, 1])
Alternatively, using the built-in to_tf_dataset also avoids the issue:
train_tf = train_ds.to_tf_dataset(
columns=['input_ids','attention_mask'],
label_cols=['start_positions','end_positions'],
shuffle=True,
batch_size=32
)
However, tf.convert_to_tensor fails with the same error:
start_pos = tf.convert_to_tensor(self._train_ds['start_positions'], dtype=tf.int64)
File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/framework/constant_op.py", line 108, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: TypeError: Scalar tensor has no `len()`
Traceback (most recent call last):
File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/framework/ops.py", line 361, in __len__
raise TypeError("Scalar tensor has no `len()`")
TypeError: Scalar tensor has no `len()`
to_tf_dataset, on the other hand, works perfectly.