
Error processing scalar columns using TensorFlow

Open khteh opened this issue 3 months ago • 2 comments

datasets==4.0.0

columns_to_return = ['input_ids','attention_mask', 'start_positions', 'end_positions']
train_ds.set_format(type='tf', columns=columns_to_return)

train_ds:

train_ds type: <class 'datasets.arrow_dataset.Dataset'>, shape: (1000, 9)
columns: ['question', 'sentences', 'answer', 'str_idx', 'end_idx', 'input_ids', 'attention_mask', 'start_positions', 'end_positions']
features:{'question': Value('string'), 'sentences': Value('string'), 'answer': Value('string'), 'str_idx': Value('int64'), 'end_idx': Value('int64'), 'input_ids': List(Value('int32')), 'attention_mask': List(Value('int8')), 'start_positions': Value('int64'), 'end_positions': Value('int64')}

train_ds_tensor = train_ds['start_positions'].to_tensor(shape=(-1,1)) hits the following error:

AttributeError: 'Column' object has no attribute 'to_tensor'

tf.reshape(train_ds['start_positions'], shape=[-1,1]) hits the following error:

TypeError: Scalar tensor has no `len()`

khteh, Sep 15 '25 10:09

Using tf.convert_to_tensor works fine:

import tensorflow as tf

start_pos = tf.convert_to_tensor(train_ds['start_positions'], dtype=tf.int64)
start_pos = tf.reshape(start_pos, [-1, 1])

Alternatively, using the built-in to_tf_dataset also avoids the issue:

train_tf = train_ds.to_tf_dataset(
    columns=['input_ids','attention_mask'],
    label_cols=['start_positions','end_positions'],
    shuffle=True,
    batch_size=32
)

arjunaar2789, Sep 27 '25 05:09

tf.convert_to_tensor fails with the same error:

    start_pos = tf.convert_to_tensor(self._train_ds['start_positions'], dtype=tf.int64)
  File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/framework/constant_op.py", line 108, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: TypeError: Scalar tensor has no `len()`
Traceback (most recent call last):

  File "/home/khteh/.local/share/virtualenvs/pAIthon-GaqEDHQT/lib/python3.13/site-packages/tensorflow/python/framework/ops.py", line 361, in __len__
    raise TypeError("Scalar tensor has no `len()`")

TypeError: Scalar tensor has no `len()`

to_tf_dataset works perfectly.

khteh, Sep 27 '25 08:09
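
When a single scalar column is needed as a tensor rather than a full tf.data.Dataset, one possible workaround is to materialize the column into plain Python integers first, so that TensorFlow never has to call len() on scalar tensors. The sketch below is not from the thread. column_to_rows and rows_to_tensor are hypothetical helper names, and the code assumes that iterating train_ds['start_positions'] yields one scalar per row (a Python int or an eager scalar tensor) that int() can convert.

```python
from typing import Iterable, List


def column_to_rows(column: Iterable) -> List[List[int]]:
    """Turn an iterable of scalars into an N x 1 nested list of Python ints."""
    # int() accepts Python ints, NumPy integers, and eager scalar tensors.
    return [[int(value)] for value in column]


def rows_to_tensor(rows: List[List[int]]):
    """Build an int64 tensor of shape (N, 1) from the nested list."""
    # Imported here so column_to_rows stays usable without TensorFlow installed.
    import tensorflow as tf

    return tf.constant(rows, dtype=tf.int64)


# Stand-in for train_ds['start_positions']; real values come from the dataset.
example_column = [3, 17, 42]
example_rows = column_to_rows(example_column)
print(example_rows)  # [[3], [17], [42]]
```

Under the assumption above, the dataset from the issue would be handled with rows_to_tensor(column_to_rows(train_ds['start_positions'])), which should give the (N, 1) shape that the tf.reshape(..., [-1, 1]) calls were aiming for. For training, to_tf_dataset remains the simpler route.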