skorch
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
How to solve this error?
File "c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\skorch\utils.py", line 264, in _indexing_other
    return data[i]
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
Versions: pytorch==2.0.0 with cu118, torchvision==0.15.0+cu118, skorch==0.14.0
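For readers who land here with the same message: PyTorch raises it whenever a tensor with zero dimensions is indexed. Below is a minimal sketch, independent of skorch, that reproduces the message. The helper `try_index` is a hypothetical name introduced only for this illustration and is not part of torch or skorch.

```python
import torch

scalar = torch.tensor(3)    # 0-dim tensor, shape torch.Size([])
vector = torch.tensor([3])  # 1-dim tensor, shape torch.Size([1])


def try_index(t, i):
    """Return t[i], or the error text if indexing raises IndexError."""
    try:
        return t[i]
    except IndexError as exc:
        return f"IndexError: {exc}"


result_scalar = try_index(scalar, 0)  # indexing a 0-dim tensor fails
result_vector = try_index(vector, 0)  # indexing a 1-dim tensor works
value = scalar.item()                 # the supported way to read a 0-dim tensor

print(result_scalar)
print(result_vector)
print(value)
```

So somewhere in the call chain `data[i]` is applied to a tensor that has no dimension left to index. The sketch does not say which object that is in the original problem.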
Could you please give a lot more context? Otherwise we cannot answer the question. Ideally, you would provide a code sample that allows us to reproduce the error. Failing that, it would help if you described the data you're using (types, dtypes, shapes, etc.).
The data is a PyTorch TensorDataset that holds input_ids_array and label_id_array. Both come out of the Hugging Face AutoTokenizer step, because the text has to be encoded for a Hugging Face language model.
import math

import torch
from torch.utils.data import TensorDataset


# Note: tokenizer and max_seq_length are defined elsewhere in the script.
def generate_data_set(input_examples, label_masks, label_map, do_shuffle=False, balance_label_examples=False):
    '''
    Generate a Dataloader given the input examples, eventually masked if they are
    to be considered NOT labeled.
    '''
    examples = []

    # Count the percentage of labeled examples
    num_labeled_examples = 0
    for label_mask in label_masks:
        if label_mask:
            num_labeled_examples += 1
    # Proportion of labeled examples among all examples
    label_mask_rate = num_labeled_examples / len(input_examples)

    # If required, apply the balance
    for index, ex in enumerate(input_examples):
        if label_mask_rate == 1 or not balance_label_examples:
            examples.append((ex, label_masks[index]))
        else:
            # Simulate balancing by repeating the labeled example
            if label_masks[index]:
                balance = int(1 / label_mask_rate)
                balance = int(math.log(balance, 2))
                if balance < 1:
                    balance = 1
                for b in range(0, int(balance)):
                    examples.append((ex, label_masks[index]))
            else:
                examples.append((ex, label_masks[index]))

    # -----------------------------------------------
    # Generate input examples to the Transformer
    # -----------------------------------------------
    input_ids = []
    input_mask_array = []
    label_mask_array = []
    label_id_array = []

    # Tokenization
    for (text, label_mask) in examples:
        encoded_sent = tokenizer.encode(
            text[0], add_special_tokens=True, max_length=max_seq_length, padding="max_length", truncation=True)
        input_ids.append(encoded_sent)
        label_id_array.append(label_map[text[1]])
        label_mask_array.append(label_mask)

    # Attention to token (to ignore padded input wordpieces)
    for sent in input_ids:
        att_mask = [int(token_id > 0) for token_id in sent]
        input_mask_array.append(att_mask)

    # Conversion to Tensor
    input_ids = torch.tensor(input_ids)
    input_mask_array = torch.tensor(input_mask_array)
    label_id_array = torch.tensor(label_id_array, dtype=torch.long)
    label_mask_array = torch.tensor(label_mask_array)

    # Building the TensorDataset (only the token ids and the label ids are used)
    dataset = TensorDataset(input_ids, label_id_array)
    return dataset
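To make the balancing step in generate_data_set easier to follow, here is a small sketch of how many copies of each labeled example it appends for a given share of labeled data. The function name `copies_per_labeled_example` and the counts passed to it are made up for illustration; the arithmetic mirrors the `balance` lines above.

```python
import math


def copies_per_labeled_example(num_labeled, num_total):
    """Mirror the balance computation from generate_data_set.

    Assumes num_labeled > 0; with no labeled examples the original
    code never reaches the balance branch.
    """
    label_mask_rate = num_labeled / num_total
    if label_mask_rate == 1:
        return 1  # everything is labeled, so no repetition is applied
    balance = int(1 / label_mask_rate)
    balance = int(math.log(balance, 2))
    return max(balance, 1)  # never fewer than one copy


for num_labeled in (1, 10, 50, 100):
    copies = copies_per_labeled_example(num_labeled, 100)
    print(f"{num_labeled} labeled out of 100 -> {copies} copies each")
```

With 10 labeled examples out of 100 each labeled example is appended 3 times, and with 1 out of 100 it is appended 6 times, so the repetition grows only logarithmically as labels get scarcer.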
import torch
from sklearn.model_selection import GridSearchCV, KFold
from skorch import NeuralNetClassifier

generator_find_param = NeuralNetClassifier(
    module=Generator1,
    criterion=torch.nn.CrossEntropyLoss(ignore_index=-1),
    train_split=None,
    max_epochs=10,
    batch_size=32,
    module__noise_size=100,
    module__output_size=hidden_size,
    module__hidden_sizes=hidden_levels_g,
    module__dropout_rate=out_dropout_rate,
)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
generator_param_grid = {
    'module__noise_size': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}
grid_searchCV = GridSearchCV(
    generator_find_param, generator_param_grid, cv=kf, scoring='accuracy')
grid_searchCV.fit(x_train_dataset, y_train_dataset)  # the error is raised here
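As a side note on the size of this search: a grid search evaluates every combination in the parameter grid once per cross-validation split. The sketch below counts those combinations with plain `itertools`, without calling scikit-learn. The variable names `candidates`, `n_candidates` and `n_fits` are introduced here only for illustration.

```python
from itertools import product

generator_param_grid = {
    'module__noise_size': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}
n_splits = 5  # same as KFold(n_splits=5, ...)

# Cartesian product of all value lists, one dict per candidate setting
candidates = [
    dict(zip(generator_param_grid, values))
    for values in product(*generator_param_grid.values())
]
n_candidates = len(candidates)
n_fits = n_candidates * n_splits  # not counting the final refit on all data

print(n_candidates, n_fits)
```

Each of those fits trains for max_epochs=10, so a smaller grid makes it much quicker to reproduce and debug the error.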
Error message:

File c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\sklearn\base.py:1151, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1144     estimator._validate_params()
   1146 with config_context(
   1147     skip_parameter_validation=(
   1148         prefer_skip_nested_validation or global_skip_validation
   1149     )
   1150 ):
-> 1151     return fit_method(estimator, *args, **kwargs)

File c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\sklearn\model_selection\_search.py:898, in BaseSearchCV.fit(self, X, y, groups, **fit_params)
    892 results = self._format_results(
    893     all_candidate_params, n_splits, all_out, all_more_results
    894 )
    896 return results
--> 898 self._run_search(evaluate_candidates)
...
    return indexing(data, i)
File "c:\Users\User.virtualenvs\testNewTransformer-s5V1e_yt\lib\site-packages\skorch\utils.py", line 264, in _indexing_other
    return data[i]
IndexError: invalid index of a 0-dim tensor. Use `tensor.item()` in Python or `tensor.item<T>()` in C++ to convert a 0-dim tensor to a number
x_train_dataset = generate_data_set(labeled_examples, x_train_label_masks, label_map, do_shuffle=False, balance_label_examples=apply_balance)
It's still very hard to tell from your code what the exact issue is.
Could you print what the input_ids and label_id_array are, which you put into your TensorDataset?
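A rough sketch of the kind of output that would help: the type, dtype and shape of each tensor stored in the TensorDataset. The data here is a made-up stand-in (4 sequences of 8 token ids, 4 labels), and `describe` is a hypothetical helper, not a torch or skorch function.

```python
import torch
from torch.utils.data import TensorDataset

# Made-up stand-ins for the real input_ids and label_id_array
input_ids = torch.zeros((4, 8), dtype=torch.long)
label_id_array = torch.tensor([0, 1, 0, 1], dtype=torch.long)
dataset = TensorDataset(input_ids, label_id_array)


def describe(ds):
    """Return (type name, dtype, shape) for each tensor in a TensorDataset."""
    return [(type(t).__name__, t.dtype, tuple(t.shape)) for t in ds.tensors]


for row in describe(dataset):
    print(row)

# One sample is a tuple of per-example tensors; the label has no dimensions.
sample_ids, sample_label = dataset[0]
print(tuple(sample_ids.shape), tuple(sample_label.shape))
```

Note that the label of a single sample is a 0-dim tensor, and indexing such a tensor again produces exactly the IndexError from the title. Whether that is what happens inside the grid search here is an assumption; the traceback alone does not confirm it.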
Furthermore, did you know that we support using Hugging Face tokenizers directly through HuggingfacePretrainedTokenizer? Maybe this would be a better fit for your problem. Here is a complete notebook that showcases how to use it.
OK, thanks for your help. I will take your advice; it is a good suggestion.