NameError: name "DatasetDict" is not defined

Open jxu0510 opened this issue 1 year ago • 5 comments

Hi, thanks to the authors for this amazing work! I would really appreciate any help with the error I encountered. I was following the trainer section of https://www.sbert.net/docs/sentence_transformer/training_overview.html, but when I run

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
    evaluator=dev_evaluator,
)

I got the error "NameError: name 'DatasetDict' is not defined", even though I made sure DatasetDict is imported. I also searched online but didn't find anything useful.

Thank you for your time and help in advance!

jxu0510 avatar Jul 24 '24 02:07 jxu0510

cc @tomaarsen, not sure if this refers to the blog or the documentation.

pcuenca avatar Jul 25 '24 07:07 pcuenca

I get the same issue following the same tutorial. The error comes from:

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\sentence_transformers\model_card.py:508, in SentenceTransformerModelCardData.infer_datasets(self, dataset, dataset_name)
    505 def infer_datasets(
    506     self, dataset: Union["Dataset", "DatasetDict"], dataset_name: Optional[str] = None
    507 ) -> List[Dict[str, str]]:
--> 508     if isinstance(dataset, DatasetDict):
    509         return [
    510             dataset
    511             for dataset_name, sub_dataset in dataset.items()
    512             for dataset in self.infer_datasets(sub_dataset, dataset_name=dataset_name)
    513         ]
    515     def subtuple_finder(tuple: Tuple[str], subtuple: Tuple[str]) -> int:

This looks like a version mismatch. I will update this comment if I find a workaround.

mxbi avatar Sep 09 '24 12:09 mxbi

Hmm, that's quite odd. Do you have datasets installed? You can check with pip show datasets, and install it with pip install datasets if it is missing.
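The same check can also be run from inside the notebook, which confirms that the kernel's own interpreter sees the package (pip on the command line may point at a different environment). This is only a minimal sketch using the standard library; the helper name installed_version is made up for illustration.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` as seen by this interpreter.

    Returns None if the package cannot be found at all, and "unknown"
    if it is importable but has no distribution metadata.
    (Hypothetical helper, not part of any library.)
    """
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "unknown"


# None here means this interpreter cannot see the package.
print("datasets:", installed_version("datasets"))
```

Note that this only tells you what is installed now. It does not tell you whether a library that was imported earlier in the session already noticed the package.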

  • Tom Aarsen

tomaarsen avatar Sep 09 '24 14:09 tomaarsen

@tomaarsen I think it's because I installed datasets (following a previous error message which asked me to) during the current notebook session. Some old state probably got cached during import, and thus DatasetDict was not available. I suspect the same happened to OP.

Fixed by restarting the kernel 🙂
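For anyone wondering why a restart helps, here is a small self-contained sketch of the suspected mechanism. It assumes the library guards its import of an optional dependency, which I have not verified against the actual sentence_transformers source. The names fake_library and fake_optional_dep are hypothetical stand-ins, not real packages.

```python
import sys
import types

# Hypothetical library source: the optional import is guarded, so when the
# dependency is missing the name DatasetDict is simply never bound.
LIBRARY_SOURCE = '''
try:
    from fake_optional_dep import DatasetDict
except ImportError:
    pass

def infer(obj):
    return isinstance(obj, DatasetDict)
'''


def load_library() -> types.ModuleType:
    """Execute the library source in a fresh module, like a first import."""
    module = types.ModuleType("fake_library")
    exec(LIBRARY_SOURCE, module.__dict__)
    return module


# 1. The library is imported while the dependency is not installed.
library = load_library()

# 2. The dependency is "installed" in the middle of the session.
dep = types.ModuleType("fake_optional_dep")
dep.DatasetDict = type("DatasetDict", (dict,), {})
sys.modules["fake_optional_dep"] = dep

# 3. The already-imported library still has no DatasetDict.
try:
    library.infer({})
    outcome = "ok"
except NameError as err:
    outcome = f"NameError: {err}"
print(outcome)  # NameError: name 'DatasetDict' is not defined

# 4. A fresh import, which is what a kernel restart gives you, picks it up.
fresh = load_library()
print(fresh.infer(dep.DatasetDict()))  # True
```

If that assumption holds, importing DatasetDict in your own notebook cell does not help, because the name is missing inside the library's module, not in yours.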

mxbi avatar Sep 09 '24 15:09 mxbi

Glad to hear that!

tomaarsen avatar Sep 09 '24 15:09 tomaarsen