super-gradients
'Trainer' object has no attribute 'train_loader' error
I'm trying to train YOLO-NAS on a custom dataset and I get the error below while running the training on Google Colab using:
trainer.train(model=model,
              training_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)
Hi @fleventy-5,
I think that the train_loader you passed is not a DataLoader, but instead either None, [], False, or some other falsy value.
Can you please check it?
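For reference, a minimal sanity check along those lines, assuming train_data is the object you pass as train_loader:

from torch.utils.data import DataLoader

# Make sure the object passed as train_loader really is a DataLoader
print(type(train_data))                    # expect <class 'torch.utils.data.dataloader.DataLoader'>
print(isinstance(train_data, DataLoader))  # should be True
if isinstance(train_data, DataLoader):
    print(len(train_data))                 # number of batches; 0 means the loader is effectively empty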
My version of super-gradients is 3.1.0.
This is part of my code; train_data is a DataLoader.
But I get the error "'Trainer' object has no attribute 'train_loader'".
No error occurs in the YOLO-NAS Starter Notebook, but I get the error in my local environment. Why is that?
Tried and tested code:
trainer.train(model=model, training_params=train_params, train_loader=train_data, valid_loader=val_data)
I'll put up some of my code now.
@momonoki3nenn, @fleventy-5, I did not manage to reproduce it, but we pushed a change that should fix it. It will be in the next release, but meanwhile, you can install SG from our repo directly :)
pip install git+https://github.com/Deci-AI/super-gradients
@Louis-Dupont
Thanks for responding! I installed it and tried it out.
The error "'Trainer' object has no attribute 'train_loader'" no longer occurs, but I got the following error in my environment.
I wonder if sg_trainer.py, line 1211 is the cause.
I updated my Python version from 3.9.13 to 3.10.11 and tried again, with the same results.
My PC's GPU is an RTX 3080, with 40GB of memory. Is it difficult to run YOLO-NAS training in my environment?
@Louis-Dupont I'm able to train now using a custom dataset on Colab notebooks. Thanks!
@momonoki3nenn
The StopIteration appears because we are trying to iterate over a DataLoader that is apparently empty (you can try next(iter(train_data)) and you will get the same error).
Another thing that supports this hypothesis is that to the right of the "Caching annotations" line there is "1/1", which shows that your dataset doesn't include multiple images/labels.
So now the question is: why is the dataloader empty?
I would guess that either you don't point to the right path of your dataset, or maybe your dataset doesn't have the right structure.
Feel free to check this documentation page to see how your data should be structured to use this (or another) dataset.
To test it simply, you can check the length or iterate over it: len(train_data) or next(iter(train_data)), as in the sketch below.
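For reference, a minimal sketch of that check, assuming train_data and val_data are the DataLoaders passed to trainer.train and that the default detection collate returns an (images, targets) pair:

# Check that the dataloaders actually found samples and can yield batches
print(len(train_data.dataset), len(val_data.dataset))  # number of samples discovered on disk
print(len(train_data), len(val_data))                   # number of batches per epoch (0 means no full batch)

images, targets = next(iter(train_data))                # raises StopIteration if the loader is empty
print(images.shape)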
Is there any fix for the StopIteration issue? I got the same error.
@Louis-Dupont
StopIteration occurred when the train and val sets each contained only a single sample; when multiple train and val samples were used, StopIteration did not occur.
There are times when I want to train on a single sample. I hope you can support this in the future!
As it turned out, another error occurred once I used multiple samples:
Caching annotations: 100%|██████████| 23/23 [00:00<00:00, 2874.87it/s]
Caching annotations: 100%|██████████| 11/11 [00:00<00:00, 2628.01it/s]
Train epoch 0: 0%| | 0/2 [00:19<?, ?it/s]
[2023-05-13 18:30:35] ERROR - sg_trainer_utils.py - Uncaught exception
Traceback (most recent call last):
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "C:\projects\evaluate_yolo_nas\train.py", line 62, in <module>
    trainer.train(
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1247, in train
    train_metrics_tuple = self._train_epoch(epoch=epoch, silent_mode=silent_mode)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 442, in _train_epoch
    loss, loss_log_items = self._get_losses(outputs, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 475, in _get_losses
    loss = self.criterion(outputs, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 155, in forward
    return self._compute_loss(predictions, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 181, in _compute_loss
    x_shifts, y_shifts, expanded_strides, transformed_outputs, raw_outputs = self.prepare_predictions(predictions)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 335, in prepare_predictions
    batch_size, num_anchors, h, w, num_outputs = output.shape
ValueError: not enough values to unpack (expected 5, got 3)
I have verified that the correct path is specified. The format of each line in the label text file is: object (class) number, object center X coordinate, object center Y coordinate, object width, object height. (I use LabelImg for annotation.) Is there a problem with the train data?
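For illustration only, a label file in that (YOLO/LabelImg) format is one .txt file per image, one line per object: the class index, then the box center x, center y, width, and height, all normalized to [0, 1]. The values below are made up:

0 0.512 0.430 0.215 0.180
2 0.120 0.765 0.080 0.140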
@Vikram12301, did you install the nightly?
pip install git+https://github.com/Deci-AI/super-gradients
If yes, do you have only a single sample in your train set? Please also provide the full snippet of code you are using to run it, including, among other things, the dataset instantiation.
@momonoki3nenn yeah, it looks like it. Did you try with batch_size=1? My guess is that maybe you have batch_size > len(dataset), which means that the dataloader cannot prepare any full batch.
Concerning your other error, it comes from the code of YoloXDetectionLoss, but YoloNAS expects PPYoloELoss (like in the notebooks).
If you are still working on YoloNAS, this could definitely explain your error. In that case, change your train_params like in the notebook to use PPYoloELoss (see the sketch below).
If not, could you please share your code snippet with all of the code?
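For reference, a minimal sketch of the relevant part of train_params, following the YOLO-NAS fine-tuning notebook; the dataset_params["classes"] lookup is an assumption about how the class list is stored:

from super_gradients.training.losses import PPYoloELoss

train_params = {
    # ... other training hyperparameters (epochs, optimizer, metrics, ...) ...
    "loss": PPYoloELoss(
        use_static_assigner=False,                   # as in the YOLO-NAS notebook
        num_classes=len(dataset_params["classes"]),  # assumption: class names live in dataset_params
        reg_max=16,
    ),
}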
@Louis-Dupont I tried two patterns and got errors in both. However, for one of them, after a further modification the error no longer occurred and training seems to have completed.
- Pattern 1: use coco2017_yolox_train_params for train_params and change 'loss' to PPYoloELoss. Error encountered:
  in prepare_predictions:
      batch_size, num_anchors, h, w, num_outputs = output.shape
  ValueError: not enough values to unpack (expected 5, got 3)
- Pattern 2: use the Training Parameters for train_params (all the same). Error encountered:
UnicodeEncodeError: 'cp932' codec can't encode character '\u2198' in position 112: illegal multibyte sequence
Traceback (most recent call last):
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "C:\projects\evaluate_yolo_nas\train.py", line 99, in <module>
    trainer.train(
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1307, in train
    self._validate_final_average_model(cleanup_snapshots_pkl_file=True)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1415, in _validate_final_average_model
    averaged_model_results_tuple = self._validate_epoch(epoch=self.max_epochs)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1766, in _validate_epoch
    return self.evaluate(
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1878, in evaluate
    sg_trainer_utils.display_epoch_summary(
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary
    summary_tree.show()
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 854, in show
    print(self._reader)
UnicodeEncodeError: 'cp932' codec can't encode character '\u2198' in position 112: illegal multibyte sequence
What we tried: commented out lines 853-856 in site-packages/treelib/tree.py. Training then completed, so I consider it a success.
I don't know if this modification is correct, but I will keep training this way for now!
@momonoki3nenn, can you try:
- import sys; print(sys.getdefaultencoding()) to check what your default encoding is (see the sketch below)
- setting the environment variable PYTHONIOENCODING to utf8, and then running the training again
It looks like tree doesn't always encode to utf8 on Windows, even though it is supposed to.
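For reference, a small sketch of that check; as an addition to the suggestion above, note that the UnicodeEncodeError comes from printing to the console, so sys.stdout.encoding (often cp932 on a Japanese Windows console) is also worth inspecting:

import sys

print(sys.getdefaultencoding())  # usually "utf-8"
print(sys.stdout.encoding)       # on Windows this can be "cp932", which is what treelib's print() hits
# PYTHONIOENCODING=utf8 must be set in the shell before Python starts for sys.stdout.encoding to change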
@Louis-Dupont
Thanks for the reply.
My environment defaulted to utf-8.
I checked, and it seems to be a bug in treelib (reference site).
Original:
def write(line): self._reader += line.decode("utf-8") + "\n"
Revised:
def write(line): self._reader += line + "\n"
The following error occurred after this change:
File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary summary_tree.show() File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 848, in show self.__print_backend(nid, level, idhidden, filter, File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 222, in __print_backend func('{0}{1}'.format(pre, label).encode('utf-8')) File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 845, in write self._reader += line + "\n" TypeError: can't concat str to bytes
So I fixed that as well.
Original:
func('{0}{1}'.format(pre, label).encode('utf-8'))
Revised:
func('{0}{1}'.format(pre, label))
Errors no longer occur. I don't know if the fix is right; what is certain is that it is not a YOLO-NAS bug.
@momonoki3nenn, thanks for the investigation! I don't understand why it fails only in very specific environments, so it's a bit hard for us to fix what we don't manage to reproduce...
We might be able to fix it in SG if we understand exactly what leads to this encoding error (even if it is due to treelib's poor handling of encoding).
First idea
The arrows might lead to this error. You can try to replace https://github.com/Deci-AI/super-gradients/blob/a30fa8fdc623533df785831f7457967066fb2ebe/src/super_gradients/training/utils/sg_trainer_utils.py#L41-L50 with:
def to_symbol(self) -> str:
    """Get the symbol representing the current increase type"""
    if self == IncreaseType.NONE:
        return ""
    elif self == IncreaseType.IS_GREATER:
        return "[UP]"
    elif self == IncreaseType.IS_SMALLER:
        return "[DOWN]"
    else:
        return "="
Second idea
The colored() function might lead to this error. You can try to replace https://github.com/Deci-AI/super-gradients/blob/a30fa8fdc623533df785831f7457967066fb2ebe/src/super_gradients/training/utils/sg_trainer_utils.py#L231-L237 with:
diff_with_prev_colored = f"{monitored_value.has_increased_from_previous.to_symbol()} {change_from_previous}"
diff_with_best_colored = f"{monitored_value.has_increased_from_best.to_symbol()} {change_from_best}"
Third idea
My third guess is to do both at the same time.
If none of these works, then it's probably just that treelib doesn't even work with plain text in your case, which means that we need an alternative.
@Louis-Dupont With the first idea, the error no longer occurs. The arrows were the cause. Thanks for letting me know!