'Trainer' object has no attribute 'train_loader' error

Open marvlyngkhoi opened this issue 1 year ago • 16 comments

I'm trying to train YOLO-NAS on a custom dataset, and I get the error below when running the training on Google Colab using:

trainer.train(model=model, 
              training_params=train_params, 
              train_loader=train_data, 
              valid_loader=val_data)

image

marvlyngkhoi avatar May 10 '23 08:05 marvlyngkhoi

Hi @fleventy-5, I think the train_loader you passed is not a DataLoader but instead None, [], False, or some other "0" (falsy) value. Can you please check it?
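To see why an empty loader can look like a missing one: in Python, any object whose `__len__` returns 0 is falsy, just like `None`, `[]` or `False`. The sketch below uses a hypothetical `FakeLoader` class (not part of super-gradients) and assumes the trainer only keeps loaders that pass a truthiness check, which the thread does not confirm.

```python
class FakeLoader:
    """Hypothetical stand-in for a data loader; only __len__ matters here."""

    def __init__(self, num_batches):
        self.num_batches = num_batches

    def __len__(self):
        return self.num_batches


def describe(loader):
    # Objects without __bool__ fall back to __len__ for truthiness.
    return "usable" if loader else "treated as missing"


results = {
    "None": describe(None),
    "empty list": describe([]),
    "loader with 0 batches": describe(FakeLoader(0)),
    "loader with 2 batches": describe(FakeLoader(2)),
}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

Only the last entry is reported as usable, so a loader that yields zero batches is indistinguishable from `None` under such a check.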

Louis-Dupont avatar May 10 '23 11:05 Louis-Dupont

My version of super-gradients is 3.1.0

This is part of my code. train_data

train_data looks like a DataLoader, but I get the error "'Trainer' object has no attribute 'train_loader'".

No error occurs in YOLONAS Starter Notebook.

I get an error in my local environment. Why is that?

Tried and tested code:

trainer.train(model=model,
              train_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)

I'll put up some of my code now. one_part

momonoki3nenn avatar May 11 '23 01:05 momonoki3nenn

@momonoki3nenn, @fleventy-5 , I did not manage to reproduce it, but we pushed a change that should fix it. It will be in the next release, but meanwhile, you can install SG from our repo directly :)

pip install git+https://github.com/Deci-AI/super-gradients

Louis-Dupont avatar May 11 '23 13:05 Louis-Dupont

@Louis-Dupont Thanks for responding! I installed it and tried it out. Error "'Trainer' object has no attribute 'train_loader'" no longer occurs. But I got the following error in my environment. error

I wonder if sg_trainer.py, line 1211, is the cause.

I updated the Python version from 3.9.13 to 3.10.11 and tried again with the same results.

My PC GPU is RTX-3080 with 40GB memory. Is it difficult to run the YOLO-NAS train in my environment?

momonoki3nenn avatar May 12 '23 02:05 momonoki3nenn

@momonoki3nenn, @fleventy-5 , I did not manage to reproduce it, but we pushed a change that should fix it. It will be in the next release, but meanwhile, you can install SG from our repo directly :)

pip install git+https://github.com/Deci-AI/super-gradients

@Louis-Dupont I'm able to train now using a custom dataset on Colab notebooks. Thanks!

marvlyngkhoi avatar May 12 '23 07:05 marvlyngkhoi

@momonoki3nenn The StopIteration appears because we are trying to iterate over a Dataloader that is apparently empty (you can try next(iter(train_data)) and you will get the same error)

Another thing that supports this hypothesis is the fact that on the right of the "Caching annotation" line, there is "1/1" written, which shows that your dataset doesn't include multiple images/labels.

So now the question is: why is the dataloader empty? I would guess that either you don't point to the right path of your dataset, or your dataset doesn't have the right structure. Feel free to check this documentation page to see how your data should be structured to use this (or another) dataset. To test it simply, you can check its length with len(train_data) or fetch a first batch with next(iter(train_data)).
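A minimal sketch of that check, wrapped in a hypothetical helper named `inspect_loader` (the name and the return shape are assumptions for illustration). It relies only on `len()` and `next(iter(...))`, so it works on any sized iterable, and plain lists stand in for real loaders here.

```python
def inspect_loader(loader, name="loader"):
    """Report how many batches a loader has and whether a first batch exists."""
    num_batches = len(loader)
    try:
        first_batch = next(iter(loader))
        has_batch = True
    except StopIteration:
        # Same exception that surfaces during training when nothing is yielded.
        first_batch = None
        has_batch = False
    status = "ok" if has_batch else "EMPTY"
    print(f"{name}: {num_batches} batch(es), status={status}")
    return num_batches, has_batch, first_batch


# Plain lists stand in for loaders in this illustration.
empty_report = inspect_loader([], name="empty loader")
full_report = inspect_loader([("images", "targets")], name="one-batch loader")
```

With a real loader the call would be `inspect_loader(train_data, "train_data")` before `trainer.train(...)`.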

Louis-Dupont avatar May 12 '23 17:05 Louis-Dupont

Is there any fix found for the StopIteration issue? I got the same error

Vikram12301 avatar May 13 '23 11:05 Vikram12301

@Louis-Dupont StopIteration occurred when the train and val sets each contained a single sample. When the train and val sets contained multiple samples, StopIteration did not occur. There are times when I want to train on a single sample. I hope you can handle this in the future!

Following on from that, a different error occurred once I used multiple samples:

Caching annotations: 100%|██████████| 23/23 [00:00<00:00, 2874.87it/s]
Caching annotations: 100%|██████████| 11/11 [00:00<00:00, 2628.01it/s]
Train epoch 0:   0%|          | 0/2 [00:19<?, ?it/s]
[2023-05-13 18:30:35] ERROR - sg_trainer_utils.py - Uncaught exception
Traceback (most recent call last):
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "C:\projects\evaluate_yolo_nas\train.py", line 62, in <module>
    trainer.train(
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1247, in train
    train_metrics_tuple = self._train_epoch(epoch=epoch, silent_mode=silent_mode)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 442, in _train_epoch
    loss, loss_log_items = self._get_losses(outputs, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 475, in _get_losses
    loss = self.criterion(outputs, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 155, in forward
    return self._compute_loss(predictions, targets)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 181, in _compute_loss
    x_shifts, y_shifts, expanded_strides, transformed_outputs, raw_outputs = self.prepare_predictions(predictions)
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\losses\yolox_loss.py", line 335, in prepare_predictions
    batch_size, num_anchors, h, w, num_outputs = output.shape
ValueError: not enough values to unpack (expected 5, got 3)

I have verified that the correct path is specified. The format of each line in the label text files is: object (class) number, object center X coordinate, object center Y coordinate, object width, object height. (I use LabelImg for annotation.) Is there a problem with the train data?

momonoki3nenn avatar May 13 '23 12:05 momonoki3nenn

@Vikram12301 , did you install the nightly ?

pip install git+https://github.com/Deci-AI/super-gradients

If yes, do you have a single sample in your train set? Please also provide the full snippet of code you are running, including the dataset instantiation.

Louis-Dupont avatar May 14 '23 13:05 Louis-Dupont

@momonoki3nenn yes, it looks like it. Did you try with batch_size=1? My guess is that you have batch_size > len(dataset), which means that the dataloader cannot prepare any full batch.
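The arithmetic behind that guess can be sketched as follows. It mirrors how a batched loader such as PyTorch's `DataLoader` counts batches for a map-style dataset, and it assumes incomplete batches are dropped (`drop_last=True`), which the thread does not state explicitly. The function name and the numbers are illustrative only.

```python
import math


def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a batched loader yields per epoch."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept.
    return math.ceil(num_samples / batch_size)


# One sample, batch size 16: nothing is left once the partial batch is dropped.
print(num_batches(1, 16, drop_last=True))   # 0
print(num_batches(1, 16, drop_last=False))  # 1
# One sample, batch size 1: a full batch exists either way.
print(num_batches(1, 1, drop_last=True))    # 1
```

A result of 0 corresponds to the empty loader that makes `next(iter(train_data))` raise StopIteration.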

Concerning your other error, it comes from the code of YoloXDetectionLoss, but YoloNAS expects PPYoloELoss (as in the notebooks). If you are still working on YoloNAS, this could definitely explain your error; in that case, switch to PPYoloELoss as in the notebook. If not, could you please share your full code snippet?
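The ValueError itself is plain Python tuple unpacking: the loss code in the traceback unpacks `output.shape` into five names, so a 3-dimensional output cannot satisfy it. A small sketch with plain tuples standing in for tensor shapes (the concrete shape values are hypothetical and chosen only for illustration):

```python
def unpack_five(shape):
    """Mimic the five-way unpacking seen in the traceback, on a plain tuple."""
    batch_size, num_anchors, h, w, num_outputs = shape
    return batch_size, num_anchors, h, w, num_outputs


# Hypothetical shapes: one with five dimensions, one with three.
five_dim_shape = (2, 1, 80, 80, 85)
three_dim_shape = (2, 8400, 85)

print(unpack_five(five_dim_shape))

try:
    unpack_five(three_dim_shape)
except ValueError as err:
    message = str(err)
    print(message)
```

The message printed for the 3-dimensional shape matches the one in the report, which is consistent with the model output and the loss function expecting different layouts.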

Louis-Dupont avatar May 14 '23 13:05 Louis-Dupont

@Louis-Dupont I tried two patterns and got errors in both patterns. However, for one of the patterns, after modifying it, the error did not occur and it seems to have been learned.

  • Pattern 1: Use coco2017_yolox_train_params for train_params and change 'loss' to PPYoloELoss. Errors encountered:

    prepare_predictions
        batch_size, num_anchors, h, w, num_outputs = output.shape
    ValueError: not enough values to unpack (expected 5, got 3)

  • Pattern 2: Use Training Parameters for train_params (all the same). Errors encountered:

    UnicodeEncodeError: 'cp932' codec can't encode character '\u2198' in position 112: illegal multibyte sequence
    Traceback (most recent call last):
      File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\zynas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\__main__.py", line 39, in <module>
        cli.main()
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 430, in main
        run()
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 284, in run_file
        runpy.run_path(target, run_name="__main__")
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "c:\Users\zynas\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
        exec(code, run_globals)
      File "C:\projects\evaluate_yolo_nas\train.py", line 99, in <module>
        trainer.train(
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1307, in train
        self._validate_final_average_model(cleanup_snapshots_pkl_file=True)
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1415, in _validate_final_average_model
        averaged_model_results_tuple = self._validate_epoch(epoch=self.max_epochs)
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1766, in _validate_epoch
        return self.evaluate(
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\sg_trainer\sg_trainer.py", line 1878, in evaluate
        sg_trainer_utils.display_epoch_summary(
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary
        summary_tree.show()
      File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 854, in show
        print(self._reader)
    UnicodeEncodeError: 'cp932' codec can't encode character '\u2198' in position 112: illegal multibyte sequence

What I tried: I commented out lines 853-856 in site-packages/treelib/tree.py. Training then ran to completion, so I assume it succeeded.

I don't know if this modification is correct, but I will keep training this way for now!

momonoki3nenn avatar May 15 '23 06:05 momonoki3nenn

@momonoki3nenn , can you try:

  • import sys; print(sys.getdefaultencoding()) to check what your default encoding is
  • setting the environment variable PYTHONIOENCODING to utf8, and then run the training again

It looks like tree doesn't always encode to UTF-8 on Windows, even though it is supposed to.
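A small diagnostic along those lines. One caveat worth knowing: in Python 3, `sys.getdefaultencoding()` reports utf-8 regardless of the console, whereas `print` uses `sys.stdout.encoding`, which is the setting `PYTHONIOENCODING` affects. The helper name `can_encode` is an assumption made for this sketch; the character tested is the one named in the error message.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be written using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ARROW = "\u2198"  # the character named in the UnicodeEncodeError

print("default encoding:", sys.getdefaultencoding())
print("stdout encoding: ", sys.stdout.encoding)
print("cp932 can encode the arrow:", can_encode(ARROW, "cp932"))
print("utf-8 can encode the arrow:", can_encode(ARROW, "utf-8"))
```

If the stdout encoding shows cp932 while the default encoding shows utf-8, the console encoding is the likely culprit rather than the interpreter default.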

Louis-Dupont avatar May 16 '23 08:05 Louis-Dupont

@Louis-Dupont Thanks for the reply. My environment defaulted to utf-8. I checked and it seems to be a bug in treelib. Reference Site

Original:

def write(line):
    self._reader += line.decode("utf-8") + "\n"

Revision:

def write(line):
    self._reader += line + "\n"

The following error occurred after the correction.

  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\super_gradients\training\utils\sg_trainer_utils.py", line 257, in display_epoch_summary
    summary_tree.show()
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 848, in show
    self.__print_backend(nid, level, idhidden, filter,
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 222, in __print_backend
    func('{0}{1}'.format(pre, label).encode('utf-8'))
  File "c:\projects\evaluate_yolo_nas\venv\lib\site-packages\treelib\tree.py", line 845, in write
    self._reader += line + "\n"
TypeError: can't concat str to bytes

So I fixed that as well.

Original:

func('{0}{1}'.format(pre, label).encode('utf-8'))

Revision:

func('{0}{1}'.format(pre, label))

Errors no longer occur. I don't know if the fix is right. What is certain is that it is not a YOLO-NAS bug.
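Why the first patch alone failed and both together worked can be reproduced without treelib. The `Reader` class below is a hypothetical miniature of the accumulate-into-a-string pattern from the snippets above, not treelib's actual code, and the label text is made up.

```python
class Reader:
    """Hypothetical miniature of a writer that accumulates lines into a str."""

    def __init__(self):
        self.text = ""

    def write_decoding(self, line):
        # Original shape: expects bytes and decodes them.
        self.text += line.decode("utf-8") + "\n"

    def write_plain(self, line):
        # Patched shape: expects str and appends it directly.
        self.text += line + "\n"


label = "mAP \u2198"  # made-up label containing the arrow
reader = Reader()

reader.write_decoding(label.encode("utf-8"))  # bytes in, decoded: fine
reader.write_plain(label)                     # str in, no decode: fine

try:
    # Only the writer was patched; the caller still passes bytes.
    reader.write_plain(label.encode("utf-8"))
except TypeError as err:
    mismatch = str(err)
    print(mismatch)
```

The producer and the consumer have to agree on bytes or str; patching only one side yields the TypeError from the report.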

momonoki3nenn avatar May 17 '23 04:05 momonoki3nenn

@momonoki3nenn, thanks for the investigation! I don't understand why it fails in only very specific environments, so it's a bit hard for us to fix what we don't manage to reproduce ...

We might be able to fix it in SG if we understand exactly what leads to this encoding error (even if it is due to treelib's bad implementation of encoding).

First idea

The arrows might lead to this error. You can try to replace: https://github.com/Deci-AI/super-gradients/blob/a30fa8fdc623533df785831f7457967066fb2ebe/src/super_gradients/training/utils/sg_trainer_utils.py#L41-L50 with

 def to_symbol(self) -> str: 
     """Get the symbol representing the current increase type""" 
     if self == IncreaseType.NONE: 
         return "" 
     elif self == IncreaseType.IS_GREATER: 
         return "[UP]" 
     elif self == IncreaseType.IS_SMALLER: 
         return "[DOWN]" 
     else: 
         return "=" 
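For context, a self-contained sketch of how that method could sit on an enum, followed by a check that every replacement symbol survives a cp932 encode. The member values below are placeholders assumed for this sketch; only the member names and the `to_symbol` body come from the snippet above.

```python
from enum import Enum


class IncreaseType(Enum):
    # Member values are placeholders assumed for this sketch.
    NONE = "none"
    IS_GREATER = "greater"
    IS_SMALLER = "smaller"
    IS_EQUAL = "equal"

    def to_symbol(self) -> str:
        """Get the symbol representing the current increase type"""
        if self == IncreaseType.NONE:
            return ""
        elif self == IncreaseType.IS_GREATER:
            return "[UP]"
        elif self == IncreaseType.IS_SMALLER:
            return "[DOWN]"
        else:
            return "="


symbols = [member.to_symbol() for member in IncreaseType]
for encoding in ("cp932", "ascii"):
    for symbol in symbols:
        # Raises UnicodeEncodeError if the symbol is not representable.
        symbol.encode(encoding)
print(symbols)
```

Because the replacement symbols are pure ASCII, they can be written to any console code page that is ASCII-compatible.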

Second idea

The colored() function might lead to this error. You can try to replace: https://github.com/Deci-AI/super-gradients/blob/a30fa8fdc623533df785831f7457967066fb2ebe/src/super_gradients/training/utils/sg_trainer_utils.py#L231-L237 with

diff_with_prev_colored = f"{monitored_value.has_increased_from_previous.to_symbol()} {change_from_previous}"
diff_with_best_colored = f"{monitored_value.has_increased_from_best.to_symbol()} {change_from_best}"

Third idea

My third guess is to do both at the same time.

If none of these work, then treelib probably doesn't work even with plain text in your case, which means that we need an alternative.

Louis-Dupont avatar May 18 '23 12:05 Louis-Dupont

@Louis-Dupont First idea no longer causes errors. The arrows were the cause. Thanks for letting me know.

momonoki3nenn avatar May 23 '23 00:05 momonoki3nenn