Can't finish training, can't load model after it finished training
Search before asking
- [X] I have searched the HUB issues and found no similar bug report.
HUB Component
Models, Training
Bug
I trained my model using Colab, and after it finished, the model in the HUB shows 100%, but the training is not marked as finished. I tried running training again on Colab to trigger completion, but when I do so it raises an exception and can't run.
Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/q38rJZFi6qwbaiJpRL6K 🚀
Found https://storage.googleapis.com/ultralytics-hub.appspot.com/users/n0Mwq1AC3KVneaklMrauVkIsozJ3/models/q38rJZFi6qwbaiJpRL6K/epoch-291.pt locally at weights/epoch-291.pt
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
<ipython-input-3-252a0c1dfed1> in <cell line: 3>()
1 hub.login('...')
2
----> 3 model = YOLO('https://hub.ultralytics.com/models/q38rJZFi6qwbaiJpRL6K')
4 results = model.train()
/usr/local/lib/python3.10/dist-packages/ultralytics/models/yolo/model.py in __init__(self, model, task, verbose)
21 else:
22 # Continue with default YOLO initialization
---> 23 super().__init__(model=model, task=task, verbose=verbose)
24
25 @property
/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py in __init__(self, model, task, verbose)
140 self._new(model, task=task, verbose=verbose)
141 else:
--> 142 self._load(model, task=task)
143
144 def __call__(
/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py in _load(self, weights, task)
292
293 if Path(weights).suffix == ".pt":
--> 294 self.model, self.ckpt = attempt_load_one_weight(weights)
295 self.task = self.model.args["task"]
296 self.overrides = self.model.args = self._reset_ckpt_args(self.model.args)
/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py in attempt_load_one_weight(weight, device, inplace, fuse)
853 def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False):
854 """Loads a single model weights."""
--> 855 ckpt, weight = torch_safe_load(weight) # load ckpt
856 args = {**DEFAULT_CFG_DICT, **(ckpt.get("train_args", {}))} # combine model and default args, preferring model args
857 model = (ckpt.get("ema") or ckpt["model"]).to(device).float() # FP32 model
/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py in torch_safe_load(weight)
779 },
780 ):
--> 781 ckpt = torch.load(file, map_location="cpu")
782
783 except ModuleNotFoundError as e: # e.name is missing module name
/usr/local/lib/python3.10/dist-packages/ultralytics/utils/patches.py in torch_load(*args, **kwargs)
84 kwargs["weights_only"] = False
85
---> 86 return _torch_load(*args, **kwargs)
87
88
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
1038 except RuntimeError as e:
1039 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1040 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
1041
1042
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
1260 "functionality.")
1261
-> 1262 magic_number = pickle_module.load(f, **pickle_load_args)
1263 if magic_number != MAGIC_NUMBER:
1264 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '<'.
Environment
- Ultralytics HUB Version: v0.1.46
- Client User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
- Operating System: Win32
- Browser Window Size: 2352 x 1352
- Server Timestamp: 1722690165
Minimal Reproducible Example
No response
Additional
Hi there,
Thank you for reaching out and providing detailed information about the issue you're facing.
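It looks like you're encountering an UnpicklingError when trying to load your model after training. The message "invalid load key, '<'" means the first byte of the file is '<'. That usually indicates the file saved as weights/epoch-291.pt is not a PyTorch checkpoint at all but an HTML or XML document (for example, an error page returned during the download), or that the file is otherwise corrupted.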
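A quick way to tell a real checkpoint from a saved web page is to look at the file's first bytes. The sketch below is a heuristic with hypothetical helper names (sniff_checkpoint_header, sniff_checkpoint); it relies on torch.save writing a zip archive by default since PyTorch 1.6, and on the older format starting with a pickle protocol opcode (byte 0x80). It does not validate the full file.

```python
def sniff_checkpoint_header(head: bytes) -> str:
    """Heuristically classify a file from its first bytes (not a full validation)."""
    if head.startswith(b"PK\x03\x04"):
        # Zip local-file header: torch.save's default format since PyTorch 1.6
        return "zip"
    if head.lstrip().startswith(b"<"):
        # Markup, e.g. an HTML/XML error page saved in place of the weights
        return "html-or-xml"
    if head.startswith(b"\x80"):
        # Pickle PROTO opcode: how the legacy (pre-zip) torch.save format begins
        return "legacy-pickle"
    return "unknown"


def sniff_checkpoint(path: str, n: int = 64) -> str:
    """Read the first n bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff_checkpoint_header(f.read(n))
```

In the Colab session, calling sniff_checkpoint('weights/epoch-291.pt') and getting "html-or-xml" back would point to a bad download rather than a training problem.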
Here are a few steps you can take to troubleshoot and resolve this issue:
- Verify Model File Integrity: Ensure that the model file (epoch-291.pt) is not corrupted. You can try downloading the file again from Ultralytics HUB to see if the issue persists. Because the log reports that the checkpoint was found locally at weights/epoch-291.pt, that local copy is reused instead of being downloaded again, so delete it first to get a fresh copy.
- Update Packages: Make sure you are using the latest versions of the Ultralytics and PyTorch packages. You can update them with the following commands:
  pip install --upgrade ultralytics
  pip install --upgrade torch
- Re-run Training: Re-running the training process can sometimes resolve issues with corrupted files. Make sure you have a stable internet connection during training to avoid interruptions.
- Check File Path: Ensure that the file path is correct and that the file exists at the specified location.
- Use Local File: If the file is available locally, you can try loading it directly from your local system instead of using the URL:
  model = YOLO('weights/epoch-291.pt')
  Note that the traceback shows this same local file failing to load, so this only helps once a valid copy is in place.
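The log shows the checkpoint was found locally at weights/epoch-291.pt, so a bad copy may keep being reused on every attempt. A minimal sketch for discarding it, using a hypothetical helper name (discard_if_not_zip_checkpoint) and assuming the checkpoint was written with torch.save's default zip-based format (PyTorch 1.6+). A legacy-format checkpoint would also be deleted by this check, so only run it on files you can download again.

```python
import zipfile
from pathlib import Path


def discard_if_not_zip_checkpoint(path: str) -> bool:
    """Delete path if it exists but is not a zip archive; return True if deleted.

    Assumption: a valid checkpoint uses torch.save's default zip format.
    """
    p = Path(path)
    if p.is_file() and not zipfile.is_zipfile(p):
        p.unlink()  # remove the bad copy so it is not picked up as "found locally"
        return True
    return False
```

Run it on 'weights/epoch-291.pt' in the Colab session before calling YOLO('https://hub.ultralytics.com/models/q38rJZFi6qwbaiJpRL6K') again. That the HUB client then downloads a fresh copy is an assumption based only on the "Found ... locally" message in the log.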
If the issue persists after trying these steps, please provide additional details such as any error messages or logs you encounter. This will help us further diagnose the problem.
For more detailed guidance, you can refer to our Ultralytics HUB Quickstart Guide.
Feel free to reach out if you have any more questions or need further assistance. We're here to help! 😊
@rolurq It looks like you have a checkpoint for epoch 291. Can you try resuming training?
@sergiuwaxmann As I mentioned in the post, when I try to resume training it throws an exception; the exception is included in the post.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐