cp: cannot stat 'runs/detect/train/weights/last.pt': No such file or directory

Open Ray150789 opened this issue 1 year ago • 4 comments

Search before asking

  • [X] I have searched the HUB issues and found no similar bug report.

HUB Component

Training

Bug

My training completed, but I can't find the best.pt file. The output says results were saved to runs/detect/train. My folders include runs/detect/predict, which contains output video files; their quality is low, so I wish to retrain. Also, can I use the .ipynb notebook again if I want to get a better-trained model?

Environment

  • Device name: DESKTOP-AM4RDAQ
  • Processor: (not provided)
  • Installed RAM: 8,00 GB (7,88 GB usable)
  • Device ID: 2C265D9F-7E3E-4D23-816A-85FFE62C5616
  • Product ID: 00331-20020-00000-AA388
  • System type: 64-bit operating system, x64-based processor
  • Pen and touch: No pen or touch input is available for this display
  • Edition: Windows 10 Pro
  • Version: 22H2
  • Installed on: 2023/10/09
  • OS build: 19045.5247
  • Experience: Windows Feature Experience Pack 1000.19060.1000.0

Minimal Reproducible Example

No response

Additional

No response

Ray150789 avatar Dec 21 '24 09:12 Ray150789

👋 Hello @Ray150789, thank you for raising an issue about Ultralytics HUB 🚀! We're here to help.

It looks like your training process has completed, but you're unable to locate the expected best.pt or last.pt files, and you've also mentioned a desire to retrain for better results. Since you've flagged this as a 🐛 Bug Report, here's how you can help us assist you faster:

  • Please provide a Minimum Reproducible Example (MRE). This includes concise steps or code snippets to help us understand and reproduce the issue on our end.
  • Screenshots of your directory structure (runs/detect/) or specific error messages would also be incredibly helpful.

For your retraining query, you should be able to use an .ipynb notebook again for further training. Ensure that your dataset and environment are correctly set up to optimize your model's performance.

An Ultralytics engineer will review your issue soon to provide more direct assistance. Thank you for your patience and for helping us improve Ultralytics HUB! 😊

UltralyticsAssistant avatar Dec 21 '24 09:12 UltralyticsAssistant

@Ray150789 thank you for raising this issue! Let me help clarify and guide you through potential resolutions.

The error cp: cannot stat 'runs/detect/train/weights/last.pt': No such file or directory usually indicates that the training process did not generate a last.pt file. This could happen for several reasons, such as improper termination of training or issues during the saving process.

Steps to Troubleshoot and Resolve:

  1. Verify Training Completion:

    • Double-check the training logs to confirm that the training process completed successfully. If there were any interruptions or errors, the weights (last.pt and best.pt) might not have been saved.
  2. Confirm Output Directory:

    • According to your description, results saved to runs/detect/train suggests that the model outputs, including weights, should be in that directory. Check the exact path:
      runs/detect/train/weights/
      
      If the folder exists but is empty, it indicates an issue with saving the weights.
  3. Output Videos in runs/detect/predict:

    • The runs/detect/predict folder contains results from inference runs, not training. If you wish to retrain using those predictions, you'd need to prepare a dataset from the output. This involves annotating the images or videos again with corrected labels if required.
  4. Retraining with Updated Parameters:

    • Yes, you can use the .ipynb notebook again to retrain the model. Ensure that:
      • You specify a valid dataset path in the data argument.
      • You update training hyperparameters such as epochs, lr0, or batch to potentially improve the model's quality.
    • Example:
      from ultralytics import YOLO
      model = YOLO("yolov8n.pt")  # Load a base model or your checkpoint
      model.train(data="path/to/dataset.yaml", epochs=50, batch=16)  # the batch-size argument is named "batch"
      
  5. Ensure Latest Version:

    • Confirm that you are using the latest version of Ultralytics packages. Run the following:
      pip install ultralytics --upgrade
      
  6. Check Disk Space:

    • Insufficient disk space might prevent saving weights. Verify that you have enough storage available on your machine.
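
As a rough way to carry out steps 1 and 2 together, the sketch below walks a runs directory and lists every weights file it finds. It assumes the default layout described above (runs/detect/<run>/weights/*.pt); the helper name find_weights is hypothetical and not part of any library. It also catches the case where a later run was written to a sibling folder such as train2 instead of train.

```python
from pathlib import Path


def find_weights(runs_root="runs"):
    """Return every .pt file stored in a 'weights' folder under runs_root, sorted by path."""
    root = Path(runs_root)
    if not root.is_dir():
        return []  # nothing has been written at this location
    # Keep only files whose parent folder is literally named "weights".
    return sorted(p for p in root.rglob("*.pt") if p.parent.name == "weights")
```

Calling find_weights("runs") from the notebook's working directory should list best.pt and last.pt for each run that saved them. An empty list suggests either that no weights were saved or that the notebook is running in a different directory from the one used for training.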

Additional Notes:

  • If you encounter low-quality predictions, you may want to:
    • Ensure your dataset is properly annotated and balanced.
    • Train for more epochs or fine-tune the learning rate.
    • Use a larger YOLO model (e.g., yolov8m.pt instead of yolov8n.pt) for better accuracy.
  • For running training on a lower-spec machine like yours, consider reducing batch and setting device='cpu' if a GPU is unavailable.

If the issue persists, feel free to share additional details, such as training logs or dataset configuration, so we can assist further. Best of luck with your training, and I hope you achieve the desired results! 🚀

pderrenger avatar Dec 22 '24 01:12 pderrenger

Thank you for the reply

I think that providing the facilities to train models is awesome.

My PC specs:

  • Processor: (not provided)
  • Installed RAM: 8,00 GB (7,88 GB usable)
  • System type: 64-bit operating system, x64-based processor
  • Edition: Windows 10 Pro
  • Version: 22H2
  • Experience: Windows Feature Experience Pack 1000.19060.1000.0

My ipynb file:

    %pip install ultralytics
    %pip install -U roboflow

Get dataset

    from roboflow import Roboflow
    rf = Roboflow(api_key="BVsY1jgpRSprqMqvAQ5v")
    project = rf.workspace("roboflow-jvuqo").project("football-players-detection-3zvbc")
    version = project.version(1)
    dataset = version.download("yolov5")

    dataset.location

    'c:\Users\User\Football_project\football_analysis\training\football-players-detection-1'

    import shutil

    shutil.move('football-players-detection-1/test', 'football-players-detection-1/football-players-detection-1/test')
    shutil.move('football-players-detection-1/train', 'football-players-detection-1/football-players-detection-1/train')
    shutil.move('football-players-detection-1/valid', 'football-players-detection-1/football-players-detection-1/valid')

    'football-players-detection-1/football-players-detection-1/valid'

Training

    !yolo task=detect mode=train model=yolov5s.pt data={dataset.location}/data.yaml epochs=100 imgsz=640

The last successfully completed training run used the following:

    !yolo task=detect mode=train model=yolov5s.pt data={dataset.location}/data.yaml epochs=15 imgsz=640

    Results saved to runs/detect/train

    from google.colab import drive
    drive.mount('/content/drive')

    !cd /
    !cp -R runs/detect/train/weights/last.pt drive/MyDrive/Colab\ Notebooks/
    !cd /
    !cp -R runs/detect/train/weights/best.pt drive/MyDrive/Colab\ Notebooks/

When I attempt to copy the files to the directory to make it work, it outputs the error:

    cp: cannot stat 'runs/detect/train/weights/last.pt': No such file or directory
    cp: cannot stat 'runs/detect/train/weights/best.pt': No such file or directory

I initially used model=yolov5x but did not get a successful output, and lowering the number of epochs did not help. The current code I'm working on uses model=yolov5n epochs=50, but the output video does not track the ball well enough. I eventually used model=yolov5s epochs=15, which ran successfully, but I can't extract the data. 50 epochs and 30 epochs were too much to run to completion.

My data.yaml file:

    path: C:\Users\User\Football_project\football_analysis\training\football-players-detection-1\data.yaml
    names:
    - ball
    - goalkeeper
    - player
    - referee
    nc: 4
    roboflow:
      license: CC BY 4.0
      project: football-players-detection-3zvbc
      url: https://universe.roboflow.com/roboflow-jvuqo/football-players-detection-3zvbc/dataset/1
      version: 1
      workspace: roboflow-jvuqo
    test: ../test/images
    train: football-players-detection-1/train/images
    val: football-players-detection-1/valid/images

My yolo_inference file:

    from ultralytics import YOLO

    model = YOLO('models/best.pt')

    results = model.predict('input_videos/08fd33_4.mp4', save=True)
    print(results[0])
    print('==================================')
    for box in results[0].boxes:
        print(box)

How do I annotate the images or videos again with corrected labels if required?

I assume there is a problem saving the weights file.

I got two separate prediction outputs, in the runs/detect/predict and runs/detect/predict2 folders; both contain output videos. The first one detects the ball for approx. 5% of the video and detects referees as players. The second one detects the referees separately from players but does not track the ball. I attached the videos as Google Drive links.

  • 08fd33_4.avi https://drive.google.com/file/d/111mzZ1bNEDC2XCS6EFLboRscQgFz1k7Z/view?usp=drive_web
  • 08fd33_4.avi https://drive.google.com/file/d/1QOmdaP3Vs9QsuKtR88wBXJa1C9n-iLvq/view?usp=drive_web

Regards Paula

Hopefully I provided enough insight into the issue.

Ray150789 avatar Jan 13 '25 11:01 Ray150789

@Ray150789 hi Paula,

Thank you for providing such detailed insight into your issue. Based on the information shared, I’ll address your concerns and guide you toward a resolution.


1. Missing last.pt and best.pt Files

The error indicates that the training process did not generate the weights (last.pt or best.pt). This could be due to:

  • Training not completing successfully.
  • Disk space limitations.
  • File path errors during saving.

Actions:

  • Verify Training Completion:

    • Check the training logs for any errors or interruptions. If the training completed, the weights should be saved in runs/detect/train/weights/.
  • Confirm Directory Existence:

    • Double-check the directory runs/detect/train/weights/ for saved weights. If the directory exists but is empty, the training process might have been interrupted, or there might have been a permissions issue.
  • Check Disk Space:

    • Ensure your system has enough free storage to save the weights.
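
For the disk-space check, a minimal sketch using only the Python standard library is shown here. The function names and the 2 GB threshold are assumptions chosen for illustration, not values recommended by Ultralytics; adjust the threshold to the size of your model and dataset.

```python
import shutil


def free_space_gb(path="."):
    """Return the free disk space, in gigabytes, on the drive that holds path."""
    return shutil.disk_usage(path).free / 1e9


def enough_space(path=".", min_free_gb=2.0):
    """Return True if at least min_free_gb is free (2 GB is an arbitrary default)."""
    return free_space_gb(path) >= min_free_gb
```

Running enough_space("runs") before starting a long training job gives an early warning, rather than discovering at the end that the weights could not be written.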

2. Improving Ball Tracking and Predictions

Low-quality predictions, such as poor ball tracking or misclassification of referees as players, may result from insufficient training or suboptimal dataset quality.

Suggestions:

  • Dataset Quality:

    • Make sure the dataset is well-annotated. Since your first model detected referees as players, check that referees and players are labeled consistently, and refine the annotations where they are not.
  • Augment Dataset:

    • Add more diverse data to improve the model's ability to distinguish between classes (e.g., more videos of the ball, referees, players, etc.).
  • Model Selection:

    • Since your system has limited resources, start with a lightweight model like yolov5n.pt or yolov8n.pt. Once satisfied with the results, consider using a slightly larger model like yolov5s.pt or yolov8s.pt for better accuracy.
  • Training Parameters:

    • Increase the number of epochs gradually (e.g., from 15 to 30) if your system permits. Alternatively, reduce the batch size (batch=4 or lower) to alleviate memory constraints.
    • Use the following example:
      from ultralytics import YOLO
      model = YOLO("yolov5n.pt")  # or yolov8n.pt
      model.train(data="path/to/data.yaml", epochs=30, batch=4, imgsz=640)
      

3. Annotating Videos Again with Corrected Labels

To improve annotations, you can:

  • Use tools like Roboflow or LabelImg to annotate or correct your dataset.
  • For videos, convert frames to images (using tools like FFmpeg), annotate them, and repackage them into a YOLO-compatible dataset structure.

Steps:

  1. Extract frames from the video:

    ffmpeg -i video.mp4 -vf fps=1 output%d.jpg
    

    This will save one frame per second as images (output1.jpg, output2.jpg, ...).

  2. Annotate the extracted frames using a tool like Roboflow.

  3. Update the data.yaml file to point to the newly annotated dataset.

  4. Train the model again with the updated dataset:

    model.train(data="path/to/new_data.yaml", epochs=50, batch=4)
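
To repackage the extracted frames, one possible sketch is given below. It copies the .jpg files into train/images and valid/images folders, matching the folder names in the data.yaml posted earlier in this thread. The function name split_frames, the 20% validation share, and the fixed seed are assumptions for illustration; an annotation tool such as Roboflow can also do this split for you, in which case this step is unnecessary.

```python
import random
import shutil
from pathlib import Path


def split_frames(frames_dir, dataset_dir, valid_fraction=0.2, seed=0):
    """Copy .jpg frames into <dataset_dir>/train/images and <dataset_dir>/valid/images.

    Empty labels folders are created alongside; the label files themselves
    come from the annotation tool. Returns (n_train, n_valid).
    """
    frames = sorted(Path(frames_dir).glob("*.jpg"))
    random.Random(seed).shuffle(frames)  # fixed seed keeps the split reproducible
    n_valid = int(len(frames) * valid_fraction)
    splits = {"valid": frames[:n_valid], "train": frames[n_valid:]}
    for name, files in splits.items():
        images = Path(dataset_dir) / name / "images"
        images.mkdir(parents=True, exist_ok=True)
        (Path(dataset_dir) / name / "labels").mkdir(parents=True, exist_ok=True)
        for frame in files:
            shutil.copy2(frame, images / frame.name)
    return len(splits["train"]), len(splits["valid"])
```

Each image then needs a label file with the same base name in the matching labels folder before training.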
    

4. Inference and Results Analysis

From your inference code, you are correctly using the predict method. If the current results are unsatisfactory:

  • Ensure you’re loading the correct model weights (e.g., best.pt).
  • Use the updated dataset for retraining and verify predictions with the refined model.
from ultralytics import YOLO

# Load your best model
model = YOLO("runs/detect/train/weights/best.pt")

# Run inference
results = model.predict("input_videos/08fd33_4.mp4", save=True)
for box in results[0].boxes:
    print(box)
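
To compare two models with something more precise than watching the output video, a small helper like the one below could report how often a class appears. This is only a sketch: presence_rate is a hypothetical name, and it works on plain Python lists of class indices (one list per frame) so that it does not depend on any particular library.

```python
def presence_rate(frames_class_ids, class_id):
    """Return the fraction of frames in which class_id was detected at least once."""
    total = 0
    hits = 0
    for ids in frames_class_ids:
        total += 1
        if class_id in ids:
            hits += 1
    return hits / total if total else 0.0
```

With Ultralytics results, the per-frame lists could be built with something like [int(c) for c in r.boxes.cls.tolist()] for each result r. Class index 0 would be the ball, assuming the names order in the data.yaml posted above is unchanged.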

5. Handling Limited Hardware

Your PC's specifications (8 GB RAM and, as far as the details you shared indicate, no dedicated GPU) are limited for high-intensity YOLO training. Consider:

  • Cloud Training: Use Ultralytics HUB's Cloud Training to train models on powerful hardware.
  • Colab: Continue leveraging Google Colab for free GPU support, adjusting batch size and epochs for better performance.

6. Debugging Paths and Errors

Your error cp: cannot stat suggests a path issue. Ensure the following:

  • The directory structure is correct (runs/detect/train/weights/ exists).
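  • Use absolute paths rather than relative paths when copying files. In a notebook, each ! command runs in its own subshell, so a !cd / line does not change the directory for the !cp line that follows it.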

Example for Colab:

!cp /content/runs/detect/train/weights/best.pt "/content/drive/MyDrive/Colab Notebooks/"
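
The same copy can be done from Python, which avoids shell quoting for the space in the folder name and reports which files are missing instead of stopping at the first error. This is a sketch under the assumption that training ran in /content on Colab; copy_weights is a hypothetical helper, not an Ultralytics function.

```python
import shutil
from pathlib import Path


def copy_weights(weights_dir, dest_dir, names=("best.pt", "last.pt")):
    """Copy the named weight files to dest_dir; return (copied, missing) lists of names."""
    src = Path(weights_dir)
    dst = Path(dest_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in names:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
        else:
            missing.append(name)  # report the gap instead of failing like cp
    return copied, missing
```

For example, copy_weights("/content/runs/detect/train/weights", "/content/drive/MyDrive/Colab Notebooks") returns the names it copied and the names it could not find. If both names come back as missing, the weights were most likely saved on a different machine or in a different working directory.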

7. Next Steps

  • Refine and augment your dataset to address labeling inconsistencies.
  • Retrain with adjusted parameters or try a larger model if feasible.
  • Use cloud or GPU-based training for improved efficiency.
  • If you continue encountering issues, please share specific logs or errors for further guidance.

I hope this helps! Don't hesitate to ask if you have any additional questions. Best of luck with your project 🚀!

pderrenger avatar Jan 13 '25 22:01 pderrenger

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

  • Docs: https://docs.ultralytics.com
  • HUB: https://hub.ultralytics.com
  • Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions[bot] avatar Nov 23 '25 00:11 github-actions[bot]