ultralytics Updated data/utils.py and engine/validator.py

Updated data/utils.py and engine/validator.py

Open Bhavay-2001 opened this issue 2 months ago • 20 comments

PR for issue #9095.

Reviewers - @glenn-jocher @Burhan-Q

It's a draft PR but I will happy to alter it and make changes. Please review it and provide suggestions to improve. Thanks

🛠️ PR Summary

_{Made with ❤️ by Ultralytics Actions}

🌟 Summary

Enhanced dataset handling now supports JSON in addition to YAML.

📊 Key Changes

Added an extension parameter to check_det_dataset function to support JSON files.
Enhanced the dataset validation logic to accept both .yaml/.yml and .json formats for dataset descriptions.

🎯 Purpose & Impact

Flexibility: Allows users to specify their dataset descriptors in JSON format, providing more flexibility in how datasets are defined and shared. 🔄
Ease of Use: Makes it easier for those already using JSON files in their projects to integrate with Ultralytics tools without needing to convert their files to YAML. 🌐
Future Proofing: By supporting more formats, Ultralytics makes its tools more versatile and ready for future developments and community needs. 💡

Apr 21 '24 17:04 Bhavay-2001

I have read the CLA Document and I sign the CLA

Apr 21 '24 17:04 Bhavay-2001

Codecov Report

Attention: Patch coverage is 51.02041% with 24 lines in your changes are missing coverage. Please review.

Project coverage is 74.80%. Comparing base (28cb2f2) to head (a0ba44d).

Files	Patch %	Lines
ultralytics/data/utils.py	46.15%	14 Missing :warning:
ultralytics/utils/__init__.py	22.22%	7 Missing :warning:
ultralytics/engine/exporter.py	0.00%	2 Missing :warning:
ultralytics/data/explorer/explorer.py	0.00%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10206      +/-   ##
==========================================
- Coverage   77.88%   74.80%   -3.08%     
==========================================
  Files         122      122              
  Lines       15579    15607      +28     
==========================================
- Hits        12133    11675     -458     
- Misses       3446     3932     +486

Flag	Coverage Δ
Benchmarks	`35.57% <20.40%> (-0.09%)`	:arrow_down:
GPU	`37.36% <26.53%> (-0.03%)`	:arrow_down:
Tests	`70.16% <51.02%> (-3.52%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Apr 21 '24 17:04 codecov[bot]

Hi @glenn-jocher, I will look into the errors and try to resolve them. However, I need to ask a few things.

You said that I have to test on multiple json datasets, soo how can I look for these datasets. Can you provide some names pls?
How to check this validation loader. I mean any sample script that I can run and it will load this validation loader and help me run the json datasets?
Any command that i can run on my local machine to view these CI errors? Like make style or smth like that?

Thanks

Apr 24 '24 12:04 Bhavay-2001

Also @glenn-jocher @Burhan-Q, pls review the PR and share your views about if it's correct or what changes can be made. Thanks

Apr 24 '24 12:04 Bhavay-2001

Hi @glenn-jocher. My approach is that I check for the extension of the dataset and pass it in the check_det_dataset. In the implementation of the dataset, I check if the extension is .yaml or .yml then it loads data from some other file and if it is .json then load it using the json loading function.

Is this approach correct? Should I proceed with changing everywhere where check_det_dataset is called? Thanks

Apr 24 '24 16:04 Bhavay-2001

Hi there! Your approach sounds solid for expanding dataset format support. 👍 Yes, you should proceed with modifying the necessary parts of the code that call check_det_dataset to accommodate the new logic for handling JSON dataset files. If you have any specific areas where you're unsure, feel free to share more details or a code snippet, and I'd be happy to take a closer look. Keep up the great work!

Apr 24 '24 16:04 glenn-jocher

Hi @glenn-jocher, please review the PR and let me know. Thanks

Apr 26 '24 09:04 Bhavay-2001

Hi @glenn-jocher, can you please help with how to resolve this error? I am confused with this. Thanks

Apr 26 '24 11:04 Bhavay-2001

Hi there! Sure, I’d be happy to help. Could you please provide a bit more detail about the error you’re encountering? A snippet of the error message or the context around when it occurs would be really helpful for diagnosing the issue. Looking forward to assisting you! 😊

Apr 26 '24 21:04 glenn-jocher

Hi, so I am encountering issues at 2 places.

file1 In this, I was encountering an issue which I fixed by passing the extension of data d in check_det_dataset function. Please tell me if this was correct?
ultralytics\ultralytics\data\utils.py

        if self.task == "classify":
            unzip_dir = unzip_file(path)
            data = check_cls_dataset(unzip_dir)
            data["path"] = unzip_dir
        else:  # detect, segment, pose
            _, data_dir, yaml_path = self._unzip(Path(path))
            try:
                # Load YAML with checks
                data = yaml_load(yaml_path)
                data["path"] = ""  # strip path since YAML should be in dataset root for all HUB datasets
                yaml_save(yaml_path, data)
                data = check_det_dataset(yaml_path, ".yaml", autodownload)  # dict
                data["path"] = data_dir  # YAML path should be set to '' (relative) or parent (absolute)
            except Exception as e:
                raise Exception("error/HUB/dataset_stats/init") from e

In this I am encountering the issue on line data = check_det_dataset(yaml_path, ".yaml", autodownload) where the error says UnboundLocalError: cannot access local variable 'data' where it is not associated with a value which means that data variable doesn't contain any value and is referenced before assignment.

Thanks

Apr 27 '24 07:04 Bhavay-2001

Also, please let me know how can I check for CI errors on my vscode only and only push code which is clean and error free. Any helpful commands for that?

Apr 27 '24 07:04 Bhavay-2001

Hi! For the changes you’ve made:

Yes, passing the extension to check_det_dataset sounds like a good approach to support different file formats. It keeps your code adaptable for both .yaml and .json.
It looks like the issue might stem from the scope where data is defined. Make sure data is defined before your try block or initialized properly within every block that might use it. If the problem persists, a snippet of how you're handling data would be very helpful!

For checking CI errors locally, you can rely on pre-commit hooks or run specific linting and testing commands based on the CI pipeline of the project. For instance, running flake8 for Python linting or pytest for tests. You might want to look into the project's CI configuration (e.g., .github/workflows/ci.yml) to see which commands are run. Also, setting up a Docker environment mirroring the CI environment can help catch errors early. 😊

At minimum, you should run pytests to verify that non of the tests fail.

Hope this helps!

Apr 27 '24 17:04 glenn-jocher

Hi @glenn-jocher, Can you please check and merge?

May 02 '24 16:05 Bhavay-2001

@Bhavay-2001 hi there! 👋 Thanks for the heads up. I'll review the changes ASAP and get back to you with any feedback or proceed with merging if everything looks good. Appreciate your patience and contribution! 😊

May 03 '24 00:05 glenn-jocher

Hi @glenn-jocher, any updates on the PR?

May 05 '24 04:05 Bhavay-2001

Hi @Bhavay-2001! Thanks for checking in. I'm currently reviewing the PR and will provide feedback or approve it shortly. Hang tight! 😊 If there’s anything specific you’d like to discuss or need help with in the meantime, feel free to let me know!

May 05 '24 21:05 glenn-jocher

Hi @glenn-jocher, any updates on PR?

May 09 '24 11:05 Bhavay-2001

Hi there! 👋 We're currently reviewing the PR and will provide feedback or move forward with merging very soon. Thanks for your patience! If there's anything else you'd like to discuss in the meantime, feel free to reach out. 😊

May 09 '24 23:05 glenn-jocher

Hi @glenn-jocher, any updates?

May 13 '24 13:05 Bhavay-2001

@Bhavay-2001 hi there! 👋 We're actively reviewing the PR and will keep you updated. Thanks for your patience! If there’s anything specific you need help with, feel free to let me know. 😊

May 13 '24 20:05 glenn-jocher

ultralytics ultralytics copied to clipboard

Updated data/utils.py and engine/validator.py

🛠️ PR Summary

🌟 Summary

📊 Key Changes

🎯 Purpose & Impact

Codecov Report

ultralytics
ultralytics copied to clipboard