Updated data/utils.py and engine/validator.py
PR for issue #9095.
Reviewers - @glenn-jocher @Burhan-Q
It's a draft PR, but I'm happy to alter it and make changes. Please review it and provide suggestions for improvement. Thanks!
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced dataset handling now supports JSON in addition to YAML.
📊 Key Changes
- Added an `extension` parameter to the `check_det_dataset` function to support JSON files.
- Enhanced the dataset validation logic to accept both `.yaml`/`.yml` and `.json` formats for dataset descriptions.
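For context, here is a minimal sketch of the extension dispatch described above. It is illustrative only: the helper name `load_descriptor` is hypothetical, and the real `check_det_dataset` in `ultralytics/data/utils.py` also resolves paths, validates keys, and handles autodownload.

```python
import json
from pathlib import Path

import yaml  # PyYAML, already a dependency of ultralytics


def load_descriptor(path: str, extension: str = ".yaml") -> dict:
    """Hypothetical helper: parse a dataset descriptor file by extension."""
    file = Path(path)
    if extension in {".yaml", ".yml"}:
        with open(file, encoding="utf-8") as f:
            return yaml.safe_load(f)  # dict with keys like 'train', 'val', 'names'
    if extension == ".json":
        with open(file, encoding="utf-8") as f:
            return json.load(f)  # same schema, serialized as JSON
    raise ValueError(f"Unsupported dataset descriptor extension: {extension!r}")
```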
🎯 Purpose & Impact
- Flexibility: Allows users to specify their dataset descriptors in JSON format, providing more flexibility in how datasets are defined and shared.
- Ease of Use: Makes it easier for those already using JSON files in their projects to integrate with Ultralytics tools without needing to convert their files to YAML.
- Future Proofing: By supporting more formats, Ultralytics makes its tools more versatile and ready for future developments and community needs. 💡
I have read the CLA Document and I sign the CLA
Codecov Report
Attention: Patch coverage is 51.02041% with 24 lines in your changes missing coverage. Please review.
Project coverage is 74.80%. Comparing base (28cb2f2) to head (a0ba44d).
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #10206      +/-   ##
==========================================
- Coverage   77.88%   74.80%    -3.08%
==========================================
  Files         122      122
  Lines       15579    15607       +28
==========================================
- Hits        12133    11675      -458
- Misses       3446     3932      +486
```
| Flag | Coverage Δ | |
|---|---|---|
| Benchmarks | 35.57% <20.40%> (-0.09%) | :arrow_down: |
| GPU | 37.36% <26.53%> (-0.03%) | :arrow_down: |
| Tests | 70.16% <51.02%> (-3.52%) | :arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi @glenn-jocher, I will look into the errors and try to resolve them. However, I need to ask a few things.
- You said that I have to test on multiple JSON datasets, so how can I find these datasets? Can you provide some names, please?
- How do I check the validation loader? Is there a sample script I can run that loads the validation loader and lets me try it on JSON datasets?
- Is there any command I can run on my local machine to see these CI errors, like `make style` or something similar?
Thanks
Also @glenn-jocher @Burhan-Q, please review the PR and share your views on whether it's correct or what changes should be made. Thanks
Hi @glenn-jocher. My approach is to check the extension of the dataset file and pass it to `check_det_dataset`. In the implementation, if the extension is `.yaml` or `.yml`, the data is loaded with the YAML loader, and if it is `.json`, it is loaded using the JSON loading function.
Is this approach correct? Should I proceed with changing every place where `check_det_dataset` is called?
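For concreteness, a sketch of what an updated call site might look like under this approach, assuming the `extension` parameter from the PR summary; `Path.suffix` supplies the extension so both YAML and JSON descriptors flow through the same call:

```python
from pathlib import Path

from ultralytics.data.utils import check_det_dataset  # assumes this PR's extended signature

descriptor = Path("coco8.json")  # hypothetical JSON dataset descriptor
# Pass the file's own suffix so YAML and JSON descriptors go through one call:
data = check_det_dataset(str(descriptor), descriptor.suffix)
```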
Thanks
Hi there! Your approach sounds solid for expanding dataset format support. Yes, you should proceed with modifying the necessary parts of the code that call `check_det_dataset` to accommodate the new logic for handling JSON dataset files. If you have any specific areas where you're unsure, feel free to share more details or a code snippet, and I'd be happy to take a closer look. Keep up the great work!
Hi @glenn-jocher, please review the PR and let me know. Thanks
Hi @glenn-jocher, can you please help with how to resolve this error? I am confused with this. Thanks
Hi there! Sure, I'd be happy to help. Could you please provide a bit more detail about the error you're encountering? A snippet of the error message or the context around when it occurs would be really helpful for diagnosing the issue. Looking forward to assisting you!
Hi, so I am encountering issues at two places.
- file1: I was encountering an issue which I fixed by passing the extension of `data` to the `check_det_dataset` function. Please tell me if this was correct.
- `ultralytics\ultralytics\data\utils.py`:
```python
if self.task == "classify":
    unzip_dir = unzip_file(path)
    data = check_cls_dataset(unzip_dir)
    data["path"] = unzip_dir
else:  # detect, segment, pose
    _, data_dir, yaml_path = self._unzip(Path(path))
    try:
        # Load YAML with checks
        data = yaml_load(yaml_path)
        data["path"] = ""  # strip path since YAML should be in dataset root for all HUB datasets
        yaml_save(yaml_path, data)
        data = check_det_dataset(yaml_path, ".yaml", autodownload)  # dict
        data["path"] = data_dir  # YAML path should be set to '' (relative) or parent (absolute)
    except Exception as e:
        raise Exception("error/HUB/dataset_stats/init") from e
```
In this, I am encountering the issue on the line `data = check_det_dataset(yaml_path, ".yaml", autodownload)`, where the error says `UnboundLocalError: cannot access local variable 'data' where it is not associated with a value`, which means the `data` variable doesn't contain a value and is referenced before assignment.
Thanks
Also, please let me know how I can check for CI errors locally in VS Code so that I only push code that is clean and error-free. Any helpful commands for that?
Hi! For the changes you've made:
- Yes, passing the extension to `check_det_dataset` sounds like a good approach to support different file formats. It keeps your code adaptable for both `.yaml` and `.json`.
- It looks like the issue might stem from the scope where `data` is defined. Make sure `data` is defined before your `try` block or initialized properly within every block that might use it, as illustrated in the sketch below. If the problem persists, a snippet of how you're handling `data` would be very helpful!
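To make the scoping point concrete, here is a minimal, self-contained reproduction of this class of error together with the usual fix (illustrative only, not the PR's actual code):

```python
def load_broken():
    try:
        raise FileNotFoundError("descriptor missing")  # failure before 'data' is assigned
        data = {"names": []}  # never reached
    except FileNotFoundError:
        pass
    return data  # UnboundLocalError: 'data' was never bound on this path


def load_fixed():
    data = None  # bind the name before the try block so it always exists
    try:
        raise FileNotFoundError("descriptor missing")
        data = {"names": []}
    except FileNotFoundError:
        pass
    return data  # returns None instead of raising UnboundLocalError
```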
For checking CI errors locally, you can rely on pre-commit hooks or run specific linting and testing commands based on the CI pipeline of the project. For instance, run `flake8` for Python linting or `pytest` for tests. You might want to look into the project's CI configuration (e.g., `.github/workflows/ci.yml`) to see which commands are run. Also, setting up a Docker environment mirroring the CI environment can help catch errors early.
At a minimum, you should run `pytest` to verify that none of the tests fail, for example as in the snippet below.
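For example, pytest can also be driven from Python, which is convenient inside an IDE like VS Code (the `tests/` path is an assumption about the repo layout):

```python
import pytest

# Run the project's test suite; exit code 0 means all collected tests passed.
exit_code = pytest.main(["tests/", "-q"])
print(f"pytest exited with code {int(exit_code)}")
```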
Hope this helps!
Hi @glenn-jocher, can you please check and merge?
@Bhavay-2001 hi there! Thanks for the heads-up. I'll review the changes ASAP and get back to you with any feedback, or proceed with merging if everything looks good. Appreciate your patience and contribution!
Hi @glenn-jocher, any updates on the PR?
Hi @Bhavay-2001! Thanks for checking in. I'm currently reviewing the PR and will provide feedback or approve it shortly. Hang tight! If there's anything specific you'd like to discuss or need help with in the meantime, feel free to let me know!
Hi @glenn-jocher, any updates on the PR?
Hi there! We're currently reviewing the PR and will provide feedback or move forward with merging very soon. Thanks for your patience! If there's anything else you'd like to discuss in the meantime, feel free to reach out.
Hi @glenn-jocher, any updates?
@Bhavay-2001 hi there! We're actively reviewing the PR and will keep you updated. Thanks for your patience! If there's anything specific you need help with, feel free to let me know.