validate
Slow performance with all content and product validation turned off
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When I ran validate on a 2M-product bundle with everything disabled except referential integrity checks, it still took about 2 weeks.
Products: 2013873 Completed in 13.5 days / 324.6 hours
🕵️ Expected behavior
I expected the run to take less time.
📜 To Reproduce
- Get a large bundle.
Here's the command:
/path/validate-3.5.1/bin/validate -target \
/path/msam2/annex_ehlmann_caltech_msl_msam2 --report-file bundle.valrpt \
-R pds4.bundle \
--skip-content-validation --skip-product-validation
🖥 Environment Info
Linux OS m7l.2xlarge EC2 instance
📚 Version of Software Used
v3.5.1
🩺 Test Data / Additional context
No response
🦄 Related requirements
Tightly coupled with #931
⚙️ Engineering Details
Per initial analysis by @al-niessner, code block here:
1. The outer loop runs over all targets.
2. The inner loop runs over all targets again, but from a different list.
   a. It uses an if to check whether the variable from (1) is equal to the one from (2).
   b. It cannot look up by hash because it misuses the overly generic Collection<> instead of Set<>, Map<>, or something better suited to the task.
This part of the job will take awhile and I have no way to estimate it for you. It does not seem to be stuck though.
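To illustrate the analysis above, here is a minimal Java sketch of the two lookup strategies. It is not the validate code: the class name `ReferenceCheckSketch`, the method names, and the sample identifiers are all hypothetical, and the real data structures may differ. The first method mirrors the described pattern (a nested scan with an equality test, which is quadratic in the number of products). The second builds a `HashSet` once and answers each lookup in roughly constant time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and data are assumptions, not taken from validate.
public class ReferenceCheckSketch {

    // Nested scan: for every reference, walk the whole target collection.
    // With n references and m targets this is O(n * m) comparisons.
    static List<String> findMissingNested(Collection<String> references,
                                          Collection<String> knownTargets) {
        List<String> missing = new ArrayList<>();
        for (String ref : references) {
            boolean found = false;
            for (String target : knownTargets) {
                if (ref.equals(target)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(ref);
            }
        }
        return missing;
    }

    // Hashed lookup: index the targets once, then test membership per reference.
    // Building the set is O(m); each contains() call is O(1) on average.
    static List<String> findMissingHashed(Collection<String> references,
                                          Collection<String> knownTargets) {
        Set<String> index = new HashSet<>(knownTargets);
        List<String> missing = new ArrayList<>();
        for (String ref : references) {
            if (!index.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up identifiers, for illustration only.
        List<String> targets = List.of("urn:example:a", "urn:example:b", "urn:example:c");
        List<String> references = List.of("urn:example:a", "urn:example:x", "urn:example:c");

        // Both strategies report the same unresolved references.
        System.out.println(findMissingNested(references, targets));
        System.out.println(findMissingHashed(references, targets));
    }
}
```

Both methods return the same result, so a swap of this kind should not change what the referential integrity check reports, only how long it takes. Whether a `Set<>` or a `Map<>` fits better depends on whether the check needs only membership or also needs to retrieve data associated with each target.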
🎉 Integration & Test
No response