validate

Slow performance with all content and product validation turned off

Open jordanpadams opened this issue 6 months ago • 0 comments

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I ran validate on a 2M-product bundle with everything disabled except the referential integrity checks, it still took about 2 weeks.

Products: 2013873 Completed in 13.5 days / 324.6 hours
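As a rough sanity check on the figures above, the run averages a little over half a second per product. The snippet below is simple arithmetic on the reported numbers, not output from validate; the class and method names are hypothetical.

```java
// Back-of-the-envelope throughput for the run reported above.
// Hypothetical helper, not part of validate.
public class Throughput {
    static final double SECONDS_PER_HOUR = 3600.0;

    /** Average wall-clock seconds spent per product. */
    static double secondsPerProduct(long products, double hours) {
        return hours * SECONDS_PER_HOUR / products;
    }

    public static void main(String[] args) {
        long products = 2_013_873L; // "Products: 2013873"
        double hours = 324.6;       // "324.6 hours"
        System.out.printf("%.1f days%n", hours / 24.0);                           // prints 13.5 days
        System.out.printf("%.3f s per product%n", secondsPerProduct(products, hours)); // prints 0.580 s per product
    }
}
```

At that rate, any per-product overhead that grows with the size of the bundle, rather than staying constant, will dominate a 2M-product run.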

🕵️ Expected behavior

I expected the run to take much less time, since both content and product validation were skipped.

📜 To Reproduce

  1. Get a large bundle (the one used here has about 2M products).

Here's the command:

/path/validate-3.5.1/bin/validate -target \
               /path/msam2/annex_ehlmann_caltech_msl_msam2 --report-file bundle.valrpt \
               -R pds4.bundle \
               --skip-content-validation --skip-product-validation

🖥 Environment Info

Linux on an m7l.2xlarge EC2 instance

📚 Version of Software Used

v3.5.1

🩺 Test Data / Additional context

No response

🦄 Related requirements

Tightly coupled with #931

⚙️ Engineering Details

Per initial analysis by @al-niessner, the code block here does the following:

  1. An outer loop runs over all targets.
  2. An inner loop runs over all targets again, but from a different list:
     a. it uses an if to check whether the variable from (1) is equal to the one from (2);
     b. it cannot look the target up by hash, because it misuses an overly generic Collection<> instead of a Set<>, a Map<>, or another type better suited to the task.
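A minimal sketch of the pattern described above and the usual fix, assuming the targets can be compared as plain identifier strings. The class, the method names, and the urn:example identifiers are hypothetical and are not validate's actual code. The first method scans the whole inner Collection with equals() for every outer target; the second builds a HashSet once and does an expected constant-time lookup per target.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: find which targets in `outer` also appear in `inner`. */
public class TargetMatch {

    // Pattern from the analysis: for each outer target, linearly scan the
    // inner collection with equals(). Worst case O(n * m) comparisons.
    static List<String> matchNested(Collection<String> outer, Collection<String> inner) {
        List<String> matches = new ArrayList<>();
        for (String a : outer) {
            for (String b : inner) {
                if (a.equals(b)) {
                    matches.add(a);
                    break; // stop scanning once this target is found
                }
            }
        }
        return matches;
    }

    // Same result using a hash-based Set: building it is O(m), and each
    // contains() call is expected O(1), so the total is roughly O(n + m).
    static List<String> matchHashed(Collection<String> outer, Collection<String> inner) {
        Set<String> lookup = new HashSet<>(inner);
        List<String> matches = new ArrayList<>();
        for (String a : outer) {
            if (lookup.contains(a)) {
                matches.add(a);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> outer = List.of("urn:example:a", "urn:example:b", "urn:example:c");
        List<String> inner = List.of("urn:example:c", "urn:example:a");
        System.out.println(matchNested(outer, inner)); // [urn:example:a, urn:example:c]
        System.out.println(matchHashed(outer, inner)); // [urn:example:a, urn:example:c]
    }
}
```

Both methods return the same matches in the same order. If both lists held roughly one entry per product (an assumption), the nested version could need on the order of 2,013,873² ≈ 4.06 × 10¹² equals() calls in the worst case, versus about 2 million hash lookups for the Set version.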

This part of the job will take a while, and I have no way to estimate how long. It does not seem to be stuck, though.

🎉 Integration & Test

No response

jordanpadams, Aug 08 '24 19:08