
Referential integrity check takes much longer than it seems it should

Open rgdeen opened this issue 8 months ago • 13 comments

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

"Bug" is a strong word, but it's the closest category.

I have a bundle (MSAM2) with ~2 million products in it. I can do product-level validation in parallel using KDP or Nucleus or other technology to farm it out to a bunch of nodes. However, referential integrity (verifying that the inventory files are correct and match the files present) has to be checked on the bundle as a whole; I'm not aware of any way to split that up. (Maybe by collection, but there are only 2 relevant collections here, so that doesn't help much.)

To run only the referential integrity check, I'm running with product and content validation turned off. But it is still taking an inordinate amount of time. As of this writing, it has been running for 6 days and, per the log, has gotten through 1,047,476 of 2,013,873 products, about halfway. That's a rate of about 2 products per second. It seems like it should be able to do better in this case.
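The rate above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the run has taken exactly 6 days and that the rate stays constant, neither of which the log guarantees, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def throughput_and_eta(done, total, elapsed_days):
    """Return (products per second, estimated days remaining).

    Assumes a constant processing rate, which is an approximation.
    """
    elapsed_seconds = elapsed_days * SECONDS_PER_DAY
    rate = done / elapsed_seconds
    remaining_days = (total - done) / rate / SECONDS_PER_DAY
    return rate, remaining_days


# Figures taken from the log as reported above; "6 days" is treated as exact.
rate, remaining_days = throughput_and_eta(1_047_476, 2_013_873, 6)
print(f"{rate:.2f} products/s, about {remaining_days:.1f} more days")
```

Under those assumptions the run works out to roughly 2 products per second, with about five and a half more days to go.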

🕵️ Expected behavior

Well, I expected what I got ;-) but I would hope the referential integrity (RI) checks could be faster.

📜 To Reproduce

Here's the command line:

/path/to/msam2/validate-3.5.1/bin/validate -target /path/to/msam2/annex_ehlmann_caltech_msl_msam2 --report-file bundle.valrpt -R pds4.bundle --skip-content-validation --skip-product-validation

🖥 Environment Info

$ uname -a
Linux machine-name 3.10.0-1160.76.1.el7.x86_64 #1 SMP Tue Jul 26 14:15:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ java -version
java version "17.0.11" 2024-04-16 LTS
Java(TM) SE Runtime Environment (build 17.0.11+7-LTS-207)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.11+7-LTS-207, mixed mode, sharing)

📚 Version of Software Used

$ /mnt/pdsdata/scratch/rgd/msam2/validate-3.5.1/bin/validate -version

gov.nasa.pds:validate
Version 3.5.1
Release Date: 2024-05-25 17:45:47

Copyright 2019, by the California Institute of Technology ("Caltech").
All rights reserved.

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

rgdeen, Jun 11 '24 16:06