
Validate appears not to perform referential integrity checks for bundles, and passes incorrect <Internal_Reference>s for collections

Open • mace-space opened this issue 7 months ago • 6 comments

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

Related to #432 (I was asked to open a separate ticket).

I ran validate with --rule pds4.bundle, but no referential integrity checks were performed, even though that rule is supposed to check references:

 Summary:

   31739 product(s)
   100000 error(s)
   42256 warning(s)

   Product Validation Summary:
     30664      product(s) passed
     1075       product(s) failed
     0          product(s) skipped
     31739      product(s) total

   Referential Integrity Check Summary:
     0          check(s) passed
     0          check(s) failed
     0          check(s) skipped
     0          check(s) total

(Note that the maximum error threshold was exceeded.)
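If you want to catch this symptom automatically (for example in a CI job), here is a minimal sketch that pulls the counts out of the "Referential Integrity Check Summary" block of a validate report. It assumes only the layout shown in this issue (`<count>  check(s) <status>` lines ending at a blank line); the function name `referential_totals` is hypothetical and not part of validate.

```python
from __future__ import annotations

import re


def referential_totals(report: str) -> dict[str, int] | None:
    """Parse the 'Referential Integrity Check Summary' block of a validate report.

    Hypothetical helper: assumes the '<count>  check(s) <status>' layout shown
    in this issue. Returns None if the block is missing.
    """
    # Capture everything after the heading up to the next blank line or end of text.
    match = re.search(
        r"Referential Integrity Check Summary:(.*?)(?:\n\s*\n|\Z)", report, re.S
    )
    if not match:
        return None
    # Map each status word (passed/failed/skipped/total) to its count.
    return {
        status: int(count)
        for count, status in re.findall(r"(\d+)\s+check\(s\)\s+(\w+)", match.group(1))
    }
```

With the pds4.bundle summary above, `referential_totals(report)["total"]` is 0, which is exactly the "no referential checks ran" symptom reported here.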

I also tried running it on the specific collection where I had spotted LID errors:

% validate --rule pds4.collection --report-file rav1ciun_validate_browse_collection.log --verbose 2 --target ./wenkert_pdart16_vgr_rav1ciun/browse

Here's an example browse label from that collection:
   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2
   3 <?xml-model href="https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.sch"
   4     schematypens="http://purl.oclc.org/dsdl/schematron"?>
   5
   6 <Product_Browse xmlns="http://pds.nasa.gov/pds4/pds/v1"
   7     xmlns:pds="http://pds.nasa.gov/pds4/pds/v1"
   8     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   9     xsi:schemaLocation="http://pds.nasa.gov/pds4/pds/v1 https://pds.nasa.gov/pds4/pds/v1/PDS4_PDS_1G00.xsd">
  10   <Identification_Area>
  11     <logical_identifier>urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001.png</logical_identifier>
  12     <version_id>1.0</version_id>
  13     <title>RAV1CIUN DATA Browse Product - vgr_1201-mamqtv-001010-data-001010.001.png</title>
  14     <information_model_version>1.16.0.0</information_model_version>
  15     <product_class>Product_Browse</product_class>
  16   </Identification_Area>
  17   <Reference_List>
  18     <Internal_Reference>
  19       <lid_reference>urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001</lid_reference>
  20       <reference_type>browse_to_data</reference_type>
  21       <comment>This is a reference to the full resolution data file corresponding to this browse image.</comment>
  22     </Internal_Reference>
  23   </Reference_List>
  24   <File_Area_Browse>
  25     <File>
  26       <file_name>VGR_1201-MAMQTV-001010-DATA-001010.001.png</file_name>
  27       <local_identifier>BROWSE_FILE</local_identifier>
  28       <creation_date_time>2023-08-18</creation_date_time>
  29     </File>
  30     <Encoded_Image>
  31       <local_identifier>BROWSE_IMAGE</local_identifier>
  32       <offset unit="byte">0</offset>
  33       <encoding_standard_id>PNG</encoding_standard_id>
  34     </Encoded_Image>
  35   </File_Area_Browse>
  36 </Product_Browse>
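To make the problem concrete, here is a minimal sketch (not validate's code) that pulls the product LID and every `Internal_Reference` target out of a label like the one above, using Python's standard `xml.etree.ElementTree`. It assumes the PDS4 default namespace `http://pds.nasa.gov/pds4/pds/v1` declared in the label; the helper name `extract_lids` is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Default namespace declared on <Product_Browse> in the example label.
PDS_NS = {"pds": "http://pds.nasa.gov/pds4/pds/v1"}


def extract_lids(label_xml: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (product LID, [(lid_reference, reference_type), ...]).

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(label_xml)
    lid = root.findtext(
        "pds:Identification_Area/pds:logical_identifier", namespaces=PDS_NS
    )
    refs = []
    # Walk every Internal_Reference anywhere in the label.
    for ref in root.findall(".//pds:Internal_Reference", PDS_NS):
        target = ref.findtext("pds:lid_reference", namespaces=PDS_NS)
        ref_type = ref.findtext("pds:reference_type", namespaces=PDS_NS)
        if target:  # skip lidvid_reference-only entries
            refs.append((target.strip(), (ref_type or "").strip()))
    return (lid or "").strip(), refs
```

Applied to the example label, it returns the `.png` browse LID and a single `browse_to_data` reference whose target lacks the `.png` suffix and still sits in the `browse_qedr` collection.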

Line 19 points to an incorrect LID, but Validate reports no error for this or any of the other browse labels:

  Referential Integrity Check Summary:
     30582      check(s) passed
     1          check(s) failed
     0          check(s) skipped
     30583      check(s) total

It passed all of the browse labels (the single failure refers to a .DS_Store file).

So, unlike with -R pds4.bundle, Validate does run referential integrity checks with -R pds4.collection. However, it does not catch incorrect LIDs.

The LID urn:nasa:pds:wenkert_pdart16_vgr_rav1ciun:browse_qedr:vgr_1201-mamqtv-001010-data-001010.001 does not exist (the browse LIDs have .png suffixes). Moreover, the reference should not point back into the browse_qedr collection at all; it should point to the data_qedr collection.
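The two problems described above can be expressed as a small check. This is a minimal sketch, not how validate works internally: it assumes you already have a map from each product LID in the bundle to its `(lid_reference, reference_type)` pairs, treats any target not in that map as dangling (a real check would also consult a registry for external/context LIDs), and applies a heuristic that a `browse_to_data` reference should not land in the browse product's own collection. All names (`RefProblem`, `collection_of`, `check_references`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RefProblem:
    source_lid: str
    target_lid: str
    reason: str


def collection_of(lid: str) -> str | None:
    """Collection segment of urn:nasa:pds:<bundle>:<collection>:<product>."""
    parts = lid.split(":")
    return parts[4] if len(parts) >= 5 else None


def check_references(products: dict[str, list[tuple[str, str]]]) -> list[RefProblem]:
    """Flag dangling lid_references and browse_to_data self-references.

    `products` maps each product LID to its (lid_reference, reference_type) pairs.
    """
    known = set(products)  # only LIDs present in this bundle (assumption)
    problems = []
    for source, refs in products.items():
        for target, ref_type in refs:
            if target not in known:
                problems.append(RefProblem(source, target, "target LID not found"))
            # Heuristic: browse_to_data should point outside the browse collection.
            if ref_type == "browse_to_data" and collection_of(target) == collection_of(source):
                problems.append(
                    RefProblem(
                        source,
                        target,
                        "browse_to_data points into the browse product's own collection",
                    )
                )
    return problems
```

For the browse label in this issue, both checks fire: the target LID does not exist, and it sits in browse_qedr rather than data_qedr.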

🕵️ Expected behavior

Validate should flag an error for references to non-existent LIDs.

📜 To Reproduce

  1. % validate --rule pds4.bundle --report-file rav1ciun_validate_v3.5.1.log --verbose 2 --target ./wenkert_pdart16_vgr_rav1ciun
  2. % validate --rule pds4.collection --report-file rav1ciun_browse_validate_v3.5.1.log --verbose 2 --target ./wenkert_pdart16_vgr_rav1ciun/browse

🖥 Environment Info

  • Validate v3.5.1
  • MacOS 10.15.7
  • Java 11.0.15:
% java --version
openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)

📚 Version of Software Used

Validate v3.5.1

🩺 Test Data / Additional context

The bundle tar.gz is too large to attach here. Shall I share it via Dropbox, or would a sample be enough?

Bundle validate log: rav1ciun_validate_v3.5.1_browse_collection.log

🦄 Related requirements

No response

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

mace-space, Jul 11 '24 13:07