Guidance on how to validate all parameters are defined in context

Open C-Loftus opened this issue 4 months ago • 2 comments

Thank you very much for your work on this library. I am looking to write a simple script to make sure that some JSON-LD data from my API is exhaustively mapped (i.e., every term is mapped to one in the context), and I was wondering whether there is a way to do this with pyld.

My understanding of JSON-LD is that I can define whatever properties I want, and even if they are not defined in the context, they will still be expanded, compacted, etc. For example,

{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
  "dummy_name": "foo_bar"
}

would be canonicalized to

_:c14n0 <http://schema.org/dummy_name> "foo_bar" .
_:c14n0 <http://schema.org/jobTitle> "Professor" .
_:c14n0 <http://schema.org/name> "Jane Doe" .
_:c14n0 <http://schema.org/telephone> "(425) 123-4567" .
_:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

However, I was hoping that the JSON-LD processor might offer a way to flag terms that are not part of the vocabulary while it fetches the schema.org vocab. I think this would be valuable since, to my understanding, there aren't any JSON-LD linters that can run these sorts of checks automatically in a CI/CD pipeline, ensuring RDF data is valid without needing to write redundant SHACL shapes.

Thank you very much. Regards, Colton

C-Loftus, Aug 07 '25 19:08

The JavaScript JSON-LD processor, jsonld.js, has a "safe mode" feature that does this in an optimized way, but it hasn't been implemented in pyld yet. In the absence of that feature, you should be able to use the compact() API to get a similar result. Here's some example JS code that does that with jsonld.js's compact() API:

import jsonld from 'jsonld';

const undefinedTerm = {
  "@context": {
    "defined": "ex:defined"
  },
  "defined": "exists",
  "undefined": "does not exist"
};

const repairedUndefinedTerm = {
  "@context": {
    "defined": "ex:defined",
    "undefined": "ex:undefined"
  },
  "defined": "exists",
  "undefined": "does not exist" 
};

const relativeTypeUrl = {
  "@context": {
    "DefinedType": "ex:DefinedType"
  },
  "@type": ["DefinedType", "UndefinedType"]
};

const repairedRelativeTypeUrl = {
  "@context": {
    "DefinedType": "ex:DefinedType",
    "UndefinedType": "ex:UndefinedType"
  },
  "@type": ["DefinedType", "UndefinedType"]
};

// test undefined term
testSafeMode(
  'undefined term safe mode',
  undefinedTerm, repairedUndefinedTerm);

// test relative URL
testSafeMode(
  'relative URL safe mode',
  relativeTypeUrl, repairedRelativeTypeUrl);

async function testSafeMode(testName, brokenDoc, repairedDoc) {
  const expectFalse = await safeMode(brokenDoc);
  const expectTrue = await safeMode(repairedDoc);
  console.log(`${testName} pass: `,
    expectFalse === false && expectTrue === true);
}  

async function safeMode(doc) {
  const compacted = await jsonld.compact(
    doc, doc['@context'], {base: 'invalid:'});
  return JSON.stringify(doc) === JSON.stringify(compacted);
}

I can't remember my Python syntax offhand, so just taking a quick guess, the equivalent use of pyld's compact() API would be something like:

import json
from pyld import jsonld
def safeMode(doc):
  compacted = jsonld.compact(doc, doc['@context'], {'base': 'invalid:'})
  # sort_keys makes the comparison independent of key order
  return json.dumps(doc, sort_keys=True) == json.dumps(compacted, sort_keys=True)

There might be something more efficient as well.
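For a CI check it may help to know which terms failed rather than just getting True or False. The following is a minimal sketch, not part of pyld: `find_differences` and `unmapped_paths` are hypothetical helper names, and the `compact` parameter is an assumption added so the diffing logic can be exercised without pyld installed. It reuses the same round trip through compact() and then walks both documents to report where they diverge.

```python
from typing import Any, Callable, Iterator, List, Optional


def find_differences(original: Any, compacted: Any, path: str = "") -> Iterator[str]:
    """Yield slash-separated paths where `compacted` no longer matches `original`."""
    if isinstance(original, dict) and isinstance(compacted, dict):
        for key, value in original.items():
            if key == "@context":
                continue  # the context itself is not data to validate
            child = f"{path}/{key}"
            if key not in compacted:
                yield child  # key vanished in the round trip
            else:
                yield from find_differences(value, compacted[key], child)
    elif (isinstance(original, list) and isinstance(compacted, list)
          and len(original) == len(compacted)):
        for index, (left, right) in enumerate(zip(original, compacted)):
            yield from find_differences(left, right, f"{path}/{index}")
    elif original != compacted:
        yield path or "/"  # leaf value (or overall shape) changed


def unmapped_paths(doc: dict, compact: Optional[Callable] = None) -> List[str]:
    """Return the paths in `doc` that did not survive compaction unchanged."""
    if compact is None:
        # Imported lazily so find_differences stays usable without pyld.
        from pyld import jsonld
        compact = jsonld.compact
    compacted = compact(doc, doc["@context"], {"base": "invalid:"})
    return list(find_differences(doc, compacted))
```

A CI script could then exit non-zero whenever `unmapped_paths(doc)` returns a non-empty list. Two caveats apply under these assumptions: any change made by compaction is reported, not only unmapped terms (for example, a single-element array collapsed to a scalar), and a context that sets `@vocab` gives every term a mapping, so nothing would be flagged for it.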

dlongley, Aug 07 '25 20:08

Thank you very much! I will take a look at this and report back!

C-Loftus, Aug 07 '25 21:08