micrometa
JSON-LD parser only finds the first item
On 20.03.2017 at 13:59, Claas Kalwa wrote:
Hello Joschi,
I'm having trouble extracting multiple JSON-LD items with the Micrometa V1 parser. It only recognizes the first item, regardless of whether the items are grouped with @graph or appear separately in their own script elements.
Attached is an example that I think should actually work.
Do you have any idea where the problem might be?
Example source:
<!DOCTYPE html>
<html>
<head>
<title>TODO supply a title</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@graph": [
{
"name": "Google Inc.",
"@type": "LocalBusiness",
"address": {
"@type": "PostalAddress",
"addressCountry": "United States",
"streetAddress": "1600 Amphitheatre Parkway",
"addressLocality": "Mountain View",
"addressRegion": "CA",
"postOfficeBoxNumber": null,
"postalCode": "94043",
"telephone": "+1 650-253-0000",
"faxNumber": "+1 650-253-0001"
}
},
{
"name": "Google Ann Arbor",
"@type": "LocalBusiness",
"address": {
"@type": "PostalAddress",
"addressCountry": "United States",
"streetAddress": "201 S. Division St. Suite 500",
"addressLocality": "Ann Arbor",
"addressRegion": "MI",
"postOfficeBoxNumber": null,
"postalCode": "48104",
"telephone": "+1 734-332-6500",
"faxNumber": "+1 734-332-6501"
}
}
]
}
</script>
</head>
<body>
<div>TODO write content</div>
</body>
</html>
The commit closing this issue does not entirely fix it. The JSON-LD implementation still does not find multiple items when the value of @graph has more than one root item (read: is an array).
Why? Because \Jkphl\Micrometa\Infrastructure\Parser\JsonLD::parseRootNode only returns the first node it finds. This is probably the specific framing implementation the class docblock mentions (?)
Did you ever think of writing some sort of "filter" option, so users can provide the type from which building the graph should start? That way, returning only one node would still be possible.
I will try to write a test that demonstrates that only the graph of the first node gets returned.
{
"@context": "http://schema.org",
"@graph": [
{
"@type": "Article",
"@id": "/articles/foobar",
"comment": [
{"@id": "/articles/foobar#comment-1"},
{"@id": "/articles/foobar#comment-2"}
]
},
{
"@type": "Comment",
"@id": "/articles/foobar#comment-1"
},
{
"@type": "Comment",
"@id": "/articles/foobar#comment-2"
}
]
}
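To make the expectation concrete, here is a minimal sketch that uses only PHP's built-in json_decode (PHP 7.3+ for JSON_THROW_ON_ERROR), not micrometa or ML\JsonLD. The helper name graphRootNodes is hypothetical and exists only for this illustration; it lists every root entry of @graph, which is what the parser would need to see to return more than the first node.

```php
<?php
// Hypothetical helper, not part of micrometa: list the root nodes of "@graph".
function graphRootNodes(string $json): array
{
    $root = json_decode($json, false, 512, JSON_THROW_ON_ERROR);
    if (!is_object($root)) {
        return [];
    }
    if (!isset($root->{'@graph'})) {
        // No @graph: the document itself is the only root node.
        return [$root];
    }
    $graph = $root->{'@graph'};
    // A single object is also possible; normalize to a list.
    return is_array($graph) ? $graph : [$graph];
}

$json = <<<'JSON'
{
  "@context": "http://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "/articles/foobar",
      "comment": [
        {"@id": "/articles/foobar#comment-1"},
        {"@id": "/articles/foobar#comment-2"}
      ]
    },
    {"@type": "Comment", "@id": "/articles/foobar#comment-1"},
    {"@type": "Comment", "@id": "/articles/foobar#comment-2"}
  ]
}
JSON;

$nodes = graphRootNodes($json);
foreach ($nodes as $node) {
    // Prints one line per root node: the Article first, then both Comments.
    echo $node->{'@type'}, ' ', $node->{'@id'}, PHP_EOL;
}
```

This does no JSON-LD expansion or framing; it only shows that the document carries three root nodes, so a result containing just the Article graph drops the two Comment nodes as separate items.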
@rvanlaak Re-opening ... looking forward to any constructive suggestion! :+1:
For now we have added a custom JSON-LD parser that extends the library's parser to support named graphs.
Our domain depends on filtering on @type, so that is embedded in the parser, because the constructor on ParserInterface does not allow us to inject it nicely.
When $jsonLDRoot does not match the named-graph shape (read: it lacks @graph or @context), the regular JsonLD behavior is used.
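use Jkphl\Micrometa\Infrastructure\Parser\JsonLD;
use ML\JsonLD\Exception\JsonLdException;
use ML\JsonLD\GraphInterface;
// Alias as inferred later in this thread
use ML\JsonLD\JsonLD as JsonLDParser;
// FilterTypes is a local class of ours, not part of the library

class JsonLDFilteredParser extends JsonLD
{
    public const FORMAT = 32;

    protected function parseRootNode($jsonLDRoot)
    {
        // Fall back to the library behavior unless the root has both @graph and @context
        if (!isset($jsonLDRoot->{'@graph'}, $jsonLDRoot->{'@context'})) {
            return parent::parseRootNode($jsonLDRoot);
        }

        try {
            $jsonLDDocument = JsonLDParser::getDocument($jsonLDRoot, ['documentLoader' => $this->contextLoader]);
            /** @var GraphInterface $graph */
            $graph = $jsonLDDocument->getGraph();

            // Walk the prioritized types and parse the first type that has exactly one node
            foreach (FilterTypes::types as $type) {
                $nodes = $graph->getNodesByType('http://schema.org/'.$type);
                if (1 === \count($nodes)) {
                    $node = current($nodes);

                    return $this->parseNode($node);
                }
            }
        } catch (JsonLdException $exception) {
            $this->logger->error($exception->getMessage(), ['exception' => $exception]);
        }

        // No prioritized type matched, or the document could not be loaded
        return null;
    }
}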
Same problem, here's an example: https://www.macobserver.com/news/apple-changes-testing-ios-14/
@rvanlaak Where does the FilterTypes class in your example come from? I'm inferring that JsonLDParser is ML\JsonLD\JsonLD.
FilterTypes::types is one of our local constants; it is just an array we ordered by priority, based on which node type we want to find first.
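For readers who want to reproduce the parser above, a stand-in for that local class might look like the sketch below. The type names in the list are made up for illustration (the real list is not shown in this thread), and pickByPriority is a hypothetical helper that mirrors the foreach loop in parseRootNode using plain arrays instead of an ML\JsonLD graph.

```php
<?php
// Hypothetical stand-in for the local FilterTypes class; the entries are assumptions.
final class FilterTypes
{
    // Ordered by priority: earlier types are checked first.
    public const types = ['Article', 'Product', 'LocalBusiness'];
}

/**
 * Return the single node of the first prioritized type that has exactly one node.
 *
 * @param array<string, array> $nodesByType Nodes grouped by full type IRI
 * @param string[]             $types       Type names in priority order
 */
function pickByPriority(array $nodesByType, array $types, string $vocab = 'http://schema.org/')
{
    foreach ($types as $type) {
        $nodes = $nodesByType[$vocab.$type] ?? [];
        if (1 === count($nodes)) {
            return current($nodes);
        }
    }

    // No type had exactly one node.
    return null;
}

// Plain strings stand in for graph nodes here.
$nodesByType = [
    'http://schema.org/Comment' => ['comment-1', 'comment-2'],
    'http://schema.org/Article' => ['foobar'],
];
$picked = pickByPriority($nodesByType, FilterTypes::types);
```

With this list the Article node is picked and the two Comment nodes are ignored, since Comment is not in the priority list. A type with two or more nodes is skipped, which matches the `1 === \count($nodes)` check in the parser.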