micrometa

JSON-LD parser only finds the first item

Open jkphl opened this issue 7 years ago • 5 comments

On 20.03.2017 at 13:59, Claas Kalwa wrote:

Hi Joschi,

I'm having trouble extracting multiple JSON-LD items with the Micrometa V1 parser. It only recognizes the first item, regardless of whether the items are grouped with @graph or appear separately in their own script elements.

I've attached an example that I think should actually work.

Do you have any idea where the problem might lie?

Example source:

<!DOCTYPE html>

<html>
    <head>
        <title>TODO supply a title</title>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">

	<script type="application/ld+json">
	{
	 "@context": "http://schema.org",
	 "@graph": [
	{
	  "name": "Google Inc.",
	  "@type": "LocalBusiness",
	  "address": {
	    "@type": "PostalAddress",
	    "addressCountry": "United States",
	    "streetAddress": "1600 Amphitheatre Parkway",
	    "addressLocality": "Mountain View",
	    "addressRegion": "CA",
	    "postOfficeBoxNumber": null,
	    "postalCode": "94043",
	    "telephone": "+1 650-253-0000",
	    "faxNumber": "+1 650-253-0001"
	  }
	},
	{
	  "name": "Google Ann Arbor",
	  "@type": "LocalBusiness",
	  "address": {
	    "@type": "PostalAddress",
	    "addressCountry": "United States",
	    "streetAddress": "201 S. Division St. Suite 500",
	    "addressLocality": "Ann Arbor",
	    "addressRegion": "MI",
	    "postOfficeBoxNumber": null,
	    "postalCode": "48104",
	    "telephone": "+1 734-332-6500",
	    "faxNumber": "+1 734-332-6501"
	  }
	}
	 ]
	}
	</script>

    </head>
    <body>
        <div>TODO write content</div>
        
    </body>
</html>
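
For reference, the expected result can be sketched in plain PHP without micrometa: both shapes described above (items grouped under @graph, or one item per script element) should yield two root items. The helper name `collectRootItems` is hypothetical and not part of the library, and the items are shortened to their `name` and `@type`.

```php
<?php
declare(strict_types=1);

/**
 * Collect every root item from a list of decoded JSON-LD documents.
 * A document with "@graph" contributes each graph entry; any other
 * document counts as a single item.
 * Hypothetical helper for illustration, not part of micrometa.
 */
function collectRootItems(array $documents): array
{
    $items = [];
    foreach ($documents as $document) {
        if (isset($document['@graph']) && is_array($document['@graph'])) {
            foreach ($document['@graph'] as $node) {
                $items[] = $node;
            }
        } else {
            $items[] = $document;
        }
    }

    return $items;
}

// Shape 1: one script element whose items are grouped with @graph.
$grouped = json_decode('{
  "@context": "http://schema.org",
  "@graph": [
    {"@type": "LocalBusiness", "name": "Google Inc."},
    {"@type": "LocalBusiness", "name": "Google Ann Arbor"}
  ]
}', true);

// Shape 2: two script elements, one item each.
$first = json_decode('{"@context": "http://schema.org", "@type": "LocalBusiness", "name": "Google Inc."}', true);
$second = json_decode('{"@context": "http://schema.org", "@type": "LocalBusiness", "name": "Google Ann Arbor"}', true);

$fromGraph = collectRootItems([$grouped]);
$fromScripts = collectRootItems([$first, $second]);

echo count($fromGraph), ' ', count($fromScripts), PHP_EOL;
```

Either way, a parser that reports only "Google Inc." is dropping the second item.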

jkphl avatar Mar 24 '17 16:03 jkphl

The commit closing this issue does not entirely fix it. The JSON-LD implementation still does not find multiple items when the value of @graph has more than one root item (read: is an array).

Why? Because \Jkphl\Micrometa\Infrastructure\Parser\JsonLD::parseRootNode only returns the first node it finds. This is probably the specific framing implementation the class docblock mentions (?)

Have you ever considered adding some sort of "filter" option, so users can specify the type from which building the graph should start? That way, returning only one node would still be possible.

I will try to write a test that demonstrates that only the graph of the first node gets returned.

{
  "@context": "http://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "/articles/foobar",
      "comment": [
        {"@id": "/articles/foobar#comment-1"},
        {"@id": "/articles/foobar#comment-2"}
      ]
    },
    {
      "@type": "Comment",
      "@id": "/articles/foobar#comment-1"
    },
    {
      "@type": "Comment",
      "@id": "/articles/foobar#comment-2"
    }
  ]
}
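
To make the "filter" idea concrete, here is a minimal sketch in plain PHP, assuming the document has already been decoded with `json_decode`. The function name `filterGraphNodes` is hypothetical and this is not micrometa's API. It compares the compact type names as written in the document rather than expanded IRIs.

```php
<?php
declare(strict_types=1);

/**
 * Return the @graph nodes whose @type matches $type, or all nodes
 * when no type is given. "@type" may be a string or a list of strings.
 * Hypothetical helper for illustration, not part of micrometa.
 */
function filterGraphNodes(array $document, ?string $type = null): array
{
    $nodes = $document['@graph'] ?? [];
    if ($type === null) {
        return $nodes;
    }

    return array_values(array_filter(
        $nodes,
        static fn (array $node): bool => in_array($type, (array) ($node['@type'] ?? []), true)
    ));
}

$document = json_decode('{
  "@context": "http://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "/articles/foobar",
      "comment": [
        {"@id": "/articles/foobar#comment-1"},
        {"@id": "/articles/foobar#comment-2"}
      ]
    },
    {"@type": "Comment", "@id": "/articles/foobar#comment-1"},
    {"@type": "Comment", "@id": "/articles/foobar#comment-2"}
  ]
}', true);

$articles = filterGraphNodes($document, 'Article'); // start the graph at the Article
$comments = filterGraphNodes($document, 'Comment');
$all = filterGraphNodes($document);                 // no filter: every root node

echo count($articles), ' ', count($comments), ' ', count($all), PHP_EOL;
```

With a filter, a caller who wants a single node can ask for `Article`; without one, all three root nodes come back.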

rvanlaak avatar Oct 31 '19 11:10 rvanlaak

@rvanlaak Re-opening ... looking forward to any constructive suggestion! :+1:

jkphl avatar Nov 01 '19 18:11 jkphl

For now, we have added a custom JSON-LD parser that decorates the library's parser to support named graphs.

Our domain depends on filtering by @type, so that is embedded in the parser, because the constructor on ParserInterface does not allow us to inject it cleanly.

When $jsonLDRoot does not have both @graph and @context, the regular JsonLD behavior is used.

class JsonLDFilteredParser extends JsonLD
{
    public const FORMAT = 32;

    protected function parseRootNode($jsonLDRoot)
    {
        // Only handle documents that have both @graph and @context; otherwise defer to the parent
        if (!isset($jsonLDRoot->{'@graph'}, $jsonLDRoot->{'@context'})) {
            return parent::parseRootNode($jsonLDRoot);
        }

        try {
            $jsonLDDocument = JsonLDParser::getDocument($jsonLDRoot, ['documentLoader' => $this->contextLoader]);

            /** @var GraphInterface $graph */
            $graph = $jsonLDDocument->getGraph();

            // Walk the prioritized types and parse the first type that matches exactly one node
            foreach (FilterTypes::types as $type) {
                $nodes = $graph->getNodesByType('http://schema.org/'.$type);

                if (1 === \count($nodes)) {
                    $node = current($nodes);

                    return $this->parseNode($node);
                }
            }
        } catch (JsonLdException $exception) {
            $this->logger->error($exception->getMessage(), ['exception' => $exception]);
        }

        return null;
    }
}

rvanlaak avatar Nov 06 '19 09:11 rvanlaak

Same problem, here's an example: https://www.macobserver.com/news/apple-changes-testing-ios-14/

@rvanlaak Where is the FilterTypes class from in your example? I'm inferring that JsonLDParser is ML\JsonLD\JsonLD.

Sarke avatar Nov 22 '19 00:11 Sarke

FilterTypes::types is one of our local constants; it is just an array ordered by which node type we want to find first.
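
A rough sketch of how such a prioritized constant drives the selection, in plain PHP on decoded arrays. The `FilterTypes` class below is a hypothetical stand-in with illustrative entries, not our real list, and `pickNodeByPriority` is a made-up helper that mirrors the "exactly one node of that type" check from the parser above.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the local class; the entries are illustrative only. */
final class FilterTypes
{
    public const types = ['Article', 'Comment'];
}

/**
 * Walk the prioritized types and return the single node of the first
 * type that has exactly one match; null when no type qualifies.
 */
function pickNodeByPriority(array $nodes, array $prioritizedTypes): ?array
{
    foreach ($prioritizedTypes as $type) {
        $matches = array_values(array_filter(
            $nodes,
            static fn (array $node): bool => ($node['@type'] ?? null) === $type
        ));

        if (count($matches) === 1) {
            return $matches[0];
        }
    }

    return null;
}

$nodes = [
    ['@type' => 'Comment', '@id' => '/articles/foobar#comment-1'],
    ['@type' => 'Comment', '@id' => '/articles/foobar#comment-2'],
    ['@type' => 'Article', '@id' => '/articles/foobar'],
];

// "Article" comes first in the priority list and matches exactly one node.
$picked = pickNodeByPriority($nodes, FilterTypes::types);

// Two Comment nodes exist, so a Comment-only priority list yields nothing.
$none = pickNodeByPriority($nodes, ['Comment']);

echo $picked['@id'], PHP_EOL;
```

Note that a type with several matching nodes is skipped rather than returning the first of them, which is why the order of the array matters.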

rvanlaak avatar Nov 22 '19 08:11 rvanlaak