php-mf2 icon indicating copy to clipboard operation
php-mf2 copied to clipboard

properties with invalid names are not properly ignored

Open sknebel opened this issue 2 years ago • 5 comments

in v0.5.0, the following example (distilled down from https://catgirlin.space/ as of right now)

<div class=" h-feed pt-2">
  <div class="p-4 h-entry">
    <a class="u-url" href="https://example.com/bar">bar </a>
  </div>
  <div class="p-4 h-entry">
    <a class="u-url" href="https://example.com/foo">foo</a>	
  </div>
</div>

parses as

{
    "items": [
        {
            "type": [ "h-feed"],
            "properties": [
                [
                    {
                        "type": ["h-entry" ],
                        "properties": {
                            "url": [ "https://example.com/bar"],
                            "name": [ "bar" ]
                        },
                        "value": "bar"
                    },
                    {
                        "type": [ "h-entry"],
                        "properties": {
                            "url": ["https://example.com/foo" ],
                            "name": ["foo"]
                        },
                        "value": "foo"
                    }
                ]
            ]
        }
    ],
…

Note how properties of the h-feed has been turned into two nested lists somehow, which is completely invalid output.

Potentially related, when nesting the two h-entries, somehow a property named 0 gets generated, which is also badly wrong (note that no 0 appears in the input)

<div class=" h-feed pt-2">
  <div class="p-4 h-entry">
    <a class="u-url" href="https://example.com/bar">bar </a>
    <div class="p-4 h-entry">
      <a class="u-url" href="https://example.com/foo">foo</a>	
    </div>
  <div>
</div>

parsed as

{
    "items": [
        {
            "type": [ "h-feed"],
            "properties": [
                [
                    {
                        "type": ["h-entry"],
                        "properties": {
                            "url": ["https://example.com/bar"],
                            "0": [
                                {
                                    "type": ["h-entry"],
                                    "properties": {
                                        "url": ["https://example.com/foo" ],
                                        "name": [ "foo" ]
                                    },
                                    "value": "foo"
                                }
                            ]
                        },
                        "value": "bar foo"
                    }
                ]
            ]
        }
    ],

Maybe related to #246

sknebel avatar Oct 09 '23 19:10 sknebel

I don't think we have any examples of microformats vocabularies where the value is only a number, and I thought that actually wasn't allowed in the mf2 spec, but now I can't confirm that. In any case, maybe we can update the spec to ignore p-* where * is only [0-9]? I guess this is an issue for the parsing spec not this library, but it might be a simpler way to "fix" the library.

aaronpk avatar Oct 09 '23 20:10 aaronpk

I believe it's already ignoring class names with numbers. I'll have to take a deeper look soon, but my guess is it's parsing the nested h-entry and trying to add it as a property like: ['property-name' => 'nested-microformat'], but since the property name is ignored it results in a weird structure like [null => 'nested-microformat'], which I think PHP would coerce into [0 => 'nested-microformat'] -- instead of adding it to the parsed children

gRegorLove avatar Oct 09 '23 20:10 gRegorLove

@aaronpk

I guess this is an issue for the parsing spec not this library,

The parsing spec already clearly makes such names invalid, hence it is a parser issue to not silently ignore such classes

sknebel avatar Oct 09 '23 21:10 sknebel

Oh I misread this: https://microformats.org/wiki/microformats2-parsing#parse_an_element_for_class_microformats

The "*" for root (and property) class names consists of an optional vendor prefix (series of 1+ number or lowercase a-z characters i.e. [0-9a-z]+, followed by '-'), then one or more '-' separated lowercase a-z words.

So a vendor prefix can be a number, so p-1-a would be valid, but p-1 is not.

aaronpk avatar Oct 09 '23 21:10 aaronpk

I think I have a working fix for this. Running some more tests and will submit a PR.

gRegorLove avatar Oct 11 '23 22:10 gRegorLove