fontoxpath icon indicating copy to clipboard operation
fontoxpath copied to clipboard

enable `declarationMap` in `tsconfig.json` and include `src` in the published package

Open DetachHead opened this issue 3 years ago • 2 comments

i'm trying to locate elements in a browser dom (for my playwright selector engine) and think i've identified an issue that's specific to firefox.

however debugging is difficult even when i enable source maps in my project because this package doesn't have the source code included in the npm package, so i can only step into minified js.

i was going to try and make a PR to do this, but then i realised there's a custom build script so i'm not really sure how to go about it

DetachHead avatar Aug 22 '22 09:08 DetachHead

Hey @DetachHead,

When we (speaking of myself and my colleagues working on FontoXPath) run into any issue that might be related to the engine itself, we usually do not attempt to debug the binaries themselves. They are just too complex to debug: the google closure compiler rewrites code significantly, sourcemaps have not been as useful as we want.

Looking at your selector engine, I think I see a part of the issue: https://github.com/DetachHead/playwright-xpath/blob/master/src/browser/main.ts#L5. Namespaces are notorious in XPath, especially when working with HTML (which is not XML). Your 'trick' to translate any prefix (including the empty prefix) to the xhtml namespace works and is also what we do when we use FontoXPath for (x)html. You can choose to make it a bit more correct by returning the xhtml uri only for the empty prefix, returning undefined for others. This will make it better to work with svg elements for example, where the URI should be http://www.w3.org/2000/svg.

Does this help you on your way?

Kind regards,

Martin Middel

DrRataplan avatar Sep 02 '22 10:09 DrRataplan

@DrRataplan thanks for the reply

When we (speaking of myself and my colleagues working on FontoXPath) run into any issue that might be related to the engine itself, we usually do not attempt to debug the binaries themselves. They are just too complex to debug: the google closure compiler rewrites code significantly, sourcemaps have not been as useful as we want.

i ended up compiling it myself with esbuild and the source maps it created helped me figure out what was going on.

Looking at your selector engine, I think I see a part of the issue: https://github.com/DetachHead/playwright-xpath/blob/master/src/browser/main.ts#L5. Namespaces are notorious in XPath, especially when working with HTML (which is not XML). Your 'trick' to translate any prefix (including the empty prefix) to the xhtml namespace works and is also what we do when we use FontoXPath for (x)html.

so the issue i found was that document.lookupNamespaceURI(null) returns null in firefox, and "http://www.w3.org/1999/xhtml" in chrome. i added that line to workaround it which seemed to solve the problem, though admittedly i have no idea how it works or why that fixed it lol.

You can choose to make it a bit more correct by returning the xhtml uri only for the empty prefix, returning undefined for others. This will make it better to work with svg elements for example, where the URI should be http://www.w3.org/2000/svg.

i gave this a go but it doesn't seem to work on svg's. any idea what i'm doing wrong? https://github.com/DetachHead/playwright-xpath/pull/11

DetachHead avatar Sep 03 '22 05:09 DetachHead

Oh that, yuck. Did not know that... Thanks for pointing this one out, I'll document it as a caveat. This is why I recommend explicitly passing a namespace resolver. Namespaces are not always implemented 100% the same. Because nametests in XPath will simply not match, they are terrible to debug.

A short explanation on namespaces that I like is here: https://developer.mozilla.org/en-US/docs/Web/SVG/Namespaces_Crash_Course.

Because elements are identified with a localname plus a namespace uri, XPath needs to be told how the prefixes (including the empty prefix) should be translated to their corresponding URI. For example, a query like //div/svg will not match SVGs in div elements. It'll try to find svg elements in the html namespace instead. The best way to match all svg elements in div elements it to write a query like //div/svg:svg with the namespaceResolver returning the html uri for no prefix, and the svg uri for svg elements. You might need to process xlink mathml(?) in the same way.

DrRataplan avatar Sep 05 '22 08:09 DrRataplan

@DrRataplan what about if namespaceResolver took a second argument for the node name? that way, i could make it return a different namespace when the node is an svg.

i'd be happy to make a PR if you're ok with this idea

DetachHead avatar Sep 06 '22 10:09 DetachHead

That was also an earlier idea of mine, but then child::svg:* would be broken. Same with XPaths that (attempt to) refer to the different namespaces using the same prefix. I foresee madness.

Namespaces are something that's forced unto you when using XML technology. In some cases they actually make sense and solve problems, in most cases they are just a hurdle to jump over. The built-in XPath 1.0 engine in the browser also runs into this: https://developer.mozilla.org/en-US/docs/Web/XPath/Introduction_to_using_XPath_in_JavaScript#implementing_a_default_namespace_resolver.

My approach would be the following:

  • Document that the svg prefix should be used when addressing SVG elements (like descendant::svg:stroke)
  • Document that for addressing html element, no prefix is needed (like descendant::p/span)
  • Document that you can combine the two: descendant::section[@class="figure"]//svg:path

Another approach I see in use in Selenium for example is one I do not like as much: they instruct to use name() and local-name(): //*[name() = "svg"]. This is too verbose in my opinion.

So in the end your namespaceResolver could look like

(prefix) => {
  switch (prefix) {
    case 'svg':
      return 'http://www.w3.org/2000/svg';
    case '':
      return 'http://www.w3.org/1999/xhtml'
    default:
      console.warn(`Unused prefix used in XPath: ${prefix}. Only the 'svg' and the default namespace are registered.`)
      return undefined;
    }
  }
}

Does this help you on your way?

DrRataplan avatar Sep 08 '22 12:09 DrRataplan

it does. i've released 1.2.0 with the changes you suggested.

thanks for all your help :)

DetachHead avatar Sep 11 '22 09:09 DetachHead

Awesome, I;ll go ahead and close this issue! please do reach out if I can be of service!

DrRataplan avatar Sep 14 '22 12:09 DrRataplan