
attribute .href doesn't work

Open ralyodio opened this issue 3 years ago • 3 comments

  console.log(body);
  const doc = new DOMParser().parseFromString(body, "text/html");
  const links = [...doc.querySelectorAll('a.result-title')];

  for (const link of links) {
    const title = link.innerText;
    const url = link.href;

    console.log({ title, url });
  }

Does this library not support standard DOM methods? link.href should give me the href attribute.

ralyodio avatar Nov 23 '21 04:11 ralyodio

So far Deno DOM only implements the Element class. .href is a property of HTMLAnchorElement, one of many more specific DOM element classes that I haven't got around to implementing yet. So for now you can use the getAttribute("href") method of Element.
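
A minimal sketch of that workaround, for illustration only. `AttributeSource`, `readHref`, and the stand-in `fakeLink` object are hypothetical names that do not come from Deno DOM; the only call assumed from the library is `getAttribute`, which returns `string | null`. Note that this gives the raw attribute value, not a resolved URL.

```typescript
// Hypothetical structural type: anything exposing getAttribute, such as a
// Deno DOM Element, satisfies it.
interface AttributeSource {
  getAttribute(name: string): string | null;
}

/** Reads the raw `href` attribute; returns null when the attribute is absent. */
function readHref(element: AttributeSource): string | null {
  return element.getAttribute("href");
}

// Stand-in object for illustration only; in real code this would be an
// element returned by doc.querySelectorAll("a.result-title").
const fakeLink: AttributeSource = {
  getAttribute: (name) => (name === "href" ? "/listing/123" : null),
};

console.log(readHref(fakeLink));
```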

b-fuze avatar Nov 23 '21 04:11 b-fuze

ok no worries.

ralyodio avatar Nov 23 '21 05:11 ralyodio

One complication here is that, in a browser, the document has a location property that is used to resolve fully-qualified URLs when accessing properties like HTMLAnchorElement.href.

When a document is parsed from an HTML string using a DOMParser instance, there's no way to attach the location information to the resulting document with the current API.

This makes it non-trivial to get fully-qualified URLs from properties on elements within the trees of such parsed documents.
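
To make the dependency on a base URL concrete, here is a small sketch that uses only the standard `URL` constructor. The base URL is an assumed example value; nothing here is specific to Deno DOM.

```typescript
// Base URL the document was (hypothetically) fetched from.
const base = "https://example.com/page/hello";

// With a base, relative references resolve to absolute URLs.
const relative = new URL("about", base).href;
const rootRelative = new URL("/account", base).href;

// Without a base, a relative reference cannot be parsed at all.
let failedWithoutBase = false;
try {
  new URL("about");
} catch {
  failedWithoutBase = true; // the URL constructor throws a TypeError
}

console.log({ relative, rootRelative, failedWithoutBase });
```

The base URL is exactly the piece of information a document parsed from a string does not carry.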

However, this is both desirable and a common task, so I want to share two workaround approaches for resolving URLs from anchor element href attributes:

Functional approach

This is safer and has better type compatibility. Here's a commented example:

href-example.ts:

import {
  DOMParser,
  type Element,
} from "https://deno.land/x/[email protected]/deno-dom-wasm.ts";
import { assert } from "https://deno.land/[email protected]/testing/asserts.ts";

/** Functional form of `element.href` */
function resolveHref(element: Element, url: string | URL): string | undefined {
  const href = element.getAttribute("href");
  //    ^? const href: string | null
  if (!href) return undefined;
  return new URL(href, url).href;
}

function main() {
  const url = new URL("https://example.com/page/hello");

  // Imagine the following HTML came from a fetch request to the URL above:
  // const html = await (await fetch(url)).text();
  const html = `
  <!doctype html>
  <html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>hello world</title>
  </head>
  <body>
    <h1>hello world</h1>
    <a class="external" href="https://en.wikipedia.org/wiki/Hello">wikipedia</a>
    <a class="relative" href="about">about</a>
    <a class="root" href="/account">account</a>
  </body>
  </html>
  `;

  const document = new DOMParser().parseFromString(html, "text/html");
  //    ^? const document: HTMLDocument | null
  assert(document, "The document could not be parsed");

  for (const className of ["external", "relative", "root"]) {
    const anchor = document.querySelector(`a.${className}`);
    //    ^? const anchor: Element | null
    assert(anchor, "Anchor element not found");

    const hrefRaw = anchor.getAttribute("href");
    //    ^? const hrefRaw: string | null

    const href = resolveHref(anchor, url);
    //    ^? const href: string | undefined

    console.log({ hrefRaw, href });
  }
}

if (import.meta.main) main();

% deno run href-example.ts
{
  hrefRaw: "https://en.wikipedia.org/wiki/Hello",
  href: "https://en.wikipedia.org/wiki/Hello"
}
{ hrefRaw: "about", href: "https://example.com/page/about" }
{ hrefRaw: "/account", href: "https://example.com/account" }
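
One caveat with the function above: the `URL` constructor throws a TypeError when the href cannot be parsed, so a single malformed link could abort a loop over scraped pages. A possible defensive variant is sketched below; `tryResolve` is a hypothetical name, and it takes the raw attribute string rather than an element so that it runs without any imports.

```typescript
/** Hypothetical helper: like `new URL(href, base).href`, but never throws. */
function tryResolve(
  href: string | null,
  base: string | URL,
): string | undefined {
  if (!href) return undefined; // missing or empty attribute
  try {
    return new URL(href, base).href;
  } catch {
    return undefined; // href could not be parsed as a URL
  }
}

const pageUrl = new URL("https://example.com/page/hello");

console.log(tryResolve("about", pageUrl));
console.log(tryResolve(null, pageUrl));
console.log(tryResolve("https://[invalid", pageUrl)); // unclosed IPv6 bracket
```

With a Deno DOM element, this would be called as `tryResolve(element.getAttribute("href"), url)`.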

Prototype manipulation

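The previous approach could become tedious if there are lots of hrefs that need to be accessed. This approach defines the href property on the prototype of a created anchor element, setting its getter and setter at the time the document is parsed. Because Deno DOM currently implements only the Element class (as mentioned earlier in this thread), that prototype is likely shared by all elements, so the property would not be limited to anchors.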

It allows for obtaining a URL string by directly accessing the href property on an element (like in browser code), but requires using a type assertion when doing so.

href-hack.ts:

import {
  DOMParser,
  type Element,
  type HTMLDocument,
} from "https://deno.land/x/[email protected]/deno-dom-wasm.ts";
import { assert } from "https://deno.land/[email protected]/testing/asserts.ts";

type HrefAttr = { href: string };
type ElementWithHref = Element & Partial<HrefAttr>;
type HTMLDocumentWithHref = HTMLDocument & { location: HrefAttr };

function createDocumentWithHref(
  html: string,
  url: string | URL,
): HTMLDocumentWithHref {
  const document = new DOMParser().parseFromString(html, "text/html");
  assert(document, "The document could not be parsed");

  (document as HTMLDocumentWithHref).location = new URL(url);

  const elementProto = Object.getPrototypeOf(document.createElement("a"));
  Object.defineProperty(elementProto, "href", {
    configurable: true,
    enumerable: false,
    get() {
      const baseUrl = this.ownerDocument?.location?.href as string | undefined;
      const href = this.getAttribute("href") as string | null;
      if (!(baseUrl && href)) return undefined;
      return new URL(href, baseUrl).href;
    },
    set(url: string) {
      this.setAttribute("href", url);
    },
  });

  return document as HTMLDocumentWithHref;
}

function main() {
  const url = new URL("https://example.com/page/hello");

  // Imagine the following HTML came from a fetch request to the URL above:
  // const html = await (await fetch(url)).text();
  const html = `
  <!doctype html>
  <html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>hello world</title>
  </head>
  <body>
    <h1>hello world</h1>
    <a class="external" href="https://en.wikipedia.org/wiki/Hello">wikipedia</a>
    <a class="relative" href="about">about</a>
    <a class="root" href="/account">account</a>
  </body>
  </html>
  `;

  const document = createDocumentWithHref(html, url);

  for (const className of ["external", "relative", "root"]) {
    const anchor = document.querySelector(`a.${className}`);
    //    ^? const anchor: Element | null
    assert(anchor, "Anchor element not found");

    const hrefRaw = anchor.getAttribute("href");
    //    ^? const hrefRaw: string | null

    // The `.href` property doesn't exist on type Element,
    // so trying to access it will create a compiler diagnostic error:
    //
    // anchor.href;
    //        ~~~~
    // Property 'href' does not exist on type 'Element'.deno-ts(2339)

    // Instead, you must assert that the Element is type ElementWithHref:
    const href = (anchor as ElementWithHref).href;
    //                                       ^ (property) href?: string | undefined

    console.log({ hrefRaw, href });
  }
}

if (import.meta.main) main();

% deno run href-hack.ts
{
  hrefRaw: "https://en.wikipedia.org/wiki/Hello",
  href: "https://en.wikipedia.org/wiki/Hello"
}
{ hrefRaw: "about", href: "https://example.com/page/about" }
{ hrefRaw: "/account", href: "https://example.com/account" }

Both approaches produce the same output.

jsejcksn avatar Sep 19 '22 06:09 jsejcksn