Floki.find doesn't support non-alpha characters

Open rahultumpala opened this issue 8 months ago • 0 comments

Description

I'm using Floki to read an HTML document and extract some elements from it. One element has an id that contains a forward slash. I called Floki.find with the selector #element/abc, but it returns an empty list even though an element with that id is present in the document.

I then called Floki.get_by_id with the id element/abc, and this fetched the correct element.

To Reproduce

The following Elixir script reproduces the issue.


Mix.install([:floki])

raw_html = """
<html lang="en">
  <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Document</title>
  </head>
  <body>
      <div>
          <p id="text"> text </p>
          <p id="hello/there"> hello/there </p>
          <p id="hello.there"> hello.there </p>
      </div>
  </body>
</html>

"""

{:ok, document} = Floki.parse_document(raw_html)


Floki.find(document, "#text") |> IO.inspect() # works
Floki.find(document, "#hello/there") |> IO.inspect() # returns []; this limitation is not documented
Floki.get_by_id(document, "hello/there") |> IO.inspect() # works
Floki.find(document, "#hello.there") |> IO.inspect() # does not work; this is documented
Floki.find(document, "#hello\\.there") |> IO.inspect() # works; this is documented
Floki.get_by_id(document, "hello.there") |> IO.inspect() # works

Extra info: Floki emits a debug log stating that the forward slash token is not recognized.


19:59:17.209 [debug] Unknown token ~c"/". Ignoring.

Expected behavior

I would expect Floki.find and Floki.get_by_id to behave the same way for such ids. If they cannot, the overview page of the Floki docs should note that these characters are not supported in selectors, especially since Floki.get_by_id is not mentioned anywhere on that page.

or

We could add support to escape non-alpha characters in the selector passed to Floki.find. I am willing to contribute if you could guide me.
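
A rough sketch of what such escaping could look like on the caller side. SelectorEscape.escape_id/1 is a hypothetical helper, not part of Floki, and I have not verified that Floki's tokenizer accepts an escaped slash; only the escaped dot (hello\.there) is documented to work.

```elixir
defmodule SelectorEscape do
  # Hypothetical helper for illustration only; not part of Floki.
  # Backslash-escapes every character that is not an ASCII letter,
  # digit, hyphen, or underscore. The `u` flag makes the regex match
  # whole Unicode codepoints instead of single bytes.
  def escape_id(id) when is_binary(id) do
    Regex.replace(~r/[^a-zA-Z0-9_-]/u, id, fn char -> "\\" <> char end)
  end
end

# Build selectors from raw ids. Whether Floki.find accepts the
# escaped slash is an assumption that still needs to be checked.
IO.puts("#" <> SelectorEscape.escape_id("hello.there"))
IO.puts("#" <> SelectorEscape.escape_id("hello/there"))
```

For hello.there this produces the already documented form, so the open question is only whether the tokenizer could treat other escaped characters the same way.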

rahultumpala, Apr 25 '25 14:04