
Wrong URL and Title are stored on some web pages

Open • rasulkireev opened this issue 4 years ago • 1 comment

Steps to reproduce

  1. Go to one of the Doist blog articles, for example the one on networking
  2. Make a couple of highlights or annotations
  3. Check your Hypothesis dashboard

Expected behaviour

I expect to see the article named "How to Make Virtual Networking Less Cringey (With Real-Life Examples to Help)" with all the highlights from that page.

Actual behaviour

Instead, I see an entry for a different blog post ("7 Ways to Support Your Team During the Pandemic — And Any Crisis" in my case) with a different URL. The highlights and annotations themselves are correct, but the title and URL are not. This is strange, because when I go to the correct page I do see all the annotations in the sidebar.

Browser/system information

I am using:

  • Vivaldi | 3.7.2218.58 (Stable channel) (x86_64)
  • OS | macOS Version 10.15.7 (Build 19H524)

Additional details

I checked the source code of the networking page and saw no reference to the other blog post whose title and URL were actually registered.

rasulkireev, Apr 27 '21 23:04

There are a large number of different URLs which have all been grouped into the same document ID (1132150) in the h document database.

It looks like this may be caused by some of the <link> tags on the page. For example, there are various rel=alternate links which have the same value across all pages. We really need to revisit how the whole equivalence mechanism works, and be more restrictive about which metadata on a page can create equivalences between URLs (if at all).

robertknight, Jul 04 '22 14:07
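To make the suspected failure mode concrete, here is a minimal Python sketch. It assumes a simplified model where each page reports its own URL plus a list of (rel, href) pairs from its <link> tags, and every link with a "trusted" rel declares the two URLs equivalent. The PageMetadata class, the group_documents function, and the blog.example.com URLs are hypothetical and do not reflect how h actually stores document equivalences. Union-find makes the equivalence transitive, which is why a single rel=alternate value shared by every page (such as a site-wide feed) is enough to merge unrelated articles into one document.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified page metadata: the page URL plus its <link> tags.
# Not the real h data model; for illustration only.
@dataclass
class PageMetadata:
    url: str
    links: list[tuple[str, str]] = field(default_factory=list)  # (rel, href)


class UnionFind:
    """Disjoint-set structure: URLs in the same set share a document key."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_documents(pages: list[PageMetadata], trusted_rels: set[str]) -> dict[str, str]:
    """Map each page URL to a document key, merging URLs that a page
    declares equivalent through a <link> whose rel is in trusted_rels."""
    uf = UnionFind()
    for page in pages:
        uf.find(page.url)  # register the page URL itself
        for rel, href in page.links:
            if rel in trusted_rels:
                uf.union(page.url, href)
    return {page.url: uf.find(page.url) for page in pages}


# Two unrelated articles that both advertise the same (hypothetical) feed URL.
pages = [
    PageMetadata("https://blog.example.com/networking/",
                 [("canonical", "https://blog.example.com/networking/"),
                  ("alternate", "https://blog.example.com/feed/")]),
    PageMetadata("https://blog.example.com/pandemic-support/",
                 [("canonical", "https://blog.example.com/pandemic-support/"),
                  ("alternate", "https://blog.example.com/feed/")]),
]

# Trusting rel=alternate: the shared feed link puts both articles under one key.
permissive = group_documents(pages, {"canonical", "alternate"})
# Trusting only rel=canonical: each article keeps its own key.
strict = group_documents(pages, {"canonical"})
print(permissive)
print(strict)
```

The restrictive policy shown here (trust only rel=canonical) is just one illustrative option; whether any page-supplied metadata should be allowed to create equivalences at all is the open question raised in the comment above.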