The extension does not seem to be able to extract social media ids within <a> links or entity candidates from <title>

Open teolemon opened this issue 2 years ago • 9 comments

What

  • The extension does not seem to be able to extract social media IDs from links on this page: https://www.saint-ouen.fr/
  • Despite a good <title> tag, it is not able to propose candidate entities

Screenshot

image

HTML samples

<title>Accueil - Mairie de Saint-Ouen-sur-Seine</title>

<ul>
  <li>
    <a href="https://www.facebook.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/facebook.png" width="20" height="20" alt="">
      <span class="out">Facebook</span>
    </a>
  </li>
  <li>
    <a href="https://twitter.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/twitter.png" width="20" height="20" alt="">
      <span class="out">Twitter</span>
    </a>
  </li>
  <li>
    <a href="https://www.instagram.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/instagram-icon.png" width="20" height="20" alt="">
      <span class="out">Instagram</span>
    </a>
  </li>
  <li>
    <a href="https://www.youtube.com/mairiesaintouen93" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/youtube-icon.png" width="20" height="20" alt="">
      <span class="out">Youtube</span>
    </a>
  </li>
</ul>
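As an illustration of how the <title> above could yield candidate entities, here is a minimal Python sketch. It is not the extension's actual code: the function name `title_candidates` and the separator list are assumptions made for this example. It splits the title on common separators and returns the segments longest first, so that "Mairie de Saint-Ouen-sur-Seine" would be searched before "Accueil".

```python
import re
from typing import List

# Hypothetical separators often found between a page name and a site name.
# Each must be surrounded by whitespace, so the hyphens inside
# "Saint-Ouen-sur-Seine" are left alone.
_SEPARATORS = re.compile(r"\s+[-|–·:]\s+")


def title_candidates(title: str) -> List[str]:
    """Split a <title> string into candidate search labels, longest first."""
    parts = [p.strip() for p in _SEPARATORS.split(title) if p.strip()]
    # Longer segments tend to be more specific, so they are tried first.
    return sorted(parts, key=len, reverse=True)


print(title_candidates("Accueil - Mairie de Saint-Ouen-sur-Seine"))
```

Each returned string could then be sent to an entity search; which search backend to use is left open here.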

teolemon avatar Jun 18 '22 09:06 teolemon

Hi,

You expect https://www.saint-ouen.fr to be matched to Q208889 because it contains a link to https://twitter.com/villesaintouen, which is connected to Q208889? Is that it?

fuddl avatar Jun 18 '22 13:06 fuddl

Not even that (although it could be another interesting issue). I just expected it to propose Twitter, Instagram and YouTube as suggested properties. Is that because it didn't detect a candidate match? Is some regex applied to the social URLs within the HTML?

Back to your point, it could indeed be an idea. More pressing is the fact that image is already on the item, so it could be leveraged if the query is not too expensive.
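To make the regex question concrete, here is a minimal Python sketch of reading social media handles straight from the href attributes, without any network request. It is only an illustration, not how the extension works: the `PLATFORMS` table, the `HANDLE` pattern and the function name `extract_social_ids` are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical host-to-platform table; a real one would need many more entries.
PLATFORMS = {
    "facebook.com": "Facebook",
    "twitter.com": "Twitter",
    "instagram.com": "Instagram",
    "youtube.com": "YouTube",
}

# Accept only a single path segment such as "/villesaintouen".
HANDLE = re.compile(r"^/([A-Za-z0-9_.\-]+)/?$")


class SocialLinkParser(HTMLParser):
    """Collects (platform, handle) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        url = urlparse(href)
        host = url.hostname or ""
        if host.startswith("www."):
            host = host[4:]
        match = HANDLE.match(url.path)
        if host in PLATFORMS and match:
            self.found.append((PLATFORMS[host], match.group(1)))


def extract_social_ids(html):
    parser = SocialLinkParser()
    parser.feed(html)
    return parser.found


sample = '<li><a href="https://twitter.com/villesaintouen" class="link-rs">Twitter</a></li>'
print(extract_social_ids(sample))
```

A static pass like this only sees what is written in the markup, so redirects or shortened links would still need to be resolved separately.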

teolemon avatar Jun 19 '22 08:06 teolemon

I just expected it to propose Twitter, Instagram and YouTube as suggested properties

I actually planned this, but resolving every link on a website turned out to be slow.

Back to your point, it could indeed be an idea. More pressing is the fact that image is already on the item, so it could be leveraged if the query is not too expensive.

That is indeed very annoying. Thanks for writing it down. I'll see what I can do.

fuddl avatar Jun 19 '22 12:06 fuddl

You can do this now

  1. click Add a new statement
  2. wait for the links to resolve
  3. select the social media link that is missing
  4. click Send to wikidata

Result: The statement will be added with a very accurate source statement. 🎉

fuddl avatar Oct 06 '22 17:10 fuddl

And how is it supposed to work atm? Because one of the problems I often ran into was: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it, I just want to use it to extract statements.

thibaultmol avatar Oct 07 '22 06:10 thibaultmol

And how is it supposed to work atm? Because one of the problems I often ran into was: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it, I just want to use it to extract statements.

I'm afraid I cannot offer a perfect solution, since I cannot confidently deduce that every page under the same domain represents the same item, but here is a workaround for that specific problem:

Let's say this is the front page:
Screenshot 2022-10-07 at 09 00 20
And this is your contact page:
Screenshot 2022-10-07 at 09 00 51
You can append #wd:[wikidata id] to the URL, in this case giving the URL https://www.saint-ouen.fr/404.html#wd:Q208889:
Screenshot 2022-10-07 at 09 01 36

This suffix causes the extension to always resolve to the specified item. Now you can go ahead as described above.
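For readers curious how such a suffix can be read back out of a URL, here is a minimal Python sketch. It is an illustration only, not the extension's implementation: the function name `item_override` and the exact pattern are assumptions made for this example.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed shape of the suffix: "wd:" followed by a Wikidata item ID.
WD_FRAGMENT = re.compile(r"^wd:(Q[1-9][0-9]*)$")


def item_override(url: str) -> Optional[str]:
    """Return the item ID pinned via a '#wd:Q...' suffix, or None."""
    match = WD_FRAGMENT.match(urlparse(url).fragment)
    return match.group(1) if match else None


print(item_override("https://www.saint-ouen.fr/404.html#wd:Q208889"))
print(item_override("https://www.saint-ouen.fr/404.html"))
```

Because the fragment is never sent to the server, appending it does not change which page is loaded.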

fuddl avatar Oct 07 '22 07:10 fuddl

I see. Would it be possible to have this as a button in the sidebar instead? Just a checkbox you can check, like "Look at main domain".

thibaultmol avatar Oct 07 '22 07:10 thibaultmol

(Also: Facebook IDs don't seem to get extracted atm.)

thibaultmol avatar Oct 07 '22 08:10 thibaultmol

(Also: Facebook IDs don't seem to get extracted atm.)

Please show me an example

fuddl avatar Oct 07 '22 08:10 fuddl