
author urls in style name/id

Open · danielgildea opened this issue 4 years ago · 19 comments

Issue #623

Now generates author pages with URLs in the form name/id.

For most people, this looks like: people/d/david-chiang/david-chiang/

Matt Post has an ORCID in name_variants.yaml, so his page is: people/m/matt-post/0000-0002-1297-6794/

And then there are:

  • people/y/yang-liu/yang-liu-edinburgh/
  • people/y/yang-liu/yang-liu-ict/
  • people/y/yang-liu/yang-liu-icsi/
  • people/y/yang-liu/yang-liu-umich/
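A minimal sketch of how such paths could be derived, assuming the slug is an ASCII-folded, hyphenated form of the name and that the slug doubles as the ID when no ORCID or disambiguation suffix is recorded. `slugify` and `author_page_path` are hypothetical helpers for illustration, not the code in this PR.

```python
import re
import unicodedata
from typing import Optional


def slugify(name: str) -> str:
    """Fold a display name to a lowercase ASCII slug, e.g. 'Arne Köhn' -> 'arne-kohn'."""
    # Decompose accented characters (ö -> o + combining diaeresis), then drop non-ASCII.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def author_page_path(name: str, author_id: Optional[str] = None) -> str:
    """Build people/<initial>/<name-slug>/<id>/ (hypothetical helper).

    Without an explicit ID (ORCID or disambiguation suffix), the name slug is
    reused as the ID, giving the people/d/david-chiang/david-chiang/ form.
    """
    slug = slugify(name)
    return f"people/{slug[0]}/{slug}/{author_id or slug}/"
```

For example, `author_page_path("Matt Post", "0000-0002-1297-6794")` would give the ORCID-based path listed above.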

I don't know how to make the old URLs (e.g., people/m/matt-post/) resolve.

danielgildea · Jan 3, 2021

Thanks, I'll take a look soon!

mjpost · Jan 11, 2021

Some thoughts:

  • I think we should move the base author page to /people/matt-post instead of /people/m/matt-post. There's no reason for the intervening letter any more (echoing an earlier conversation). We can have Hugo dump all files in one directory and keep the old, longer form working with 301 redirects in the .htaccess file.
  • I don't love the look of /people/david-chiang/david-chiang. I think the top level should be the page for (a) pointing to all the disambiguated names and (b) unresolved names. I see your comment about not being sure how to do this; one of us will have to figure it out.
  • We should come up with a prioritization scheme for IDs. For example, my ORCID is entered, but what if before that someone had created matt-post-rochester? It would be nice for that to redirect (as a 301) to /people/matt-post/{ORCID}, for backwards compatibility.
  • Separately, the name_variants.yaml file is getting a bit unwieldy IMO (for example, I dislike editing it and manually putting new entries in the correct sorted place). I wonder if we should move to a directory data/yaml/people/ with a separate file for every canonical name.
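The redirect idea in the first point could be sketched as follows, assuming Apache's mod_alias is available to the .htaccess file. `new_path_for` and `htaccess_redirects` are hypothetical names; the sketch maps a legacy path to the letter-free one and emits one `Redirect 301` line per author slug.

```python
import re
from typing import Iterable, List, Optional

# Legacy form: /people/<initial>/<slug>/<rest>; proposed form: /people/<slug>/<rest>.
LEGACY_PATH = re.compile(r"^/people/[a-z]/([^/]+)/(.*)$")


def new_path_for(old_path: str) -> Optional[str]:
    """Map a legacy author URL to the proposed one, or None if it is not legacy."""
    match = LEGACY_PATH.match(old_path)
    if match is None:
        return None
    slug, rest = match.groups()
    return f"/people/{slug}/{rest}"


def htaccess_redirects(slugs: Iterable[str]) -> List[str]:
    """Emit one mod_alias 'Redirect 301 <old> <new>' line per author slug, sorted."""
    return [f"Redirect 301 /people/{s[0]}/{s}/ /people/{s}/" for s in sorted(slugs)]
```

Because `Redirect` matches complete path segments and appends the rest of the request path to the target, a line for /people/m/matt-post/ would also forward deeper pages such as /people/m/matt-post/{ID}/. A single `RedirectMatch` rule would be an alternative to one line per author.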

mjpost · Jan 12, 2021

I definitely agree about dropping the first letter.

I agree that name_variants.yaml should be split up into lots of files; it's really an author database now and not just name variants.

davidweichiang · Jan 12, 2021

Sorry that I'm behind on this—I will catch up next week!

mjpost · Jan 27, 2021

I thought this would be a good testbed for the previews.

akoehn · Apr 6, 2021

Oh, yes, good call!

mjpost · Apr 6, 2021

Build successful. You can preview it here: https://aclanthology.org/previews/author-url

github-actions[bot] · Apr 6, 2021

Some TODOs:

  • [ ] Break up the name_variants.yaml file
  • [ ] Build author pages directly under people/
  • [ ] Add 301 redirects to the .htaccess file from /people/m/matt-post/ to /people/matt-post/
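A rough sketch of the first TODO, assuming name_variants.yaml has already been parsed into a list of dicts, each with a `canonical` name and an optional `id` (these field names are assumptions). `person_filename` and `split_name_variants` are hypothetical; writing each entry to disk (e.g. with PyYAML's `yaml.safe_dump`) is left out so the sketch needs only the standard library.

```python
import re
import unicodedata
from typing import Dict, List


def person_filename(entry: dict) -> str:
    """Use the entry's explicit id if present, else a slug of its canonical name."""
    if entry.get("id"):
        return f"{entry['id']}.yaml"
    canonical = entry["canonical"]
    name = f"{canonical.get('first', '')} {canonical['last']}"
    # Fold accents to ASCII and hyphenate, e.g. "Arne Köhn" -> "arne-kohn".
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-") + ".yaml"


def split_name_variants(entries: List[dict]) -> Dict[str, dict]:
    """Map each person entry to its own file name (to live under data/yaml/people/)."""
    files: Dict[str, dict] = {}
    for entry in entries:
        filename = person_filename(entry)
        if filename in files:
            # Two people share a slug: they need an explicit disambiguating id.
            raise ValueError(f"duplicate person file: {filename}")
        files[filename] = entry
    return files
```

One file per person also removes the need to keep a single large file manually sorted.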

mjpost · Apr 6, 2021

Some thoughts after perusing the build preview:

  • I think the base name page should always be for disambiguation (pointing to the people who share that surface form) and for unclaimed / uncategorized names.
  • We will have an "identification" process, whereby people can identify themselves. This is the same thing as disambiguating themselves, except that we will try to do identification for all names, not just ambiguous ones.
  • A person's real page will therefore be under /people/matt-post/{IDENTIFIER}.
  • We should support multiple IDs for people: ORCID, a custom Anthology ID for backward compatibility, maybe a START ID.
  • We'll have a canonical identifier to which the other identifiers will redirect. I suggest this be the ORCID, and that we coordinate with upstream conference management systems to have this added to the ingestion data.
  • For example, /people/matt-post/{ORCID} and /people/matt-post/startid:post could all point to the same place.
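The prioritization and redirect scheme from these comments could look roughly like this. The priority order (ORCID, then a custom Anthology ID, then a START ID), the `kind` names, and the rule that only START IDs carry a `startid:` prefix are assumptions drawn from the examples above; all helper names are hypothetical.

```python
from typing import Dict

# Assumed priority: ORCID first, then a backward-compatible Anthology ID, then a START ID.
PRIORITY = ["orcid", "anthology", "startid"]
# Assumed: only these kinds are rendered with a "kind:" prefix, as in startid:post.
PREFIXED_KINDS = {"startid"}


def url_segment(kind: str, value: str) -> str:
    """Render one identifier as the last path segment of a person URL."""
    return f"{kind}:{value}" if kind in PREFIXED_KINDS else value


def canonical_segment(ids: Dict[str, str]) -> str:
    """Pick the highest-priority identifier a person has."""
    for kind in PRIORITY:
        if kind in ids:
            return url_segment(kind, ids[kind])
    raise ValueError("person has no identifier")


def redirects_for(name_slug: str, ids: Dict[str, str]) -> Dict[str, str]:
    """Map every non-canonical person URL to the canonical one (to be served as 301s)."""
    target = f"/people/{name_slug}/{canonical_segment(ids)}/"
    redirects: Dict[str, str] = {}
    for kind, value in ids.items():
        source = f"/people/{name_slug}/{url_segment(kind, value)}/"
        if source != target:
            redirects[source] = target
    return redirects
```

With this ordering, an older ID such as matt-post-rochester keeps working after an ORCID is added, because it becomes a redirect source rather than the canonical page.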

mjpost · Apr 7, 2021

I suggest this be the ORCID, and that we coordinate with upstream conference management systems to have this added to the ingestion data.

Strongly agree, and since we already know that we want to implement it, we (that is, probably you) should start requesting the inclusion of ORCIDs in the datasets now.

akoehn · Apr 7, 2021

The simplest thing technically would be to add an ORCID field to their Softconf profiles and to require authors to fill it in prior to submission (and possibly for final copies, which would let us get data from NAACL and ACL). We probably can't force all authors to do this, but we could require it of the submitting author.

Do you know if there are any downsides to ORCID? For example, maybe it's not available in China?

This will likely require coordination between us, Softconf, and the ACL Exec. I'm not sure whether everyone can move fast enough, but I'll get on it.

mjpost · Apr 7, 2021

Do you know if there are any downsides to ORCID? For example, maybe it's not available in China?

I don’t know of any downsides; it is the standard and widely used in China as well: https://info.orcid.org/orcid-in-china/ So I don’t think it is blocked there :-)

I don’t know of any competing identifier, so using ORCID should be a no-brainer. It is used by every publisher I know.

This will likely require coordination between us, Softconf, and the ACL Exec. I'm not sure whether everyone can move fast enough, but I'll get on it.

👍

akoehn · Apr 7, 2021

Hi all:

There used to be competing in-house identifiers and standards (e.g., Thomson Reuters' own identifiers), but the publishers agreed unanimously that a central authority and identifier would be the best way to solve this problem (fortunately or not, this was before blockchain). We should adopt and promote its use over other domain-specific identifiers.

-M

knmnyn · Apr 7, 2021

+1000 to using ORCiD :)

And just in case: names can change, including (especially) on ORCiD, and it's probably worth thinking about how to handle that. I'll start a separate discussion with the ACL Exec about this in general, in which you're very welcome to participate.

bastings · Apr 10, 2021

Okay, Softconf has added an ORCID field to the global profile. You can set yours by visiting the global profile page. Maybe a few of you can test this, as I did?

Softconf is going to have this dumped in the DB file distributed with proceedings tarballs, so we will have it available for disambiguation purposes.

Next steps:

  1. Advertise this more widely to get people to voluntarily add it
  2. Work with conference organizers (probably for 2022+) to make this mandatory
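Since ORCIDs in Softconf profiles are self-entered, it may be worth rejecting malformed ones before using them for disambiguation. ORCID documents that the last character of an iD is an ISO 7064 MOD 11-2 check character, so most typos can be caught locally. A minimal sketch; the function names are hypothetical and not part of any Anthology or Softconf tooling.

```python
import re

# Four hyphen-separated blocks of four characters; only the last may be 'X'.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string is well-formed and its check character matches."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

For instance, the ORCID quoted at the top of this thread, 0000-0002-1297-6794, passes the check, while changing its last digit makes it fail.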

mjpost · May 10, 2021

Build successful. You can preview it here: https://preview.aclanthology.org/author-url This preview will be removed when the branch is merged.

github-actions[bot] · Dec 21, 2021

Build successful. Some useful links:

  • Complete site preview: https://preview.aclanthology.org/author-url

This preview will be removed when the branch is merged.

github-actions[bot] · Jan 11, 2022

Build successful. Some useful links:

  • Complete site preview: https://preview.aclanthology.org/author-url

This preview will be removed when the branch is merged.

github-actions[bot] · Apr 29, 2023

Whoops, Shift+Enter sends a comment...

Ideally, the location of a profile would not:

  • be too verbose
  • change when another author is added
  • depend on whether other authors are present
  • depend too much on the name of the author

The ideal for me, therefore, would be /ORCID/author-name, where author-name is optional (and /ORCID/ can forward to the page with the author name). That way we would also handle name changes more gracefully; today, name changes affect some groups of our users much more than others. The "only" problem is that we do not have ORCIDs everywhere in our dataset.

With the identifier first, this would not be as bad anymore:

A slug like /people/huy-nguyen/huy-nguyen-stanford seems quite verbose.

because it would be /people/huy-nguyen-stanford/huy-nguyen/, and the short version (/people/huy-nguyen-stanford/) would also work. It would also mean that the first person with a given name could keep the plain name as their identifier (though I, for example, would then have /arne-kohn/arne-kohn/ as my canonical URL...).
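The identifier-first layout could be resolved along these lines. This is a sketch under assumptions: pages live under /people/ as in the example above, `slugs` maps each identifier (an ORCID or a legacy disambiguated ID) to the person's current name slug, and `resolve` is a hypothetical function returning an HTTP status and canonical location rather than real server code.

```python
from typing import Dict, Optional, Tuple


def resolve(path: str, slugs: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Resolve /people/<id>/ or /people/<id>/<name>/ in an identifier-first scheme.

    Returns (200, url) for the canonical URL, (301, canonical_url) when the name
    part is missing or outdated, and (404, None) for anything unknown.
    """
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "people":
        return 404, None
    person_id = parts[1]
    if person_id not in slugs:
        return 404, None
    canonical = f"/people/{person_id}/{slugs[person_id]}/"
    if len(parts) == 3 and parts[2] == slugs[person_id]:
        return 200, canonical
    # Short form or a stale name slug: forward to the current canonical URL.
    return 301, canonical
```

Under this design a name change only updates the entry in `slugs`; the old /people/{ID}/{old-name}/ URL keeps working through a 301.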

Ideally, we would have an ORCID for every author. This will not happen (as we cannot get them for all old entries) but it would be good to push for it going forward.

akoehn · May 2, 2023