Adding CatalyzeX code finder integration to increase code coverage

Open himanshuragtah1 opened this issue 1 year ago • 7 comments

Added this issue here as instructed by @akoehn

I propose an integration with CatalyzeX that finds and links to code implementations for papers. This would be a great enhancement to the ACL Anthology's current code coverage.

We can open a pull request against your repo shortly for review.

Here's what it would look like:

image

In case other sources have code, it can be shown in the dropdown as well.

himanshuragtah1 avatar Jan 07 '24 22:01 himanshuragtah1

Thanks for the suggestion! I'm still on vacation, so will check this out later. Can you add an example URL that this would link to? And maybe a brief explanation for someone unfamiliar with CatalyzeX what this provides that isn't covered by our existing Papers with Code integration yet?

mbollmann avatar Jan 08 '24 00:01 mbollmann

Hope you had a wonderful vacation, and a great start to the new year! :)

Here is an example CatalyzeX url image

that corresponds to this ACL paper: image

Regarding Papers with Code: the functionality is similar (surfacing the open-source implementations available for a paper), but CatalyzeX has a larger, fast-growing collection of code implementations (approaching a million) that can augment and complement what's currently surfaced for papers on the ACL Anthology.

We have similar live integrations on arXiv and OpenReview.

We continually crawl GitHub, Bitbucket, GitLab, SourceForge, and various personal, academic, and professional webpages, and we constantly receive code submissions via our website and popular browser extensions.

Hope this helps clarify, and please let us know if you have any questions. Looking forward to discussing next steps.

himanshuragtah1 avatar Jan 19 '24 01:01 himanshuragtah1

@mbollmann @akoehn — just following up here. Any next steps here or anything we can help with to move this forward? :)

himanshuragtah1 avatar Jan 24 '24 03:01 himanshuragtah1

@mjpost Do you have an opinion on this feature? I didn’t get around to taking a closer look at this yet, but @himanshuragtah1 says (via e-mail) that they can have a PR ready very quickly if we wanted to integrate this.

mbollmann avatar Feb 17 '24 11:02 mbollmann

There is one question I have: our PWC integration only shows a code link when we actually do have code. I think this is good practice, and we also use this information in publication lists: image (see the [|||] symbol).

I am not sure how we should handle two data sources here.

Regarding the type of integration: do you plan to use the same kind of integration (i.e., sending pull requests to add the links), or do you want to add a general JavaScript widget to the pages?

[chatgpt please insert sorry for late reply boilerplate]

akoehn avatar Mar 12 '24 09:03 akoehn

Hi @himanshuragtah1—thanks for submitting the request, and I'm sorry that I've only now been able to look at this.

First, a few questions:

  • This url (https://www.catalyzex.com/paper/arxiv:2010.15411/code) looks like. I assume the format would be something like https://www.catalyzex.com/paper/acl:N19-1423/code?
  • Can you confirm that the data we are linking to would be accessible without an account?
  • I assume you would also provide a link back to the Anthology page, as you do for arXiv?

I'm open to this, but it would largely depend on how easy you can make the integration, since we are volunteer-run. This includes:

  • Augmenting our XML and updating the schema
  • An automated GitHub workflow for integration
  • Proposing a way to link compactly to both your site and PWC (e.g., your proposed drop-down list)

mjpost avatar Mar 12 '24 10:03 mjpost

Hi @mjpost — Sorry for the late reply. Thanks for taking the time to have a look at this code integration proposal :)

  • The URL you mentioned can also be accessed at https://www.catalyzex.com/paper/conversation-graph-data-augmentation-training, so ACL papers will use the slug URL format.
  • Yes, it's accessible without an account. After a certain number of tries, we do require a user to sign in to confirm they're human, as a safeguard against bots, scripts, etc.
  • Yes, we would provide a link back to the Anthology page, as we do for arXiv and OpenReview.

As suggested, we would actually prefer to have a JS widget that is capable of performing real-time requests to our own server for checking code availability, and then modifying the DOM accordingly from there. With this, we see a couple of advantages:

  • We don't need to modify your XML and internal build workflows, which greatly simplifies the review process and the integration in general;
  • The website will always display the most up-to-date information about code from CatalyzeX.

And of course, we will compactly handle both CX and PWC buttons, by introducing a dropdown like the one we shared in the screenshot above.
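
To make the widget idea concrete, here is a minimal TypeScript sketch of the client-side logic, not the actual CatalyzeX widget. The response shape (`CodeAvailability`), the injected `lookup` function, and the labels are all assumptions made for illustration; the real endpoint and payload are not specified in this thread. It merges an existing PWC link with the result of a live lookup and decides whether the page needs no code button, a single button, or a dropdown.

```typescript
// Hypothetical response shape; the real CatalyzeX API is not described in this thread.
interface CodeAvailability {
  hasCode: boolean;
  url?: string;
}

interface CodeLink {
  label: string;
  url: string;
}

// The lookup is injected so the sketch assumes no particular endpoint or HTTP client.
type AvailabilityLookup = (anthologyId: string) => Promise<CodeAvailability>;

// Merge the existing Papers with Code link (if any) with a client-side lookup.
// A failed or empty lookup leaves the existing link untouched.
async function collectCodeLinks(
  anthologyId: string,
  pwcUrl: string | null,
  lookup: AvailabilityLookup,
): Promise<CodeLink[]> {
  const links: CodeLink[] = [];
  if (pwcUrl !== null) {
    links.push({ label: "Papers with Code", url: pwcUrl });
  }
  try {
    const result = await lookup(anthologyId);
    if (result.hasCode && result.url !== undefined) {
      links.push({ label: "CatalyzeX", url: result.url });
    }
  } catch {
    // Network or server failure: fall back to whatever the page already had.
  }
  return links;
}

// No links: no code button. One link: a plain button. Two or more: a dropdown.
function presentation(links: CodeLink[]): "none" | "button" | "dropdown" {
  if (links.length === 0) return "none";
  return links.length === 1 ? "button" : "dropdown";
}
```

In a page, the widget would call `collectCodeLinks` with the paper's Anthology ID (for example `N19-1423`) and then update the DOM according to `presentation`. Keeping the lookup failure silent means a paper page never loses its existing PWC link if the external server is unreachable.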


@akoehn: Regarding handling two data sources in the publication list: to keep it simple, we'll just add another icon there. In cases where only one source has code, there will be just one code icon. The end user benefits from having access to some code to work with and build upon. image

Regarding the type of integration: we would like to make as few changes as possible to your XML files and codebase in general. In our integrations with other providers, such as arXiv, we have our own JavaScript widget that fetches code information on the client side. This lets us always show up-to-date results and also simplifies the integration.

Let me know if all this sounds good, and we can open a PR shortly for your review. :)

himanshuragtah1 avatar May 07 '24 23:05 himanshuragtah1