
Improve Equality check for Journal abbreviations

Open Siedlerchr opened this issue 3 years ago • 1 comments

I have a question regarding the use of ampersands (&). I often come across \& in my .bib files when an ampersand is used, and JabRef doesn't seem to recognize the journaltitle if the name is not exactly the same. Do we need to add entries for all occurrences of &, \&, and "and" as well?

Originally posted by @jhossbach in https://github.com/JabRef/abbrv.jabref.org/issues/100#issuecomment-1129177022

Siedlerchr avatar Jul 03 '22 21:07 Siedlerchr

  • [ ] Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo
  • [ ] JabRef should be able to handle both & and \& when abbreviating and unabbreviating
  • [ ] When putting a result, \& should be used

Background:

  • LaTeX requires & to be escaped.

koppor avatar Jul 04 '22 18:07 koppor

Hi,

I am with a group of university students looking to close this issue for a university assignment. Could I confirm that no one is actively working on this, and could I possibly be assigned? Thanks!

AkshatJain9 avatar Oct 06 '22 00:10 AkshatJain9

Welcome and thanks for your interest :-)

As general advice for newcomers: check out https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md for a start. Also, https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace is worth a look. Feel free to ask if you have any questions here on GitHub or in JabRef's Gitter chat.

Try to open a (draft) pull request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

ThiloteE avatar Oct 06 '22 00:10 ThiloteE

  • Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo
  • JabRef should be able to handle both & and \& when abbreviating and unabbreviating

Would you be able to clarify what you mean by these first two points? I am currently working on the issue and have so far gathered that whenever we write an &, it should appear as \& in the .bib file (unless we are inside a \url{...} command). I am just a little confused about what you are proposing with the first two points. Thanks!

AkshatJain9 avatar Oct 11 '22 08:10 AkshatJain9

Looking into this, I have implemented a basic translation from & -> \&, which looks like the following. Notice that the & in the Journal field appears as \& in the BibTeX source, as required: [image] [image]

My only question at this stage: reading and parsing now work as intended (that is, in line with BibTeX standards), so should I also add handling for other escaped characters, e.g. \ for \ etc.?

Also, I will link my commit. Right now this logic is placed directly in BibWriter.java; should it be moved somewhere else?

AkshatJain9 avatar Oct 14 '22 04:10 AkshatJain9

@AkshatJain9 The idea is only one part of the solution, and it does not belong in the BibWriter. It belongs somewhere in the journal abbreviations formatter itself. We already have EscapeAmpersandsFormatter, which you can use.

Siedlerchr avatar Oct 14 '22 06:10 Siedlerchr

I should have clarified: my teammate has already worked on the mechanisms for reading from the abbreviations repo and treating & and \& as equal. My job is just to make sure & is written correctly in the BibTeX.

I saw EscapeAmpersandsFormatter but struggled to understand where it is currently being used. Its functionality seems to be nested in a lot of other classes, which made it a little difficult to reason about. Do you have any guidance?

AkshatJain9 avatar Oct 14 '22 08:10 AkshatJain9

Please do NOT change the BibTeX reading and writing! JabRef tries to keep the .bib file as is.

We have the formatters in place, which can be configured by the user at library properties - and also at Quality -> cleanup entries. See https://docs.jabref.org/finding-sorting-and-cleaning-entries/cleanupentries for details.

The ideas were the following:

  1. Journal abbreviations have the ampersand stored unescaped
  2. In the BibEntry, the ampersand can be stored escaped or unescaped
  3. When the user wants to write the field "correctly", they should configure saving actions

For idea 1:

  • The journal lists are stored at https://github.com/JabRef/abbrv.jabref.org/tree/main/journals
  • Each CSV needs to be checked
  • The check should be done automatically.
  • Proposal for the check: A GitHub action executing a command

The issue for that is https://github.com/JabRef/abbrv.jabref.org/issues/107
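As a rough illustration of what such an automated check could look like, here is a minimal sketch in Java. The class name EscapedAmpersandCheck and its method are hypothetical and not part of JabRef or the abbrv repo; the sketch only assumes that each CSV can be read line by line and that any occurrence of \& counts as a violation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check: reports CSV lines that contain an escaped ampersand (\&). */
final class EscapedAmpersandCheck {

    /** Returns the 1-based numbers of all lines that contain the two characters "\&". */
    static List<Integer> findEscapedAmpersands(List<String> csvLines) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < csvLines.size(); i++) {
            if (csvLines.get(i).contains("\\&")) {
                offending.add(i + 1);
            }
        }
        return offending;
    }

    /** Checks every file passed on the command line; exits with status 1 on any violation. */
    public static void main(String[] args) throws IOException {
        boolean failed = false;
        for (String file : args) {
            List<String> lines = Files.readAllLines(Path.of(file));
            for (int lineNumber : findEscapedAmpersands(lines)) {
                System.err.println(file + ":" + lineNumber + ": contains \\&");
                failed = true;
            }
        }
        if (failed) {
            System.exit(1);
        }
    }
}
```

A GitHub action could then run this over the files in the journals directory and fail the build on a non-zero exit code; how the action is actually wired up is left to the linked issue.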

For idea 2:

  • There is a LaTeX-free field - org.jabref.model.entry.BibEntry#getResolvedFieldOrAliasLatexFree
  • This could have the ampersand stored unescaped

For idea 3:

  • The configuration is at
    [image]
  • Maybe, at new libraries (.bib files), this should be the default (to be discussed with @JabRef/developers)
  • The action is https://docs.jabref.org/finding-sorting-and-cleaning-entries/saveactions#escape-ampersands
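To make the effect of an "escape ampersands" save action concrete, here is a minimal sketch. It is not the code of JabRef's EscapeAmpersandsFormatter; the class AmpersandEscaper is hypothetical and only assumes that an ampersand counts as escaped when it is directly preceded by a backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of an "escape ampersands" action; not JabRef's actual formatter. */
final class AmpersandEscaper {

    // Matches an ampersand that is NOT directly preceded by a backslash.
    private static final Pattern UNESCAPED_AMPERSAND = Pattern.compile("(?<!\\\\)&");

    /** Replaces every unescaped & with \& and leaves existing \& untouched. */
    static String escape(String value) {
        return UNESCAPED_AMPERSAND.matcher(value)
                                  .replaceAll(Matcher.quoteReplacement("\\&"));
    }
}
```

Because already escaped ampersands are skipped, applying the action twice gives the same result as applying it once. The sketch deliberately ignores special cases such as ampersands inside \url{...}, which a real formatter would likely need to handle.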

koppor avatar Oct 14 '22 21:10 koppor

@AkshatJain9 Maybe a good start would be to work on https://github.com/koppor/jabref/issues/585. This is a very focussed issue.

koppor avatar Oct 14 '22 23:10 koppor

Thanks, I'll have a look!

AkshatJain9 avatar Oct 15 '22 06:10 AkshatJain9

Hi! I'm working on this project with @AkshatJain9 and was hoping to clarify my understanding of the issue. Does the point "JabRef should be able to handle both & and \& when abbreviating and unabbreviating", mentioned in https://github.com/JabRef/jabref/issues/8948#issuecomment-1174044999, refer to the ability for a journal title to be abbreviated even when the & is escaped?

For example, given that "ACS Applied Materials & Interfaces" can be abbreviated as "ACS Appl. Mater. Interfaces", is the idea that "ACS Applied Materials \& Interfaces" should also be able to be abbreviated as "ACS Appl. Mater. Interfaces"?

ANUu7312578 avatar Oct 15 '22 12:10 ANUu7312578

@AkshatJain9 Yes. As you can see, JabRef stores the journal names and abbreviations together in a database (created from the CSV files) and then does a lookup for the journal name and the abbreviation.

  1. Unescaped: ACS Applied Materials & Interfaces -> ACS Appl. Mater. Interfaces
  2. Escaped: ACS Applied Materials \& Interfaces -> ACS Appl. Mater. Interfaces

For the other direction, the unabbreviation: ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces
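The lookup behaviour described above can be sketched as follows. This is a simplified illustration, not JabRef's actual abbreviation repository: the class JournalLookupSketch and its methods are hypothetical, and the sketch assumes that normalizing \& to & before every lookup is enough to treat both spellings as equal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory lookup that treats "&" and "\&" as the same character. */
final class JournalLookupSketch {

    private final Map<String, String> fullToAbbreviation = new HashMap<>();
    private final Map<String, String> abbreviationToFull = new HashMap<>();

    /** Maps the escaped form \& to the plain & so both spellings share one key. */
    static String normalize(String name) {
        return name.replace("\\&", "&").trim();
    }

    /** Registers one journal; names are stored with unescaped ampersands. */
    void add(String fullName, String abbreviation) {
        String full = normalize(fullName);
        String abbreviated = normalize(abbreviation);
        fullToAbbreviation.put(full, abbreviated);
        abbreviationToFull.put(abbreviated, full);
    }

    /** Looks up the abbreviation for a full name, escaped or unescaped. */
    Optional<String> abbreviate(String fullName) {
        return Optional.ofNullable(fullToAbbreviation.get(normalize(fullName)));
    }

    /** Looks up the full name; the result always uses the unescaped ampersand. */
    Optional<String> unabbreviate(String abbreviation) {
        return Optional.ofNullable(abbreviationToFull.get(normalize(abbreviation)));
    }
}
```

With a single entry added for "ACS Applied Materials & Interfaces", both the escaped and the unescaped title resolve to "ACS Appl. Mater. Interfaces", and unabbreviating returns the unescaped full name. The sketch does not address case sensitivity or other LaTeX commands in titles.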

Siedlerchr avatar Oct 15 '22 15:10 Siedlerchr

Hi, I was just wondering whether this comment (https://github.com/JabRef/jabref/issues/8948#issuecomment-1279763204) was actually replying to my question. Additionally, could I confirm that in the unabbreviation direction, the & is always unescaped?

@AkshatJain9 Yes. As you can see, JabRef stores the journal names and abbreviations together in a database (created from the CSV files) and then does a lookup for the journal name and the abbreviation.

  1. Unescaped: ACS Applied Materials & Interfaces -> ACS Appl. Mater. Interfaces
  2. Escaped: ACS Applied Materials \& Interfaces -> ACS Appl. Mater. Interfaces

For the other direction, the unabbreviation: ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces

ANUu7312578 avatar Oct 16 '22 00:10 ANUu7312578

@ANUu7312578 Yes, sorry, tagged the wrong handle ;)

Yes, I would say that for unabbreviation we should always use the unescaped variant, e.g. ACS Appl. Mater. Interfaces -> ACS Applied Materials & Interfaces

Siedlerchr avatar Oct 17 '22 21:10 Siedlerchr

Hi! I opened a draft pull request which attempts to address the 2nd of the three ideas in the original issue, namely "JabRef should be able to handle both & and \& when abbreviating and unabbreviating". I focused on this one because the first idea ("Abbreviations should always be without \& --> There is a CI check "necessary" at the abbrv repo") is being implemented in the repo which stores all the abbreviations, and the 3rd idea ("When putting a result, \& should be used") seems to be implemented already and just requires the user to enable the option.

Would it be possible for me to get some feedback on the pull request?

ANUu7312578 avatar Oct 23 '22 09:10 ANUu7312578

The DevCall label was set in order to discuss default save actions for newly created libraries.

We decided against it in today's DevCall, because the database would be changed "magically". We prefer the "integrity check": https://docs.jabref.org/finding-sorting-and-cleaning-entries/checkintegrity

The next step for the integrity check should be:

  • More checks
  • Remember violations which are OK
  • Show violation count after save ("This library has 5 quality issues, please open the integrity check for a complete list").

koppor avatar Oct 24 '22 19:10 koppor