jabref Support for multi-paper PDFs (AKA proceedings)

One type of publication is conference proceedings, in which multiple papers are collected in a single volume. There are also books with multiple chapters. Example: "Cyber-Physical Systems of Systems", https://link.springer.com/book/10.1007/978-3-319-47590-5

As a researcher, I am interested in a) having PDFs for my BibTeX entries and b) having the PDF open at the first page of the paper when I open the paper's PDF. Moreover, c) I have existing PDF files on my hard disk that I would like to import (refs https://github.com/JabRef/jabref/pull/7929).

Additionally, when I import a PDF file to an entry, JabRef's fulltext fetcher sometimes fetches the complete PDF and not the PDF of the entry itself. This is "OK" for me, because I sometimes have multiple papers from one proceedings volume, so keeping one proceedings PDF is interesting for me, too.

[ ] When importing a PDF file (File -> import, "Find unlinked files"), the following should be done:
- Determine the List<BibEntry> contained in the PDF
- Create one proceedings/collection/book entry for the whole PDF (collectionEntry)
  - type according to the determined book type
- For each BibEntry: Create BibEntry in library
  - crossref the collectionEntry (pay attention to the differences between BibTeX and BibLaTeX mode)
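
The import steps above could be sketched roughly as follows. This is only an illustration: `SketchEntry` and `MultiPaperImportSketch` are hypothetical stand-ins, not JabRef's actual `BibEntry` API, and detecting the papers inside the PDF is assumed to have happened already. The mode switch reflects that classic BibTeX inherits only fields with the same name through `crossref`, while biblatex maps the parent's `title` to the child's `booktitle`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a library entry; NOT JabRef's BibEntry class.
final class SketchEntry {
    final String type;
    final String citationKey;
    final Map<String, String> fields = new LinkedHashMap<>();

    SketchEntry(String type, String citationKey) {
        this.type = type;
        this.citationKey = citationKey;
    }
}

final class MultiPaperImportSketch {
    /**
     * Returns the entries to add to the library: the collection entry first,
     * then every paper entry with its crossref pointing at the collection.
     */
    static List<SketchEntry> buildLibraryEntries(SketchEntry collectionEntry,
                                                 List<SketchEntry> papers,
                                                 boolean biblatexMode) {
        // Classic BibTeX copies only identically named fields to children,
        // so the parent needs its own booktitle for the papers to inherit it.
        if (!biblatexMode
                && collectionEntry.fields.containsKey("title")
                && !collectionEntry.fields.containsKey("booktitle")) {
            collectionEntry.fields.put("booktitle", collectionEntry.fields.get("title"));
        }

        List<SketchEntry> result = new ArrayList<>();
        result.add(collectionEntry);
        for (SketchEntry paper : papers) {
            paper.fields.put("crossref", collectionEntry.citationKey);
            result.add(paper);
        }
        return result;
    }
}
```

Putting the collection entry first keeps the result easy to inspect; whether the order matters for the real import is not something this sketch can answer.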

Regarding the PDF handling of multi-entry PDF files:

[ ] JabRef should offer to jump to the first page specified in the pages field when opening the attached PDF
[ ] When no attached PDF is present, but the entry cross-references another entry that has a PDF attached, JabRef should offer the functionality to a) open the PDF of the cross-referenced entry and b) jump to a specific page. Thereby, the target page in the pages field should be respected.
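
To make the "first page specified in the pages field" concrete, here is a minimal sketch of extracting that page number. The class and method names are hypothetical, and the sketch assumes the field starts with an Arabic number (as in `123--145`); Roman-numeral front matter such as `xi--xx` is simply reported as unknown.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of JabRef.
final class PagesFieldSketch {
    // Leading Arabic number, e.g. "123" in "123--145", "123-145" or "123".
    // Limited to 9 digits so Integer.parseInt cannot overflow.
    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\s*(\\d{1,9})");

    /** Returns the first page of a pages field, or empty if it cannot be determined. */
    static OptionalInt firstPage(String pagesField) {
        if (pagesField == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = LEADING_NUMBER.matcher(pagesField);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }
}
```

Returning an empty `OptionalInt` instead of a default page leaves it to the caller to decide whether to open the PDF at the beginning or to ask the user.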

Optionally: When attaching a PDF to an existing entry, the following should be done:

Split PDF: In case a PDF is a multi-paper PDF, JabRef should split the PDF
- Keep the original PDF file
- Determine the pages of the paper inside the PDF
- Copy these pages into a new PDF file
- Attach this PDF file to the current entry
[ ] The split functionality could also be done "on demand". A user selects the PDF attached to an entry and selects "split PDF". Then, JabRef splits the PDF and creates BibEntries for each contained paper.
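
The "determine the pages of the paper inside the PDF" step could look roughly like this when the start page of every paper is already known. This is a sketch under assumptions: papers are contiguous, each starts on a new PDF page, and the names are hypothetical. Copying the pages into a new file would need a PDF library (for instance Apache PDFBox) and is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of JabRef.
final class SplitRangesSketch {
    /** Inclusive, 1-based range of PDF pages belonging to one paper. */
    record PageRange(int first, int last) { }

    /**
     * Derives one page range per paper from the ascending list of start pages.
     * Each paper ends on the page before the next paper starts;
     * the last paper ends on the last page of the PDF.
     */
    static List<PageRange> rangesFromStartPages(List<Integer> startPages, int totalPages) {
        List<PageRange> ranges = new ArrayList<>();
        for (int i = 0; i < startPages.size(); i++) {
            int first = startPages.get(i);
            int last = (i + 1 < startPages.size()) ? startPages.get(i + 1) - 1 : totalPages;
            if (first < 1 || last < first || last > totalPages) {
                throw new IllegalArgumentException("Start pages must be ascending and within the PDF");
            }
            ranges.add(new PageRange(first, last));
        }
        return ranges;
    }
}
```

For start pages 1, 13 and 30 in a 40-page PDF this yields the ranges 1 to 12, 13 to 29 and 30 to 40. Front matter and back matter would need extra handling, for example by treating them as ranges that belong to no paper.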

Oct 07 '21 04:10 koppor

This also affects fulltext search. When linking a proceedings PDF, the whole PDF is indexed. When searching, that PDF will end up in the search results even though the hit might be in a paper that was not added to the database. Ideally, this would be detected and only the pages of papers in the database are indexed and linked to the correct bibentry.
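
One way to picture the attribution of search hits is the following sketch, which maps the PDF page of a hit to the library entry whose page range contains it and drops hits on pages of papers that are not in the library. The types and names are hypothetical; this says nothing about how JabRef's actual indexer is structured.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; not part of JabRef's search implementation.
final class HitAttributionSketch {
    /** A paper from the library together with its inclusive, 1-based PDF page range. */
    record IndexedPaper(String citationKey, int firstPdfPage, int lastPdfPage) { }

    /**
     * Returns the citation key of the paper containing the hit page,
     * or empty if the page belongs to a paper that is not in the library.
     */
    static Optional<String> entryForHit(List<IndexedPaper> papersInLibrary, int hitPdfPage) {
        return papersInLibrary.stream()
                .filter(p -> p.firstPdfPage() <= hitPdfPage && hitPdfPage <= p.lastPdfPage())
                .map(IndexedPaper::citationKey)
                .findFirst();
    }
}
```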

Oct 13 '21 10:10 btut

Hi, we are a group of students studying for a Master of Computer Science at the University of Adelaide, and we wanted to check whether this issue is available for us to work on for our assignment. If it is, we would appreciate any input based on the previous contributions to this issue.

Mar 26 '22 05:03 zhaoqingying123

@zhaoqingying123 The issue is still available. Please first start with test cases. For that, please fetch example PDFs (or create example PDFs to avoid licensing issues).

The previous attempt was made at the University of Basel.

Here is the documentation: https://github.com/thepauljs/jabref/tree/main/docs/sweng

Here is the code: https://github.com/thepauljs/jabref/blob/main/src/main/java/org/jabref/logic/importer/fileformat/MultiPaperHandler.java - with test cases https://github.com/thepauljs/jabref/blob/main/src/test/java/org/jabref/logic/importer/fileformat/MultiPaperHandlerTest.java

The documentation and test cases are a good start. It still needs much work to get it finished. So, a good chance for you to improve JabRef!

Apr 03 '22 21:04 koppor

Welcome and thank you! Adding to koppor's comment, also check out the guidelines for contributing to JabRef. They can be found here: https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md. See here for a rough outline of this process. In general, it is advised to open a (draft) pull request early on so that reviewers have time to comment and the general direction of the request becomes clear. This will allow you to receive valuable feedback!

If you have any questions, feel free to ask! Either here at GitHub, or you also can join our gitter chat.

Apr 03 '22 22:04 ThiloteE

There is also the other way round: multiple PDFs for a single entry. See https://github.com/beckus/ieeetranplus for an example.

Jul 16 '24 09:07 koppor

Users can also create a Proceedings entry for the conference and InProceedings entries for the theses.

In the theses, users can fill in crossref to point to the conference.

Different PDFs may be stored for the conference (the full proceedings, for example, or any informational documents) and for the specific theses in the InProceedings entries.

Dec 13 '24 10:12 InAnYan

> Different PDFs may be stored for the conference (the full proceedings, for example, or any informational documents) and for the specific theses in the InProceedings entries.

The point of the issue is to have a single PDF containing multiple papers. Example: https://link.springer.com/book/10.1007/978-3-031-70396-6 or https://dl.gi.de/items/4d345ea0-858f-4c7c-94ee-0fe95154e7f9.

Dec 13 '24 11:12 koppor

Current idea:

- have a proceedings entry
- the proceedings entry has the PDF attached
- the proceedings entry has a page offset stored (default: 0)
- each paper entry has a crossref to the proceedings entry
- each paper entry has "pages" set
- when opening a file is requested, JabRef looks at the current entry. If no file is attached, it looks at the cross-referenced entry and opens that entry's PDF at the page given in the current entry's "pages" field (respecting the offset)
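
The lookup described in this idea could be sketched as below. Everything here is hypothetical: `LibEntry` is a simplified stand-in for a library entry, the first page is assumed to have been parsed from "pages" already, and the offset is assumed to be added to the printed page number to obtain the PDF page. Opening an entry's own file at page 1 is likewise only an assumption.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not JabRef's actual file-opening logic.
final class OpenTargetSketch {
    /** Simplified entry: file and crossref may be null; firstPage comes from "pages". */
    record LibEntry(String file, String crossref, int firstPage, int pageOffset) { }

    /** The file to open and the 1-based PDF page to jump to. */
    record Target(String file, int pdfPage) { }

    static Optional<Target> resolve(LibEntry entry, Map<String, LibEntry> libraryByKey) {
        // Assumption: a file attached to the entry itself is opened from its first page.
        if (entry.file() != null) {
            return Optional.of(new Target(entry.file(), 1));
        }
        if (entry.crossref() == null) {
            return Optional.empty();
        }
        LibEntry parent = libraryByKey.get(entry.crossref());
        if (parent == null || parent.file() == null) {
            return Optional.empty();
        }
        // Assumption: PDF page = printed first page + offset stored on the proceedings entry.
        return Optional.of(new Target(parent.file(), entry.firstPage() + parent.pageOffset()));
    }
}
```

With a proceedings entry that stores an offset of 14 and a paper whose "pages" field starts at 101, the sketch resolves to page 115 of the proceedings PDF. Keeping the offset on the proceedings entry means it has to be determined only once per PDF.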

Jul 20 '25 21:07 koppor
