
If user moved file, it should simply be relinked - [Find Unlinked Files part]

Open koppor opened this issue 2 years ago • 46 comments

A user might move files in the file directory without JabRef.

Then, JabRef does not find the file:

Image

But it still exists.

Example:

  1. Open `test-support\src\manual-tests\issue-9798\issue-9798.bib` in JabRef
  2. Start "Lookup" -> "Find unlinked files"
    Image
  3. Now, it finds "minimal.pdf"
    Image

Current behavior: Blank right of minimal.pdf - and it would be re-imported

Expected result:

The file should be relinked, and the following should be shown in the status bar:

file relinked to entry "minimal"


Alternative explanation:

Start:

@misc{test,
  file = {a.pdf}
}

File a.pdf exists

User moves a.pdf into folder-a.

When "Find unlinked files" is started, JabRef should

  • Recognize that a.pdf of test is missing
  • Recognize that folder-a/a.pdf does not belong to any entry
  • Rewrite the link of test from a.pdf to folder-a/a.pdf

If a.pdf exists multiple times, no rewrite should be done.
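
A minimal sketch of the rule above, not JabRef's actual code: `MovedFileFinder`, `findRelinkTarget`, and the `linkedByOtherEntries` parameter are hypothetical names chosen for illustration. It assumes links are stored relative to a single file directory and uses only `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of JabRef): finds the new location of a moved file. */
final class MovedFileFinder {

    /**
     * Returns the new link, relative to fileDirectory, for a broken link.
     * Returns empty if the link still works, if no file with that name exists,
     * if the name exists more than once, or if the only match is already
     * linked by another entry.
     */
    static Optional<Path> findRelinkTarget(Path fileDirectory,
                                           Path brokenLink,
                                           Set<Path> linkedByOtherEntries) throws IOException {
        if (Files.exists(fileDirectory.resolve(brokenLink))) {
            return Optional.empty(); // link is still valid, nothing to rewrite
        }
        Path fileName = brokenLink.getFileName();
        List<Path> sameName;
        try (Stream<Path> files = Files.walk(fileDirectory)) {
            sameName = files
                    .filter(Files::isRegularFile)
                    .filter(p -> fileName.equals(p.getFileName()))
                    .map(fileDirectory::relativize)
                    .collect(Collectors.toList());
        }
        if (sameName.size() != 1) {
            return Optional.empty(); // missing or ambiguous: do not rewrite
        }
        Path candidate = sameName.get(0);
        return linkedByOtherEntries.contains(candidate)
                ? Optional.empty()
                : Optional.of(candidate);
    }
}
```

For the example above, calling `findRelinkTarget(dir, Paths.get("a.pdf"), Collections.emptySet())` after the move would yield `folder-a/a.pdf`; as soon as a second `a.pdf` appears anywhere below `dir`, the result is empty and the link stays untouched.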

First implementation

The functionality should be in "Automatically set file links" (Quality -> Automatically set file links). In case it is absolutely sure that the file was moved (i.e., no two files with the same name), the link should be rewritten.

image

Current output:

image

Follow-up implementation (NOT IN SCOPE OF THIS ISSUE)

(The following text is only a reminder of the development team to open a new issue)

The "Search for unlinked files" (Lookup -> Find unlinked files) dialog should get two columns as search result. Currently, it has one.

  • column 1: file x@folder y (as it is currently; with checkmarks)
  • column 2: candidate entry (citation key. If clicked, main table jumps to that entry)

In case no candidate entry exists, "" is put in. In case two (or more) candidate entries exist, the cell turns into a dropdown. The jump link is next to the drop down.

image


Update 2024-01-09: The high testing effort is serious. Crafting test cases for different scenarios is necessary. See especially the comment at https://github.com/JabRef/jabref/issues/9798#issuecomment-1761330493.

Update 2024-03-05: The high testing effort is really serious. A single test case is not enough. The PR https://github.com/JabRef/jabref/pull/10526 showed an initial test case. -- One could think of using https://github.com/google/jimfs for more advanced tests. However, @TempDir with some files should be enough.

koppor avatar Apr 25 '23 06:04 koppor

Hello, my name is Alexandra Stathopoulou and I am a student at a Greek university. For my semester course, I have been asked to contribute to an open source software project, and I would like to take on this issue. Could I be assigned to it?

Alexandra-Stath avatar Apr 25 '23 07:04 Alexandra-Stath

As general advice for newcomers: check out Contributing for a start. Also, the guidelines for setting up a local workspace are worth having a look at.

Feel free to ask here at GitHub if you have any issue-related questions. If you have questions about how to set up your workspace, use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

github-actions[bot] avatar Apr 25 '23 08:04 github-actions[bot]

Would be great to see this! Only question is, should this be always done automatically? Especially for some cases this might create wrong links (before two files "a.pdf" exist at different places; user deletes one; now the other "a.pdf" gets linked, although it's possibly different).

claell avatar Apr 25 '23 14:04 claell

@claell The "First Implementation" is fully automatic. The second one has a user check built in.

May I ask if your described case is an edge case? OK, there could be "download.pdf" being linked. However, JabRef supports renaming files to a pattern. The default is even CitationKey - Title. I doubt that it often happens that two files following that pattern are semantically different. -- Maybe you can bring up another description, saying that having duplicate files in different folders happens often?

koppor avatar Apr 26 '23 23:04 koppor

In addition to the proposed implementations, I suggest a button here in case the file wasn't found. Right now the UX isn't great, the user only sees the file wasn't found and there is no action to try to solve that.

Image

claell avatar Apr 27 '23 09:04 claell

Regarding the described case: Yes, this is an edge case, but robust implementations should cover such edge cases to not create wrong data. Personally, I don't use that feature with renaming files to a pattern (yet). If I would, I might not even have this issue, as the current file searching implementations that JabRef has all rely on such renamed files. So this issue here is especially relevant for people who don't use the pattern named files. The function described in this issue would have been very useful to me after importing a .bib file from Citavi which had the file names, but not the correct directory included. At this stage, there are no pattern named files, as they haven't been opened in JabRef before.

claell avatar Apr 27 '23 09:04 claell

Hello there,

I have been looking through the issue for the first implementation and this is the solution I have come up with:

First, when the Search for unlinked local files is initiated, the program should first review all the entries of the current library where the file input is filled. For those entries, the program should check if the file name and the full path are valid (exists and matches to some file in the local folders). If either of these do not exist then the entry name (entry ID) should be kept in an array, as well as the file’s name. Then, the Search for unlinked local files begins in the local machine as it is done currently. The filename saved as mentioned above shall be searched for locally. If found the path of this unlinked file should be added to the previous array as a new addition.

If (a) the file is not found, or (b) the path is not recognized and a file with the same name is found in a different folder the user should be informed with the display of the right message. Also, the user should be advised to (a) delete the file input or (b) change the file path.

If this is the case, a fitting message shall appear on the screen informing the user that: a) The file path or name of an entry is not found or matched to a local file.

b) The file path of an entry is not matched and a file with the specific name is found in the local folder.

For example, as mentioned in the issue the message shall be:

a) Entry: test, file: a.pdf File name or path not found or matched to a local file. You are advised to delete the file input as it is not valid.

b) Entry: test, file: a.pdf Invalid file path. File named: a.pdf found on local folder: folder-a. You can automatically change file path by selecting entry and using the “Automatically set file links” key (F7).

Lastly, when the Automatically set files link key is pressed the program shall change the file path to the new full path of the file with the same name found unless more than one file has the same name.

Please inform me about your thoughts on this approach.

Kind regards, Alexandra Stathopoulou

Alexandra-Stath avatar Apr 28 '23 21:04 Alexandra-Stath

The function described in this issue would have been very useful to me after importing a .bib file from Citavi which had the file names

Can you describe how citavi names the files?

We surely could make a notification listing all entries having non-existent files, but other entries with the same filename. (In the case for auto linking). Then, the user can manually investigate.

koppor avatar Apr 28 '23 22:04 koppor

In addition to the proposed implementations, I suggest a button here in case the file wasn't found. Right now the UX isn't great, the user only sees the file wasn't found and there is no action to try to solve that.

Image

Yeah! Currently, JabRef tries magically to resolve the file link, which causes confusion sometimes: https://github.com/JabRef/jabref/issues/9800

koppor avatar Apr 28 '23 22:04 koppor

Your description starts well.

Then, I have some comments:

[...] If either of these do not exist then the entry name (entry ID) should be kept in an array, as well as the file’s name.

You do not need an ordering (which an array provides). It should be another data structure. Maybe Map<String,BibEntry>, where String is the filename.
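
A minimal sketch of such a map, under stated assumptions: `SketchEntry` is a hypothetical stand-in for JabRef's `BibEntry`, and `BrokenLinkIndex` is an illustrative name, not an existing class. The sketch maps each file name to a list of entries rather than to a single entry, since two entries might link the same file name; that is a deviation from the `Map<String,BibEntry>` suggestion.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for JabRef's BibEntry: a citation key plus its file links. */
final class SketchEntry {
    final String citationKey;
    final List<String> fileLinks; // assumed to be relative to the file directory

    SketchEntry(String citationKey, List<String> fileLinks) {
        this.citationKey = citationKey;
        this.fileLinks = fileLinks;
    }
}

/** Hypothetical index of broken links, keyed by file name. */
final class BrokenLinkIndex {

    /** Maps the file name of every non-existing linked file to the entries linking it. */
    static Map<String, List<SketchEntry>> build(Path fileDirectory, List<SketchEntry> entries) {
        Map<String, List<SketchEntry>> index = new HashMap<>();
        for (SketchEntry entry : entries) {
            for (String link : entry.fileLinks) {
                Path linked = fileDirectory.resolve(link);
                if (!Files.exists(linked)) {
                    // Key by bare file name, so a later file system scan can look it up
                    String name = linked.getFileName().toString();
                    index.computeIfAbsent(name, key -> new ArrayList<>()).add(entry);
                }
            }
        }
        return index;
    }
}
```

A later walk over the file directory could then look up each file name it encounters in this map, without any ordering being involved.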

Then, the Search for unlinked local files begins in the local machine as it is done currently. The filename saved as mentioned above shall be searched for locally. If found the path of this unlinked file should be added to the previous array as a new addition.

The array contains files not existing any more, now you add existing files?

If (a) the file is not found, or (b) the path is not recognized and a file with the same name is found in a different folder the user should be informed with the display of the right message. Also, the user should be advised to (a) delete the file input or (b) change the file path.

Nothing with prompts please. JabRef should do "magic" here (because it is "Automatically set file links"). Prompts should be done in the second option ("Follow-up implementation"). The second one is "Files -> Find unlinked files" with a UI.

a) Entry: test, file: a.pdf File name or path not found or matched to a local file. You are advised to delete the file input as it is not valid.

A user put the file into the library folder, because he wants to collect it as reference. Deletion would remove the file from the references. It should be added. --> The user can use the "Find unlinked files" functionality for that.

b) Entry: test, file: a.pdf Invalid file path. File named: a.pdf found on local folder: folder-a. You can automatically change file path by selecting entry and using the “Automatically set file links” key (F7).

The automatically set file links should work perfectly when selecting all entries of the library.

Lastly, when the Automatically set files link key is pressed the program shall change the file path to the new full path of the file with the same name found unless more than one file has the same name.

This is the intention. Building a data structure based on the existing entries and the file system is necessary for it.

koppor avatar Apr 28 '23 22:04 koppor

The function described in this issue would have been very useful to me after importing a .bib file from Citavi which had the file names

Can you describe how citavi names the files?

We surely could make a notification listing all entries having non-existent files, but other entries with the same filename. (In the case for auto linking). Then, the user can manually investigate.

That would be great (also for other use cases). I don't exactly remember right now; will investigate.

claell avatar May 30 '23 07:05 claell

Here is an example from a .bib export from Citavi (the colons seem to be Citavi specific?):

@misc{Glinz.2020,
 author = {Glinz, Martin and {van Loenhoud}, Hans and Staal, Stefan and Bühne, Stan},
 date = {2020},
 title = {Handbuch für das CPRE Foundation Level nach dem IREB-Standard: Aus- und Weiterbildung zum Certified Professional for Requirements Engineering (CPRE) Foundation Level},
 url = {https://www.ireb.org/content/downloads/5-cpre-foundation-level-handbook/cpre_foundationlevel_handbook_de_v1.0.pdf},
 institution = {{International Requirements Engineering Board (IREB) e.V.}},
 file = {CPRE Foundation Level - Handbuch (2020):Attachments/CPRE Foundation Level - Handbuch (2020).pdf:application/pdf}
}

The name is from the file that I added as attachment directly in Citavi. However, I also have that file stored separately on my drive with that name. If deemed sensible, I can open a new issue for Citavi, as it might require treating this specific export format.
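
To make the colon layout concrete, here is a small sketch that splits one such `file` value into its three parts. It assumes the value follows a `description:path:type` layout with backslash-escaped colons inside parts, and that a value without colons is a plain path (as in `file = {a.pdf}`); `FileFieldSketch` is a hypothetical name, not JabRef's parser, and multiple files in one field are not handled.

```java
import java.util.Optional;

/** Hypothetical sketch: one parsed "description:path:type" file field value. */
final class FileFieldSketch {
    final String description;
    final String path;
    final String type;

    FileFieldSketch(String description, String path, String type) {
        this.description = description;
        this.path = path;
        this.type = type;
    }

    /** Returns the parsed parts, or empty if the value has an unexpected number of parts. */
    static Optional<FileFieldSketch> parse(String fieldValue) {
        // Split on colons that are not preceded by a backslash; -1 keeps empty parts
        String[] parts = fieldValue.split("(?<!\\\\):", -1);
        if (parts.length == 1) {
            // Assumption: a value without colons is just a path
            return Optional.of(new FileFieldSketch("", parts[0], ""));
        }
        if (parts.length != 3) {
            return Optional.empty();
        }
        return Optional.of(new FileFieldSketch(parts[0], parts[1], parts[2]));
    }
}
```

Applied to the Citavi value above, the `path` part would be `Attachments/CPRE Foundation Level - Handbuch (2020).pdf`, which is the piece a relinking step would need to compare against the files on disk.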

claell avatar May 30 '23 08:05 claell

In addition to the proposed implementations, I suggest a button here in case the file wasn't found. Right now the UX isn't great, the user only sees the file wasn't found and there is no action to try to solve that. Image

Yeah! Currently, JabRef tries magically to resolve the file link, which causes confusion sometimes: #9800

I like that behavior, just that it doesn't give options there when it cannot find the file. But that might need general consideration of UX, considering that the whole behavior might get changed a bit with the idea in this issue.

claell avatar May 30 '23 08:05 claell

@Alexandra-Stath Will you continue working on this or will you focus on other issues?

koppor avatar Jun 07 '23 17:06 koppor

@koppor No, since I worked on URLCleanup issue.

Alexandra-Stath avatar Jun 15 '23 09:06 Alexandra-Stath

Hi @koppor, I worked on the "Linked identifier for ISBNs" issue recently. I want to take this up. Shall I?

mkumar09 avatar Jun 16 '23 11:06 mkumar09

In comparison to https://github.com/JabRef/jabref/pull/9925, this is a hard one. One really needs to craft tests! Especially, file-based tests. @TempDir is your friend.

I recommend doing test-driven development here. Otherwise, the issue will be very hard to fix ...

Go on if you feel comfortable in applying TDD!

koppor avatar Jun 16 '23 16:06 koppor

Okay, I'll pass. I'll take up some other issue

mkumar09 avatar Jun 18 '23 10:06 mkumar09

Also a uni student - would love to take a crack at this!

u74981018 avatar Oct 13 '23 10:10 u74981018

@u74981018 Thank you for your interest. I assigned you. Please take my comment on "test-driven development" seriously.

Proposal:

  1. Create test setting
  2. Implement functionality
  3. Craft JUnit tests

Maybe, 2 and 3 can be done in a loop

Test Settings:

  • Create a test directory for each test setting
  • Create a sub directory for the setting before the move - let's name that dir A
  • Create another (!) sub directory for the setting after the move - let's name that dir B
  • The bib file should reside in dir A. Set . in the library properties as pdf directory
  • Run "Automatically set file links" - check that the bib file was not modified
  • Copy the bib file from A to B
  • Open the bib file in JabRef
  • Run "Automatically set file links" - check that the bib file was modified as desired (in case everything was implemented. At the beginning, JabRef should have done nothing)
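
The directory layout from the list above could be set up programmatically along these lines. This is only a sketch: `MoveScenarioFixture` and the file name `library.bib` are hypothetical, the example entry is the one from the issue description, and the `root` path is assumed to come from something like JUnit's @TempDir. Running "Automatically set file links" and checking the bib file are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical fixture: dir A is the setting before the move, dir B the setting after it. */
final class MoveScenarioFixture {
    final Path before; // dir A
    final Path after;  // dir B

    private MoveScenarioFixture(Path before, Path after) {
        this.before = before;
        this.after = after;
    }

    /** Creates A and B below root (for example a directory injected by @TempDir). */
    static MoveScenarioFixture create(Path root) throws IOException {
        Path a = Files.createDirectories(root.resolve("A"));
        Path b = Files.createDirectories(root.resolve("B"));

        // Setting before the move: the link and the file location agree
        String bib = "@misc{test,\n  file = {a.pdf}\n}\n";
        Files.write(a.resolve("library.bib"), bib.getBytes(StandardCharsets.UTF_8));
        Files.createFile(a.resolve("a.pdf"));

        // Setting after the move: same bib file, but a.pdf now lives in folder-a
        Files.copy(a.resolve("library.bib"), b.resolve("library.bib"));
        Files.createDirectories(b.resolve("folder-a"));
        Files.createFile(b.resolve("folder-a").resolve("a.pdf"));

        return new MoveScenarioFixture(a, b);
    }
}
```

A test could then assert that the library in `before` stays unchanged after auto-linking, while the one in `after` ends up with `file = {folder-a/a.pdf}` once the feature is implemented.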

Hint: You can use git to version the test setting locally. Use git init and then git gui together with the staging area to track changes. No need to commit at each try. You can use the revert functionality of git gui.

koppor avatar Oct 13 '23 11:10 koppor

@koppor Hi, I've started working on this issue and familiarizing myself with the repo!

  • I was looking to start committing, but I don't understand how I'm meant to create a branch (as indicated in https://devdocs.jabref.org/contributing)
  • I've made a new branch in my forked repo but this repo has similar issue branches and I was wondering how I make my own. (I tried through github but the button isn't there)

Thanks Heaps!

u74981018 avatar Oct 19 '23 12:10 u74981018

@u74981018 Please take this also as challenge to learn using ChatGPT.

koppor avatar Oct 20 '23 07:10 koppor

Code: The method called from the UI is org.jabref.gui.externalfiles.AutoSetFileLinksUtil#linkAssociatedFiles.

koppor avatar Oct 23 '23 22:10 koppor

Hi, my university software engineering group is willing to tackle this issue. Would it be possible to assign us? Here are our git usernames: @maja-larsson, @NicoleWij, @RiLoGosh, @martinctl and @serhan-cakmak. Thank you for your answer!

Maja-Larsson avatar Feb 23 '24 10:02 Maja-Larsson

As general advice for newcomers: check out Contributing for a start. Also, the guidelines for setting up a local workspace are worth having a look at.

Feel free to ask here at GitHub if you have any issue-related questions. If you have questions about how to set up your workspace, use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

github-actions[bot] avatar Feb 23 '24 12:02 github-actions[bot]

@Maja-Larsson Sure thing! I assigned only you, because GitHub only allows assigning commenters of an issue.

Note that this issue is more about requirements engineering and developing test cases than the actual implementation. Thus, use this as a chance to learn about JUnit and test-driven development. In practice, this means: Write tests, use the "Debug" button to execute the tests and let the IDE suspend on breakpoints. See how the code behaves. Adapt it. Re-run. Until the test is green. Then, the next test.

koppor avatar Feb 23 '24 12:02 koppor

Hi! Our group (assigned to Maja-Larsson) made a version of the First implementation based on the work done previously by u74981018 in PR #10526. You can see the relevant work in the pull request above. We would love to hear feedback and receive further insight! Thank you!

RiLoGosh avatar Mar 01 '24 15:03 RiLoGosh

Our course has unfortunately ended and we will not be able to continue working on this issue. I will unassign myself from this and I hope we have contributed to the issue in some way! Thanks for all the feedback!

Maja-Larsson avatar Mar 05 '24 14:03 Maja-Larsson