
[Bug]: Two checksums for the same image - upload via upload wizard and commons app

Open OpenGreenStreet opened this issue 1 year ago • 3 comments

Summary

The same image can be uploaded to Commons twice, once with the Upload Wizard and once with the Commons app, without either tool flagging the duplicate. Presumably, the two upload tools produce different checksums for the same file.

Steps to reproduce

  1. Upload an image with the Upload Wizard
  2. Upload the same image (under a different file name) with the Commons app

Expected behaviour

The Upload Wizard or the Commons app informs me that the file is a duplicate

Actual behaviour

See:

Category:Two checksums for the same image - upload via upload wizard and commons app.screenshots

https://commons.wikimedia.org/wiki/Category:Two_checksums_for_the_same_image_-_upload_via_upload_wizard_and_commons_app.screenshots

Device name

Google Pixel 7 Pro

Android version

Android 14

Commons app version

5.0.2~05ffd123e

Device logs

No response

Screen-shots

See:

Category:Two checksums for the same image - upload via upload wizard and commons app.screenshots

https://commons.wikimedia.org/wiki/Category:Two_checksums_for_the_same_image_-_upload_via_upload_wizard_and_commons_app.screenshots

Would you like to work on the issue?

None

OpenGreenStreet avatar Sep 01 '24 07:09 OpenGreenStreet

See also: https://github.com/commons-app/apps-android-commons/issues/5798

OpenGreenStreet avatar Sep 01 '24 07:09 OpenGreenStreet

See also: https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2022/07#Duplikat-Erkennung:_Commons_Hochlade-Assistent_und_Commons-App

OpenGreenStreet avatar Sep 01 '24 07:09 OpenGreenStreet

Anyone willing to try and fix this bug?

Ideally, the Commons app should use the Wikimedia server's search API to check whether either of these two checksums matches an existing file:

  • The checksum of the image that the user is about to upload.
  • The checksum of the image that the user is about to upload after applying the user's configured EXIF anonymizations.
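
As a rough illustration of the two-checksum check described above, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the class `DuplicateCheckSketch`, its method names, and the injected `existsOnCommons` lookup are hypothetical and not taken from the app's code. The URL helper assumes the lookup goes through the MediaWiki `list=allimages` query with its `aisha1` parameter; the app's real call may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the Commons app.
final class DuplicateCheckSketch {

    // Hex-encoded SHA-1 of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // Assumed query shape: MediaWiki allimages list filtered by SHA-1.
    static String lookupUrl(String sha1) {
        return "https://commons.wikimedia.org/w/api.php"
                + "?action=query&format=json&list=allimages&aisha1=" + sha1;
    }

    // True if either the untouched file or its EXIF-anonymized version
    // is already known to the lookup.
    static boolean isDuplicate(byte[] original, byte[] anonymized,
                               Predicate<String> existsOnCommons) {
        return existsOnCommons.test(sha1Hex(original))
                || existsOnCommons.test(sha1Hex(anonymized));
    }

    public static void main(String[] args) {
        byte[] original = "original bytes".getBytes(StandardCharsets.UTF_8);
        byte[] anonymized = "anonymized bytes".getBytes(StandardCharsets.UTF_8);
        // Pretend the Upload Wizard already uploaded the untouched original.
        Set<String> onCommons = Collections.singleton(sha1Hex(original));
        System.out.println(lookupUrl(sha1Hex(original)));
        System.out.println(isDuplicate(original, anonymized, onCommons::contains)); // true
    }
}
```

Injecting the lookup as a `Predicate` keeps the sketch runnable without network access; in the app, that role would presumably be played by the existing duplicate check.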

nicolas-raoul avatar Sep 01 '24 08:09 nicolas-raoul

I want to try :hand:

parneet-guraya avatar Dec 07 '24 23:12 parneet-guraya

Thanks Parneet! First, would you mind posting here a link to the code that checks whether an image is already on Commons or not? :-)

nicolas-raoul avatar Dec 07 '24 23:12 nicolas-raoul

https://github.com/commons-app/apps-android-commons/blob/64fd10d00e8ba5bd220db32740247c6b00e3cd8e/app/src/main/java/fr/free/nrw/commons/media/MediaClient.kt#L51

The function linked above is called through some logic in UploadPresenter.

PR #5570 removed a piece of code that I suspect might be the cause, but I can't say for sure.

Screenshot from 2024-12-08 06-11-22

parneet-guraya avatar Dec 08 '24 00:12 parneet-guraya

Great find!

Now you will have to call that function a second time, with the SHA of the original file (before EXIF modification).

nicolas-raoul avatar Dec 08 '24 01:12 nicolas-raoul

Just to confirm: we want the checksum that the app generates to match the one from the web Upload Wizard, so that the app can properly warn the user that the file is a duplicate? And this has nothing to do with blocking duplicate uploads? (BTW, is it allowed to upload duplicates?)

parneet-guraya avatar Dec 09 '24 00:12 parneet-guraya

The website does not modify files, so the checksum of a file uploaded via the website is the same as the original file's checksum.

The API does not prevent uploading duplicates, but our app should do whatever it can to prevent this from happening.
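
To make the point about unmodified files concrete, here is a small Java sketch of why the two tools can end up with different checksums. It is only an illustration: the class `ChecksumDriftSketch`, the helper `blankRange`, and the fake file contents are hypothetical, and zeroing a byte range merely stands in for whatever EXIF anonymization the app actually performs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch; not the app's EXIF handling.
final class ChecksumDriftSketch {

    // Hex-encoded SHA-1 of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // Stand-in for metadata anonymization: returns a copy of the file
    // with bytes in [from, to) set to zero.
    static byte[] blankRange(byte[] file, int from, int to) {
        byte[] copy = Arrays.copyOf(file, file.length);
        Arrays.fill(copy, from, to, (byte) 0);
        return copy;
    }

    public static void main(String[] args) {
        // Fake "image": header, a metadata field, then pixel data.
        byte[] original = "HEADER|gps=48.85,2.35|PIXELS"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] viaWebsite = original.clone();            // uploaded untouched
        byte[] viaApp = blankRange(original, 7, 21);     // metadata blanked

        System.out.println(sha1Hex(original).equals(sha1Hex(viaWebsite))); // true
        System.out.println(sha1Hex(original).equals(sha1Hex(viaApp)));     // false
    }
}
```

Any change to the bytes, even one confined to metadata, yields a different SHA-1, which is why a lookup using only the post-anonymization checksum would miss a copy uploaded untouched through the website.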

nicolas-raoul avatar Dec 09 '24 01:12 nicolas-raoul

Thanks! I found the issue. When the app checks for duplicates before the upload, the checksum it generates differs from the one the web upload produces. When the actual upload happens, the checksum is generated again, and this time it matches the web one. The difference is that we pass two different android.net.Uri instances in these two cases. I'm going to find out why and make the changes.

"Now you will have to call that function a second time, with the SHA of the original file (before EXIF modification)."

And where should the first call to this function happen?

parneet-guraya avatar Dec 10 '24 03:12 parneet-guraya