Suggest depictions using image recognition
It would be great if the app suggested the item Elephant when I take a picture of an elephant.
There are some APIs for this; I am not sure whether any of them is usable for free. The API would return a few words such as {elephant, zoo}, and we would perform a Wikidata item search on these words and add the resulting items to the list of suggestions.
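For reference, the Wikidata lookup could be a single call per word to the standard `wbsearchentities` API. A minimal sketch (plain `HttpURLConnection`, JSON parsing omitted):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class WikidataItemSearch {

    // Look up Wikidata items matching one word returned by the recognition API.
    static String searchItems(String word) throws Exception {
        String endpoint = "https://www.wikidata.org/w/api.php"
                + "?action=wbsearchentities&format=json&language=en&type=item"
                + "&search=" + URLEncoder.encode(word, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
        }
        return json.toString(); // contains item IDs, e.g. Q7378 for "elephant"
    }

    public static void main(String[] args) throws Exception {
        for (String word : new String[] {"elephant", "zoo"}) {
            System.out.println(searchItems(word));
        }
    }
}
```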
If using an online service, the feature should probably be opt-in, since the privacy policy of the API will most probably be incompatible with the Wikimedia privacy policy.
A friend has left Google to create an AI company and is looking for people to test his library. He promises to open-source it soon. Unlike Google's libraries, it is usable offline.
This looks like a great opportunity to develop this feature, since no such library has existed so far (as far as I know). Anyone interested in working on this right now? I can send you the library. Thanks a lot!
Sounds great, and yeah, it should definitely be opt-in. I could chuck this into my IEG renewal proposal, but that probably won't be for another couple of months, so anyone who wants to work on it sooner is most welcome.
There is a grant proposal to create an API for that: https://meta.wikimedia.org/wiki/Grants:Project/AICAT
@nicolas-raoul Sounds very useful! How did you hear of it? I wanted to post an endorsement, but their community notifications section is still empty so I was hesitant. :)
@misaochan I learned about it here: https://www.wikidata.org/wiki/Wikidata:Project_chat#Interesting_initiative I added an endorsement.
I did the same. :) Even if the grant is approved though, it will probably be about a year before the API is usable (the grant is 8 months, and I believe the next Project Grant round starts in July).
Thanks for the endorsement @nicolas-raoul! I am one of the guys behind the proposal. We welcome any suggestions and advice!
Recent Wikimedia UK blog post https://blog.wikimedia.org.uk/2018/02/structured-data-on-commons-is-the-most-important-development-in-wikimedias-usability/ :
show[...] the user images with suggested ‘fields’, allowing the user to then swipe left or right to say whether or not the image should be tagged with the suggested category. This would allow the community to help organise the uncategorised images on Commons much more efficiently.
This sounds very similar to the present issue. Categories will become Structured Commons properties in the future, but that does not make that much difference from the point of view of this issue.
The idea of swiping left/right is interesting; let's gather the pros and cons.
Pros of swiping:
- Cool gesture
- The whole screen space is available to show information about the category (property), for instance a textual explanation or example images that have it (which is the topic of #1244)
Cons of swiping:
- No global view. For instance, after taking a picture of a red car in Italy, you get the suggestion "Car" and you swipe Yes, then "red car" and you swipe Yes again, then "red car in Italy" and you swipe Yes again. If you had seen all of them from the beginning, you would have selected only the last (most precise) category. With Structured Commons this should not be a problem, as color/country/etc. are orthogonal properties.
- Takes more time. The current suggestion screen shows around 50 suggestions. With swiping, you cannot reasonably expect the user to swipe more than 10 times for a single upload.
The other new idea we can steal from this blog post is that category suggestions could be used not only for the picture I just uploaded, but also for uncategorized pictures uploaded by other people.
Hi, my name is Aaron. I am interested in contributing to the Commons App for GSoC18 to allow users to browse. I was wondering if I could use image processing: when the user uses the camera to take a photo, the app scans the area and gives possible suggestions, which could include letting users see other people's work, etc. We could use TensorFlow Lite and an image-processing model like Inception-v3. Inception-v3 has already been tested successfully in TensorFlow Lite; they say, and I quote, "the model is guaranteed to work out of the box". Do you think this could work? Looking forward to suggestions...
@aaronpp65 basic questions about this solution:
- Does it work offline?
- What is the size (kilobytes) of the part we must embed in our app's APK?
- What is the license of the whole thing? (If it does not work offline, please cite the license of the server part too.) Thanks :-)
Also, if I understand correctly, that library gives you a word like "leopard" or "container ship", right? How do you propose matching these strings to the following? (A rough lookup sketch follows the list.)
- Wikimedia Commons categories (see https://commons.wikimedia.org/wiki/Category:Topics)
- Wikidata entities (see https://www.wikidata.org/wiki/Special:Random)
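For reference, the Commons side of that mapping could be a plain MediaWiki search restricted to the Category namespace. A rough sketch (the endpoint and parameters are standard MediaWiki; `label` stands for the classifier output, and the HTTP/JSON plumbing is the same as for the Wikidata sketch above):

```java
import java.net.URLEncoder;

// Rough sketch: find Commons categories whose titles match a classifier label
// such as "container ship". Namespace 14 is the Category: namespace.
static String commonsCategorySearchUrl(String label) throws Exception {
    // Each query.search[].title in the JSON response is a candidate,
    // e.g. "Category:Container ships".
    return "https://commons.wikimedia.org/w/api.php"
            + "?action=query&list=search&format=json"
            + "&srnamespace=14"
            + "&srsearch=" + URLEncoder.encode(label, "UTF-8");
}
```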
- It's machine learning on the go, without the need for connectivity.
- TensorFlow Lite is < 300KB in size when all operators are linked, and <= 200KB when using only the operators needed for the standard supported models (MobileNet and Inception v3).
- TensorFlow is open-source software, released under the Apache 2.0 license.
Yes, the library gives you a word like "leopard" or "container ship", but that is when we use a pre-trained Inception v3, which is trained on the ImageNet dataset. Instead of using a pre-trained model, we can train the Inception model using our own Wikimedia Commons dataset; then we will get strings similar to those of Commons. We can then query each string in the Commons database and retrieve other people's work. But, as you asked before, we will need connectivity to do this querying part.
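For context on what the on-device inference would look like, here is a minimal sketch of running a bundled `.tflite` classifier on Android. The shapes assume a quantized 224×224 MobileNet-style ImageNet model (a float Inception v3 expects 299×299 float input instead), and the file name `model.tflite` is just a placeholder:

```java
import android.app.Activity;
import android.content.res.AssetFileDescriptor;
import android.graphics.Bitmap;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import org.tensorflow.lite.Interpreter;

public class LabelSuggester {

    private static final int SIZE = 224; // quantized MobileNet-style input

    // Memory-map the model file bundled in the APK's assets.
    static MappedByteBuffer loadModel(Activity activity, String path) throws IOException {
        AssetFileDescriptor fd = activity.getAssets().openFd(path);
        FileChannel channel = new FileInputStream(fd.getFileDescriptor()).getChannel();
        return channel.map(FileChannel.MapMode.READ_ONLY,
                fd.getStartOffset(), fd.getDeclaredLength());
    }

    // Resize the photo and copy its RGB bytes into the model's input buffer.
    static ByteBuffer toInputBuffer(Bitmap bitmap) {
        Bitmap scaled = Bitmap.createScaledBitmap(bitmap, SIZE, SIZE, true);
        ByteBuffer buffer = ByteBuffer.allocateDirect(SIZE * SIZE * 3);
        int[] pixels = new int[SIZE * SIZE];
        scaled.getPixels(pixels, 0, SIZE, 0, 0, SIZE, SIZE);
        for (int pixel : pixels) {
            buffer.put((byte) ((pixel >> 16) & 0xFF)); // R
            buffer.put((byte) ((pixel >> 8) & 0xFF));  // G
            buffer.put((byte) (pixel & 0xFF));         // B
        }
        return buffer;
    }

    // Run one image through the classifier; scores[i] is the confidence for
    // ImageNet class i ("leopard", "container ship", ...).
    static byte[] classify(Activity activity, Bitmap bitmap) throws IOException {
        Interpreter interpreter = new Interpreter(loadModel(activity, "model.tflite"));
        byte[][] scores = new byte[1][1001]; // 1001 ImageNet classes
        interpreter.run(toInputBuffer(bitmap), scores);
        interpreter.close();
        return scores[0];
    }
}
```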
@aaronpp65 Very impressive, thanks! Requiring connectivity during training is no problem, of course. But using Commons as a training set unfortunately sounds difficult, because:
- Most Commons categories have only 10 files or fewer, which is not enough for training.
- Commons images are usually not fit for training; see for instance the images in https://commons.wikimedia.org/wiki/Category:Container_ships.
So I guess we'd be better off trying to match from ImageNet categories to Commons or Wikidata.
https://opendata.stackexchange.com/questions/12541/mapping-between-imagenet-and-wikimedia-commons-categories
https://opendata.stackexchange.com/questions/12542/mapping-between-imagenet-and-wikidata-entities
Yeah... so mapping ImageNet to Commons should do the trick.
@nicolas-raoul will you please check my draft and give feedback? Thanks!
@aaronpp65 Could you please post a link to your draft? Thanks!
https://docs.google.com/document/d/1am3EbhBrwaYn2_LLKAmnrXlzTGVWgttCdALAV4fy_NU/edit?usp=sharing @nicolas-raoul here is the link to the draft. I should make one in Phabricator too, right?
Yes, please post it on Phabricator, thanks :-)
Could you please explain in more detail the following steps:
- Converting the model to the TensorFlow Lite file format
- Integrating the converted model into the Android application
Also, please add a step-by-step description of what the user will see, what screen they will go to, what button they click, so that we understand what this project will bring to the app. Feel free to include hand-drawn screens to make it clearer if necessary.
Thanks! :-)
@nicolas-raoul I have made the required changes and added a basic wireframe. Thanks for the feedback!
@aaronpp65 Thanks! If I understand correctly, your idea would work like this:
- I take a picture of a butterfly
- I upload it to Commons via the app
- I go to the app's gallery and touch my picture
- In the details view that opens, pictures that are similar to my picture (other pictures of butterflies) are shown below my picture.
Is my understanding correct? Thanks!
@nicolas-raoul actually what I am suggesting is much simpler than that...
- The user clicks the camera icon to take a picture.
- The camera opens up.
- Before taking the picture, the camera scans the scene and provides suggestions at the bottom of the screen.
- The user scrolls through these suggestions (while still in camera view) and takes the appropriate picture.
So the number of clicks a user has to make is the same as in the current app, hence more user-friendly. Get it?
Oh, I see: when I point the camera towards a container ship it will show me other pictures of container ships, am I understanding correctly? Please note that most users don't use our app's camera; they use their favorite camera app instead and then share to our app or select from the gallery.
Yep exactly...
People take a picture using their camera app and upload it only later (correct me if I am wrong), like when they are home or when they have good connectivity. But providing suggestions at that time will not be useful, because they probably will not be in that location anymore to retake the picture according to the suggestions we provide...
@nicolas-raoul This is what I have in mind. You can see the suggestions below in small thumbnails.
Here is a web-based tool that suggests categories for any image: https://youtu.be/Y9lvXVJCiyc?t=1932 It seems to work quite well, judging from the demo.
Image labelling and category suggester. Phab: https://phabricator.wikimedia.org/T155538 (not exactly all of the things this ticket wants). Niharika demoed a user script that finds labels for an image, using a Google (?) image recognition API to detect the contents of the image and suggest possible categories. Works, but not perfect (hilarious skeleton example). You can play with it yourself: https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/sandbox - https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/imagery.js
If I understand correctly, the wiki page calls a MediaWiki API which in turn calls a third-party image recognition tool. Having MediaWiki in the middle means the user's IP address is not leaked to the third party, so I guess we could actually use this right now.
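To illustrate why that matters for us: the app would only ever talk to the Wikimedia-hosted endpoint, something like the sketch below. Note that the `url` parameter name is a guess on my part; the real interface would have to be read from imagery.js:

```java
// Hypothetical: ask the Wikimedia-hosted tool to classify an image. The
// third-party recognition service only ever sees the tool's IP, not the user's.
String api = "https://tools.wmflabs.org/imagery/api.php"
        + "?url=" + URLEncoder.encode(imageUrl, "UTF-8"); // parameter name is a guess
HttpURLConnection conn = (HttpURLConnection) new URL(api).openConnection();
// ...then read the suggested labels from conn.getInputStream() as usual
```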
> https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/sandbox - https://commons.wikimedia.org/wiki/User:NKohli_(WMF)/imagery.js
It looks like this uses a Toolforge tool (https://tools.wmflabs.org/imagery/api.php) which is currently down(?): it returns a 500 error on a query from the script for me. It's been a long time; I believe it was meant to be a proof of concept that was not going to be maintained as-is.
> it was meant to be a proof of concept
I hope the source code is still available somewhere and someone turns it into a more permanent tool :-)
My understanding is that we still need to find either:
- A Wikimedia-hosted API that provides image classification. (We cannot use a third-party API like Azure directly, for privacy reasons. Calling a third-party API from a Wikimedia server would be OK as long as it does not cost money.)
- An embeddable image classification JAR which is small (at most a few megabytes) and open source.
The API or library must output either Commons category(ies) (example: "the submitted image contains a https://commons.wikimedia.org/wiki/Category:Dogs") or Wikipedia/Wikidata item(s) (example: "the submitted image contains a https://www.wikidata.org/wiki/Q144").
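Purely to illustrate that contract (the names below are mine; nothing here exists yet):

```java
import android.net.Uri;
import java.util.List;

// Illustrative only: the contract we need from whichever API or library we find.
public interface DepictionSuggester {
    /**
     * Returns suggestions for the given image, ordered by confidence: either
     * Commons category names (e.g. "Dogs") or Wikidata item IDs (e.g. "Q144").
     */
    List<String> suggest(Uri image);
}
```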
@nicolas-raoul I agree that using a third-party API such as Azure would be a privacy concern. There is an alternative: https://wadehuang36.github.io/2017/07/20/offline-image-classifier-on-android.html