auto-tagging system based on MD5 hashes on other booru-like sites?

Open mmd123 opened this issue 3 years ago • 6 comments

So, I find myself wanting a feature. I asked around on Reddit under r/selfhosted, and somebody who also uses shimmie2 suggested making a feature request, since they would like to see this added too.

Other sites that a lot of us likely use for browsing (gelbooru, e621, chan.sankakucomplex, danbooru, just to name a few) all have publicly available galleries. It was mentioned that if a site has an API that can be called, you could use, as another user suggested, a Python script for gallery-dl to have tags from matched images automatically populated into shimmie2. However, I am a horrible programmer and could not for the life of me implement this myself, so here we are, asking about it as a feature request.

Ideally, you would be able to specify which other sites to pull information from on a configuration page for this feature or extension. What are the odds something like this could be added, either as an official feature or as an extension that can be installed only by those who want this functionality?
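To make the request concrete, here is a minimal Python sketch of the lookup step: hash the image, then build one query URL per configured site. The site names and URL templates are hypothetical placeholders (the `.example` hosts are not real services), and each real site's API documentation would need to be checked for its actual MD5 search syntax.

```python
from __future__ import annotations

import hashlib
from urllib.parse import quote

# Hypothetical lookup templates, standing in for a per-site configuration page.
# The hosts and query formats are assumptions, not any real site's API.
LOOKUP_TEMPLATES = {
    "site-a": "https://booru-a.example/posts.json?tags=md5:{md5}",
    "site-b": "https://booru-b.example/api/post?md5={md5}",
}


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest used as the lookup key."""
    return hashlib.md5(data).hexdigest()


def lookup_urls(md5: str, sites: list[str]) -> dict[str, str]:
    """Build one lookup URL per configured site, skipping unknown site names."""
    return {
        site: LOOKUP_TEMPLATES[site].format(md5=quote(md5))
        for site in sites
        if site in LOOKUP_TEMPLATES
    }
```

Fetching each URL and copying the returned tags onto the local post is left out on purpose, since the response format differs from site to site.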

mmd123 avatar Jul 25 '21 19:07 mmd123

Hey there :) I was actually the one that suggested to post a request here.

As a note: this can be done either at upload, as another task like generating thumbnails, or as a prompt action, which is triggered manually.

I think the latter is better because this potentially could take a lot of time to process.

Do the current API extensions allow search by MD5? Another note: someone from Reddit suggested gallery-dl for tag-based search.

MetallicAchu avatar Jul 26 '21 12:07 MetallicAchu

> Hey there :) I was actually the one that suggested to post a request here.
>
> As a note: this can be done either at upload, as another task like generating thumbnails, or as a prompt action, which is triggered manually.
>
> I think the latter is better because this potentially could take a lot of time to process.
>
> Do the current API extensions allow search by MD5? Another note: someone from Reddit suggested gallery-dl for tag-based search.

Thank you again for suggesting I ask here; I honestly had not even considered it until you suggested it.

mmd123 avatar Jul 26 '21 14:07 mmd123

Assuming those other sites have suitable APIs, it's definitely possible; I'm melting from my day-job at the moment but happy to advise and accept pull-requests ^^;;

Other thoughts -

  • it should be relatively easy to support both "at upload" and "manual click" approaches
  • if other sites use other hashes (eg sha1) we can hash the files on disk as-needed, it's not that expensive (unless it's being done in bulk)
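As a rough illustration of hashing files on disk as needed, the sketch below reads a file once in chunks and feeds every requested algorithm, so asking for SHA-1 alongside MD5 costs no extra disk read. It is a standalone Python example, not Shimmie code; the function name and defaults are made up for illustration.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at `path`, reading it once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For bulk runs, storing the extra digests next to the existing MD5 would avoid rehashing the same files on every lookup.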

shish avatar Aug 05 '21 08:08 shish

> Assuming those other sites have suitable APIs, it's definitely possible; I'm melting from my day-job at the moment but happy to advise and accept pull-requests ^^;;
>
> Other thoughts -
>
>   • it should be relatively easy to support both "at upload" and "manual click" approaches
>   • if other sites use other hashes (eg sha1) we can hash the files on disk as-needed, it's not that expensive (unless it's being done in bulk)

Given that I have no idea how to do this, do you have any suggestions on how to implement it if I were to take a stab at it myself? So far I have not found any friends I can ask for help with JavaScript or PHP, so any insight would be highly appreciated, especially reading material on how to write a shimmie extension. Ideally I'd like to set this up as an installable extension, but my attempts at googling for material have been utterly fruitless. I'll likely have to teach myself both PHP and some JavaScript to get this done; after months of asking, nobody I know is able or willing to help, so I may as well do it myself.

mmd123 avatar Oct 16 '21 22:10 mmd123

If I were to do something like this I'd probably try to use hydrus (hydrus-network) (with the PTR) and some booru software that uses its client API, of which there are at least a couple.

dali99 avatar Oct 17 '21 01:10 dali99

> If I were to do something like this I'd probably try to use hydrus (hydrus-network) (with the PTR) and some booru software that uses its client API, of which there are at least a couple.

Honestly, my thinking was to write some kind of JavaScript library. I just wasn't sure of a couple of things, most notably how to hook it into shimmie so that it can be called from the admin backend. On my installation, given the extensions I have installed and configured, the backend has the following toolbar entries at the top: board config, system info, source changes, tag changes, board admin, extension manager, user list, chron upload, and tips editor. My original thought was to hook into that somehow and have a dedicated page for searching tags by image, but I can't pre-plan to save my life. Since this runs on normal web server software, my thought was to use JavaScript and PHP, or maybe just regular HTML, so that it's compatible across web servers...

I have used hydrus in the past and decided I did not like it for my own usage, which is why I ended up opting for shimmie. To be totally honest, I had demoed basically every single damn thing that existed: the gelbooru source, danbooru (I opted not to try it because it runs on Ruby on Rails, and to date I have never messed with Ruby apps, gems or otherwise, so if I broke something I'd be unable to figure out how to fix it myself), and other options. Nothing came close to what I wanted; shimmie has come the closest so far.

mmd123 avatar Oct 17 '21 03:10 mmd123

hello! @mmd123 asked me if I could help him implement this so I'll be working on it soon (although I haven't written PHP in like.. years xD). Also, I'm gonna make it into an actual plugin and write a PR for it so others can use it as well ;)

I do have a few questions though, @shish:

  1. ~~Does Shimmie2 currently calculate and store image hashes on upload? If not, is there an API method which I can calculate it with?~~ Figured it out
  2. ~~I briefly inspected the plugin code for editing tags but I'm unsure of how I would trigger tag editing from another plugin/extension. Could you please point me to the right direction?~~ Figured it out too
  3. ~~What do you think would be the best method for developing the plugin and testing it inside a Docker container without having to rebuild it all the time? Or is that the only way? I guess I can mount a volume pointing at the extension dir but I'm not sure..~~ Just rebuilding it is fast enough due to the caching

Thanks!

HeCorr avatar Nov 09 '21 15:11 HeCorr

Updating my second question:

I just checked the code again and I think I just need to construct an Image class with its ID and call `$image->set_tags()`. Apparently I just need the image ID for it to work, but the question is: can I construct the class with just the image's ID? And is there an easier way to reference an image by its ID without constructing a class manually?

Also, another question: ~~Which event should I listen to for receiving image uploads?~~ Got it ;)

HeCorr avatar Nov 09 '21 15:11 HeCorr

$image = Image::by_id($image_id);

sanmadjack avatar Nov 09 '21 17:11 sanmadjack

> $image = Image::by_id($image_id);

I'm sure I would have found that if I had looked harder. Thanks! xD

HeCorr avatar Nov 09 '21 17:11 HeCorr

Now I'm having some issues getting anything to log to the console..

image

This prints absolutely nothing when uploading files. And yes, I did enable the extension:

image

Do I have to enable logging somewhere..?

HeCorr avatar Nov 09 '21 23:11 HeCorr

Where are you checking for the log?

On Tue, Nov 9, 2021 at 5:07 PM Henrique Corrêa wrote:

> Now I'm having some issues getting anything to log to the console..
>
> [image: image] https://user-images.githubusercontent.com/75134774/141019748-460ab769-0e68-4e2d-bb9a-104fce4a1f43.png
>
> This prints absolutely nothing when uploading files. And yes, I did enable the extension:
>
> [image: image] https://user-images.githubusercontent.com/75134774/141019830-706bb294-7fbe-41b5-a2a3-682ef3a20af4.png

sanmadjack avatar Nov 09 '21 23:11 sanmadjack

> Where are you checking for the log?

In the Docker container logs:

image

I did notice the Logging (Network) extension but it doesn't seem to work at all with netcat..

HeCorr avatar Nov 09 '21 23:11 HeCorr

I figured it out. Since I'm running Shimmie inside Docker, 127.0.0.1 maps to the container, not the host (where I was running netcat to receive logs).

I'll have to modify the extension's code though, since I have no idea how to set configs.

image

After a lot of frustration and research, I got netcat to stop dying after the first packet (which seems to be a bug). The command that worked for me was `nc -kul 35353`.
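For anyone without a cooperative netcat build, a small Python listener is a possible stand-in for `nc -kul 35353`. This sketch only assumes what the command implies: the log lines arrive as UDP datagrams on port 35353. Decoding each datagram as UTF-8 text is an assumption about the payload, and both function names are made up for illustration.

```python
import socket


def open_udp_listener(host="0.0.0.0", port=35353):
    """Bind a UDP socket; 0.0.0.0 accepts packets arriving on any interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, max_packets=None, bufsize=65535):
    """Yield each datagram as text; loops forever when max_packets is None."""
    received = 0
    while max_packets is None or received < max_packets:
        data, _addr = sock.recvfrom(bufsize)
        received += 1
        # Assumption: payloads are text. Undecodable bytes are replaced, not fatal.
        yield data.decode("utf-8", errors="replace").rstrip("\n")
```

Printing every item from `receive_lines(open_udp_listener())` keeps receiving after the first packet. When Shimmie runs in Docker, the listener has to be reachable from inside the container, so the log target cannot be the container's own `127.0.0.1`.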

Btw @shish the port 35353 isn't documented anywhere.. if I didn't inspect the code myself I would have 0 idea of how to use the extension. :(

HeCorr avatar Nov 11 '21 18:11 HeCorr