Alternative drop in replacement YOLO model?

Open pogue opened this issue 4 months ago • 7 comments

TLDR: Here are a bunch of pretrained YOLO models and other software that might help expand the capabilities of CAPTCHA recognition in CaptchaSolver

Hi!

I came across your project looking for alternative CAPTCHA solvers as I was getting sick of all the Jdownloader popups and this was exactly what I was looking for! However, I looked over some of the closed threads and noticed some people mentioning it wasn't super great at solving some of the newer captchas because of the older YOLOv2 model used.

I don't know how to train a model, nor do I have anywhere near the adequate hardware to do so. But, I thought surely there are some already made newer models for OCR & CAPTCHA solving and I found a few. I have no idea how to test these out in the real world and since I'm on Windows, it makes running Python scripts and other stuff a bit overly complicated.

So, here are some pre-trained models for OCR/CAPTCHA solving, but I don't know if it would be possible to just drop these into your project and start using them or test them out. But, I wanted to share what I found in hopes it might help.

  • AI-CAPTCHA-Solver - This project claims it will solve up to 90% of CAPTCHAs, but since it says it's for "security testing" (of course) it doesn't mention which ones it's good at. But, it includes a pre-trained YOLO dataset in the files. It's been updated recently (7 months ago) so it could be promising.
  • Captcha recognition with Deep learning - This is an older model from around 3 years ago, but still could be another helpful dataset to use. It says it's based on yolo9000.
  • YOLO CAPTCHA - A 5 year old project claiming 89% accuracy
  • KCAPTCHA-Solver - This model uses YOLOv8 (nice) to solve a Russian based CAPTCHA called KCAPTCHA. I don't know if any of the common file hosting sites use this Captcha, but it might be worth just glancing over to see if it might have some use. The open YOLOv8 model they use is only 4 months old and hosted on HuggingFace.
  • Softy-lines/Captcha-Digit-Detector - This is another YOLOv8 open model that describes itself as a Model Card for Pixelated Captcha Digit Detection. The user also includes their datasets used to train the models, and it looks pretty decent from just glancing at it. Softy-lines/Raw-Pixel-Digit-Captcha & Softy-lines/Pixel-Digit-Captcha-Data
  • secemp9/cloudflare_captcha - Cloudflare CAPTCHAs are particularly annoying, so anything that can bypass these would be great. So many file hosting sites are putting their websites behind Cloudflare, making Jdownloader fail the download. This one is based on YOLOv4.

There are some other non-yolo based OCR/CAPTCHA models, but I don't know how easy it would be to adapt these into CaptchaSolver.

  • keras-io/ocr-for-captcha - Wikipedia says keras.io is "an open-source library that provides a Python interface for artificial neural networks." They have an OCR model trained to recognize CAPTCHAs, and it could be promising. From reading it, it sounds like it uses the same technology as Yolo, but I don't know enough about it to know for sure. Documentation: OCR model for reading Captchas: https://keras.io/examples/vision/captcha_ocr/ - They also have an example usage of this model on their Github written in Python: https://github.com/keras-team/keras-io/blob/master/examples/vision/captcha_ocr.py
  • Captcha-Recognizer - This project is for automatically solving those CAPTCHAs where you have to slide a puzzle piece around. These are some of the easiest to do, but obviously not having to look at Jdownloader while it's downloading in the background would be helpful (if any file hosting sites even use these types of captchas)

There are also a lot of general OCR projects out there just for reading text on the screen. Tesseract is a well-known project that is used for transcribing historical documents and can be used on Windows PCs as a plugin for software like Irfanview to scan images and quickly output the text. How well it would work on CAPTCHAs, I don't know. But it's free, open source, and maintained regularly by large universities and corporations (Wikipedia article on Tesseract). There are other models like Mistral OCR and Qwen that excel at OCR. I don't know how well it would work if you signed up for Mistral, or sent a request via HTTP to their API or to Huggingface, so that a snapshot of your CAPTCHA was sent to them and you waited for a response. Could be an interesting experiment.

OCRBench is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models

One alternate method I've been using lately is a browser extension called Buster, a solver for reCAPTCHA. It works by switching to the audio method of solving the captcha. It sends the data to wit.ai, a platform from Meta for audio recognition, listens, fills out the form and submits. All you do is check the first box on the reCAPTCHA (I don't know why the extension can't do this part, tbh), click the Buster icon, and let go of your mouse; it listens, types what it hears, and hits submit. It really works great! I don't know if any of these captchas on file hosting sites offer this method for visually impaired users.

Anyway, if you search Huggingface for "captcha" there are loads of different results, but very few include documentation or have models that solve completely different captcha systems. Github has a lot of different projects but they're very old, unfortunately.

So I hope @cracker0dks can take a look and see if any of this would be useful for expanding or enhancing the project. Keep up the great work! I hope some of this is helpful. Thanks, pogue

pogue avatar Jul 27 '25 17:07 pogue

Hey pogue, thanks for your thread and the work you put into creating this list. The thing with most of the projects you posted is that they may work, and work perfectly, but if no host uses that captcha type, the solver is just useless.

Most hosters use reCAPTCHA or hcaptcha, and for both you need a browser to solve them. The browser is part of the captcha process: browser attributes like version, resolution, mouse movement (you wrote that you have to click manually... that's why), cookies, and things they do not disclose. That's why I can't support these captchas. You need a browser extension for that.

I thought about switching the yolo version as well, but the accuracy is about 90% and speed is not a problem... so why change a running system? 🤔

TLDR; and conclusion: All the captchas (from big hosts) I'm able to support, I do support. If you have any hosts (without reCAPTCHA or hcaptcha), let me know and I will have a look.

Thanks again and greetings :) cracker0dks

cracker0dks avatar Jul 28 '25 13:07 cracker0dks

I thought about switching the yolo version as well, but the accuracy is about 90% and speed is not a problem... so why change a running system? 🤔

When I first came across this project, I glanced at the issues to see if it was up to date and still working and noticed the post by @GaMiR9195 saying he wasn't getting good results with the current models. I installed it and tried it myself and CaptchaSolver was automatically solving the k2cc and filejoker. But now, filejoker captchas are giving me an error every time it tries to download. Those are the ones where you have to match different shapes.

What happens when CaptchaSolver gets the captcha wrong? Does it send it to the user to fill out instead or is it not aware if it fails the response?

I'm not familiar with coding, so I wouldn't know what I was doing, but would it even be possible to just drop in a different YOLO build into CaptchaSolver?

I see in the /darknet64 folder there is a file named yolov4-tiny-custom_last.weights. If I took another person's model and replaced it with that by just dropping it in there, would it function or would it need modification of the code in CaptchaSolver to recognize it?

Most hosters use reCAPTCHA or hcaptcha, and for both you need a browser to solve them. The browser is part of the captcha process: browser attributes like version, resolution, mouse movement (you wrote that you have to click manually... that's why), cookies, and things they do not disclose. That's why I can't support these captchas. You need a browser extension for that.

Some of the first information I came across while looking for open YOLO models were two studies showing how researchers were able to use AI and different builds of YOLO models to bypass reCAPTCHA specifically. Unfortunately, as far as I can tell, they don't provide the models/weights they used to do this.

The Buster extension works great for getting through reCAPTCHA. Like you said, it must check mouse movement and other BS. But, if a person was so inclined, they could possibly use a scripting language like AutoIt to click the reCAPTCHA checkbox too. I don't know if you could randomize mouse movement or something. hcaptcha is kind of a similar thing where you have to recognize images. I'm sure that could be bypassed with time and effort as well, but those are new discussions entirely.

Just getting past the captchas with numbers/letters/shapes is great for now

TLDR; and conclusion: All the captchas (from big hosts) I'm able to support, I do support. If you have any hosts (without reCAPTCHA or hcaptcha), let me know and I will have a look.

Thanks again and greetings :) cracker0dks

I appreciate your reply! Like I said, CaptchaSolver seems to be working quite well. I'm not sure why it's all of a sudden failing on filejoker. I looked through Jdownloader's plugins/options, but I didn't see a way in the GUI to stop it from trying to answer certain sites' captchas. I'm not sure if a newer/different YOLO model would help, but I'm kind of curious to just try it, tbh 😉

Thanks again! pogue

EDIT: While searching for reCAPTCHA/AI solving stuff, I came across a service that uses a browser extension and AI to solve reCAPTCHA for you. They charge $1 per 1000 solves of reCAPTCHA v3 😮

Solving reCAPTCHA with AI Recognition in 2025

pogue avatar Jul 29 '25 20:07 pogue

But now, filejoker captchas are giving me an error every time it tries to download. Those are the ones where you have to match different shapes.

Oh, I was not aware of that. For me it was working fine the last few times I tried it, but that was a while ago. But note that this solver does not use a neural network, just image processing: https://github.com/cracker0dks/CaptchaSolver/blob/master/docs/howToSolveGeoCaptchasWalkthrough.md So updating the YOLO version will not solve this issue.

We use a NN only for this type of captcha: (image)

A newer YOLO version would only bump the success rate up from 90% to 95% or so.
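To put those numbers in perspective: if each attempt is assumed to succeed independently with the same probability (an assumption, not something measured in this thread), the chance of at least one correct solve within n attempts is 1 - (1 - p)^n. A minimal sketch of that arithmetic; the function name and the choice of three attempts are illustrative only:

```python
def success_within(p: float, attempts: int) -> float:
    """Probability of at least one correct solve in `attempts` independent tries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    return 1.0 - (1.0 - p) ** attempts


# Compare the per-attempt rates mentioned above over three attempts
# (three is an assumed number, chosen only for illustration).
for p in (0.90, 0.95):
    print(f"p={p:.2f}: {success_within(p, 3):.6f} within 3 attempts")
```

Under that independence assumption, both rates end up above 99.8% after three tries, so the practical gain from a newer model would mostly be fewer retries rather than fewer failed downloads.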

What happens when CaptchaSolver gets the captcha wrong? Does it send it to the user to fill out instead or is it not aware if it fails the response?

No, sites do not accept a second guess. So if we send a wrong answer, we get a new captcha to solve. The captcha solver does not get any feedback; it just tries to solve the new captcha. JD2 is programmed to switch to manual solving if the captcha solver does not solve it within a few tries (I don't know the exact number).
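The flow described above can be sketched roughly like this. It is not JD2's actual code: `solve_with_fallback`, the four callbacks, and the limit of three automatic attempts are all hypothetical stand-ins used only to show the shape of the retry-then-manual logic.

```python
from typing import Callable


def solve_with_fallback(
    fetch_captcha: Callable[[], bytes],
    auto_solve: Callable[[bytes], str],
    submit: Callable[[str], bool],
    manual_solve: Callable[[bytes], str],
    max_auto_attempts: int = 3,  # assumed value; the real number is not known
) -> str:
    """Return 'auto', 'manual' or 'failed' depending on how the captcha was handled."""
    for _ in range(max_auto_attempts):
        image = fetch_captcha()        # a wrong answer always means a fresh captcha
        answer = auto_solve(image)     # the solver itself never learns the outcome
        if submit(answer):             # only the caller sees accept/reject
            return "auto"
    image = fetch_captcha()
    return "manual" if submit(manual_solve(image)) else "failed"


# Demo with fakes: the automatic solver is always wrong, the human is right.
captchas = iter([b"a", b"b", b"c", b"d"])
result = solve_with_fallback(
    fetch_captcha=lambda: next(captchas),
    auto_solve=lambda image: "wrong",
    submit=lambda answer: answer == "right",
    manual_solve=lambda image: "right",
)
print(result)
```

Keeping the accept/reject check outside the solver mirrors the point made above: the solver only ever sees images, never whether its last guess was correct.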

I'm not familiar with coding, so I wouldn't know what I was doing, but would it even be possible to just drop in a different YOLO build into CaptchaSolver?

That should work, but you will only recognize the things the net has learned to recognize. So maybe animals or various other things :D so that would be useless. In short: you need to take a "blank" weights file, train it with the data you want to recognize, and then drop it in. A short write-up of how I did it is here: https://github.com/cracker0dks/CaptchaSolver/blob/master/docs/howToSolveNew6DigitCaptchasWalkthrough.md#next-level-and-my-solution-train-a-neuronal-network
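To illustrate why the training data matters: a detector only ever returns the class labels it was trained on, and the solver has to turn those labels into an answer. A minimal sketch, assuming detections arrive as (label, confidence, x centre) tuples and that the captcha is read left to right. The tuple layout, the 0.5 threshold, and the function name are illustrative assumptions, not CaptchaSolver's actual code.

```python
from typing import List, Tuple

Detection = Tuple[str, float, float]  # (class label, confidence, x centre in pixels)


def detections_to_text(detections: List[Detection], min_conf: float = 0.5) -> str:
    """Drop weak detections, order the rest left to right, and join their labels."""
    kept = [d for d in detections if d[1] >= min_conf]
    kept.sort(key=lambda d: d[2])
    return "".join(label for label, _, _ in kept)


# Made-up detections from a net trained on digits...
digit_model = [("7", 0.98, 120.0), ("3", 0.95, 20.0), ("1", 0.40, 60.0), ("9", 0.91, 70.0)]
# ...and from a net trained on something unrelated, looking at the same image.
animal_model = [("dog", 0.62, 20.0), ("cat", 0.55, 70.0)]

print(detections_to_text(digit_model))
print(detections_to_text(animal_model))
```

The second call still produces a string, but it is built from labels that have nothing to do with the captcha, which is the "useless" case described above.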

I don't know if you could randomize mouse movement or something. hcaptcha is kind of a similar thing where you have to recognize images. I'm sure that could be bypassed with time and effort as well, but those are new discussions entirely.

Exactly, that's nothing I can implement... sadly

EDIT: While searching for reCAPTCHA/AI solving stuff, I came across a service that uses a browser extension and AI to solve reCAPTCHA for you. They charge $1 per 1000 solves of reCAPTCHA v3

I would not pay for that, but they often use humans for that, so it's cheap as far as I can tell^^

cracker0dks avatar Jul 30 '25 13:07 cracker0dks

@pogue Check out my silly GitHub repo. I built a solver around a 16 GB chat model with a very good image "viewer" that handles solve requests via prompts, so it can solve any captcha, and it works better than this YOLOv4 without any training. I'm not a coder, so I used AI to create it, but it still works very well. If you need it, I also have much the same thing using OpenRouter with the same model. (OpenRouter has very strict limits on free models, so you'd need to pay for it.)

https://github.com/GaMiR9195/AI-based-local-captcha-solver

And dm me in discord -> gam1r

GaMiR9195 avatar Jul 30 '25 15:07 GaMiR9195

But now, filejoker captchas are giving me an error every time it tries to download. Those are the ones where you have to match different shapes.

Oh, I was not aware of that. For me it was working fine the last few times I tried it, but that was a while ago. But note that this solver does not use a neural network, just image processing: https://github.com/cracker0dks/CaptchaSolver/blob/master/docs/howToSolveGeoCaptchasWalkthrough.md So updating the YOLO version will not solve this issue.

It looks like filejoker updated their layout and after a Jdownloader update, the CaptchaSolver is working again!

That should work, but you will only recognize the things the net has learned to recognize. So maybe animals or various other things :D so that would be useless. In short: you need to take a "blank" weights file, train it with the data you want to recognize, and then drop it in. A short write-up of how I did it is here: https://github.com/cracker0dks/CaptchaSolver/blob/master/docs/howToSolveNew6DigitCaptchasWalkthrough.md#next-level-and-my-solution-train-a-neuronal-network

I don't know if you could randomize mouse movement or something. hcaptcha is kind of a similar thing where you have to recognize images. I'm sure that could be bypassed with time and effort as well, but those are new discussions entirely.

Why would I need to train an entirely new model as opposed to trying existing ones that have already been trained on letter/number combination CAPTCHAs?

Tbh, CaptchaSolver's YOLOv2 seems to work just fine for those types of CAPTCHAs, but I was just curious what would happen if someone took one of the ones I listed above, put it in that folder, renamed it yolov4-tiny-custom_last.weights so you wouldn't have to change any of the programming, and just sat back to see what happens. Neither CaptchaSolver nor Jdownloader gives you much visual feedback about what's going on in the background, so you wouldn't get much info about whether it's working any better or worse than what it's already running on.

EDIT: While searching for reCAPTCHA/AI solving stuff, I came across a service that uses a browser extension and AI to solve reCAPTCHA for you. They charge $1 per 1000 solves of reCAPTCHA v3

I would not pay for that, but they often use humans for that, so it's cheap as far as I can tell^^

It's probably for spammers or people running bot farms to sit back and bypass captchas all day. But, interesting if people are actually making money off it. I wonder if it's actually profitable or costs more to run the AI model they're using on the cloud or locally.

pogue avatar Jul 31 '25 16:07 pogue

@pogue Check out my silly GitHub repo. I built a solver around a 16 GB chat model with a very good image "viewer" that handles solve requests via prompts, so it can solve any captcha, and it works better than this YOLOv4 without any training. I'm not a coder, so I used AI to create it, but it still works very well. If you need it, I also have much the same thing using OpenRouter with the same model. (OpenRouter has very strict limits on free models, so you'd need to pay for it.)

https://github.com/GaMiR9195/AI-based-local-captcha-solver

And dm me in discord -> gam1r

That's quite interesting. How does it work though? Does it recognize when a message pops up in the browser window and uses Qwen to try to fill out a response?

I see it needs Nvidia CUDA so I assume it's meant to run locally on your machine? I have a crappy old laptop that can barely run a web browser, so it's nothing I could do. I saw an article the other day about a Russian infostealer bot that used Qwen 2.5-Coder-32B-Instruct through Hugging Face's public API to try and bypass antiviruses. So, if it's not hitting the API limit from Hugging Face, I guess you could just use theirs instead? 😉

LameHug malware uses AI LLM to craft Windows data-theft commands in real-time - BleepingComputer

pogue avatar Jul 31 '25 16:07 pogue

Why would I need to train an entirely new model as opposed to trying existing ones that have already been trained on letter/number combination CAPTCHAs?

The more captcha types your model supports, the larger and more inefficient it gets. So the best way is to train a small net only on data it will actually see.
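One concrete place where the class count shows up is the detection head. In Darknet's YOLOv3/v4 config files, the convolutional layer in front of each [yolo] layer is conventionally set to filters = (classes + 5) * 3, so every extra class widens that layer. The sketch below only computes that number; the helper name and the example class counts are illustrative, and how much larger the rest of the net would need to be to keep its accuracy is not captured by this arithmetic.

```python
def yolo_head_filters(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Filters in the conv layer before a [yolo] layer: (classes + 5) * anchors."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # 5 = four box coordinates plus one objectness score per anchor
    return (num_classes + 5) * anchors_per_scale


# Example class counts, chosen only for illustration.
for name, classes in (("digits only", 10), ("digits + letters", 36), ("80 classes", 80)):
    print(f"{name}: {yolo_head_filters(classes)} filters")
```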

I was just curious if someone just took one of the ones I listed above and put it in that folder, renamed it yolov4-tiny-custom_last.weights

you could just do that and run the "keep2share.cc.bat" and take a look at the log.txt :)
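If you want to try that without losing the original file, a small script can back up the current weights before copying the candidate over them. A minimal sketch: the function name and the `.bak` suffix are made up for illustration, and the target path in the comment is taken from the folder and file name mentioned earlier in this thread. Keep in mind that Darknet reads a weights file according to the layer layout in the matching .cfg, so a file trained for a different architecture (or in a different format, such as a YOLOv8 model) is not expected to load correctly just because it has been renamed.

```python
import shutil
from pathlib import Path


def swap_weights(candidate: Path, target: Path) -> Path:
    """Back up `target` once, then overwrite it with `candidate`; return the backup path."""
    if not candidate.is_file():
        raise FileNotFoundError(candidate)
    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # keep the original so the swap can be undone
    shutil.copy2(candidate, target)
    return backup


# Example call (paths are assumptions about your local layout):
# swap_weights(Path("downloaded.weights"),
#              Path("darknet64/yolov4-tiny-custom_last.weights"))
```

To undo the experiment, copy the `.bak` file back over the target.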

I tested against Qwen3-Coder (even though it's not practical to use a large LLM locally), but it was not correct even once. For both captcha types it was close at best.

cracker0dks avatar Aug 01 '25 10:08 cracker0dks