metagoofil can't find any files
Hi all
I built the container and ran it using:
sudo docker run -v $PWD/data:/data metagoofil -d github.com -f -n 10 -r 4 -t pdf -w
[*] Downloaded files will be saved here: /data
[*] Searching for 100 .pdf files and waiting 30.0 seconds between searches
[+] Total download: 0 bytes / 0.00 KB / 0.00 MB
[+] Done!
No files are found, which looks unrealistic.
I tried multiple domains. Is it possible to increase logging?
Thanks, Jaoh
Hi @jaoh - Thanks for bringing this to my attention. I got 0 results as well, both with the container and with the Python virtual environment. I also tried a couple of domains. This code is showing its age and only has print statements instead of proper logging levels...so what you see is what you get.
I suspect the underlying google library may be the issue, or Google is changing how results are returned (which has happened in the past). When I set a breakpoint here, there are no results. I also wrote a Google search library called yagooglesearch that I've wanted to incorporate, but have not had the time.
Unfortunately, making that migration and maintaining this repo are not at the top of my TODO list. I'd be more than happy to review a PR using the yagooglesearch library if you have the skillset.
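On the logging question, here is a minimal sketch of how print statements could be swapped for leveled logging. It assumes a hypothetical -v/--verbose counting flag, which metagoofil does not necessarily expose; the logger name "metagoofil_sketch" and the helper names are placeholders for illustration, not code from this repo.

```python
import argparse
import logging


def level_from_verbosity(count: int) -> int:
    """Map a repeated -v flag to a logging level (hypothetical mapping)."""
    if count >= 2:
        return logging.DEBUG
    if count == 1:
        return logging.INFO
    return logging.WARNING


def parse_verbosity(argv: list) -> int:
    """Count how many times -v/--verbose was passed (hypothetical flag)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser.parse_args(argv).verbose


def build_logger(verbosity: int) -> logging.Logger:
    """Create a logger whose level follows the verbosity count."""
    logger = logging.getLogger("metagoofil_sketch")  # placeholder name
    logger.setLevel(level_from_verbosity(verbosity))
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = build_logger(parse_verbosity(["-vv"]))
log.debug("shown only when -v is passed at least twice")
log.warning("shown at every verbosity level")
```

With a mapping like this, a run without -v would stay as quiet as the current output, while -vv would surface per-request details.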
I was able to narrow it down to the user agent and created a PR.
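For readers unfamiliar with what changing the user agent involves, here is a minimal standard-library sketch of attaching an explicit User-Agent header to a search request. The user agent string and the build_search_request helper are placeholders made up for illustration; they are not taken from the PR or from metagoofil. The request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder value for illustration only; not the string used by the PR.
PLACEHOLDER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def build_search_request(domain, file_type, user_agent=PLACEHOLDER_USER_AGENT):
    """Build (but do not send) a search request carrying an explicit User-Agent."""
    query = urlencode({"q": f"filetype:{file_type} site:{domain}"})
    return Request(
        f"https://www.google.com/search?{query}",
        headers={"User-Agent": user_agent},
    )


req = build_search_request("github.com", "pdf")
print(req.full_url)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing a different user_agent argument is enough to compare how the server responds to different clients.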
EDITED - Oops, didn't see https://github.com/opsdisk/metagoofil/pull/40 before posting this.
Hi @jaoh
- Try with the -u switch:
python metagoofil.py -d github.com -f -n 10 -r 4 -t pdf -w -u
- I created this PR to swap out the Google libraries being used.
You should be able to test it out using these commands. Mind taking it for a spin, but using the Python virtual environment instead? The Docker container is giving me some issues.
# Fetch the latest branch
git checkout master
git pull origin master
git fetch
git checkout v2-using-yagooglesearch
# Delete the old Python virtual environment and create a new one
rm -rf .venv
virtualenv -p python3 .venv # If using a virtual environment.
source .venv/bin/activate # If using a virtual environment.
pip install -r requirements.txt
# Run metagoofil
python metagoofil.py -d github.com -f -n 10 -r 4 -t pdf -w
@jaoh Feel free to disregard the above. I made a smaller fix and merged it into v.1.3.0.
# Fetch the latest branch
git checkout master
git pull origin master
Hi! I tried metagoofil with the standard googlesearch library as well as with yagooglesearch, but nothing seems to work right now. Maybe Google changed things again?
I checked the request URL manually, and there are PDFs for that query:
$ metagoofil -d bvg.de -t pdf
[*] Searching for 100 .pdf files and waiting 30.0 seconds between searches
2025-10-07 15:47:27,784 [MainThread ] [INFO] Requesting URL: https://www.google.com/
2025-10-07 15:47:27,969 [MainThread ] [DEBUG] status_code: 200
2025-10-07 15:47:27,969 [MainThread ] [DEBUG] headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-gb) AppleWebKit/528.10+ (KHTML, like Gecko) Version/4.0dp1 Safari/526.11.2'}
2025-10-07 15:47:27,970 [MainThread ] [DEBUG] cookies: <RequestsCookieJar[<Cookie SOCS=CAAaBgiAsJHHBg for .google.com/>, <Cookie AEC=AaJma5vZgl9SiUCjzSHhEaWByA2b9KgGcoWWlNxAXPFcqCI4B_hDD2xSnoc for .google.com/>, <Cookie __Secure-ENID=28.SE=JxgoU0lPaNRP2EI2q-cJzcXhoXeCnQF7-GjN6QYZSFo6ixD8AgTLeGupsSHFyXG71r8ckbE-bRM9m_DSnYQQwyBNoVQHceTy4vG0x4RobwYsWckaNrIMikyKIBOk94SBFXYT1jHFEF6RJwAkTrECd3CzxQXzD1SqwxKxixtaUz1uzmiykqo8SQuPm6CzlM33bXsW5TNYw9bVh5vtpjMVNo0W8g for .google.com/>]>
2025-10-07 15:47:27,970 [MainThread ] [DEBUG] proxy:
2025-10-07 15:47:27,970 [MainThread ] [DEBUG] verify_ssl: True
2025-10-07 15:47:27,975 [MainThread ] [INFO] Stats: start=0, num=100, total_valid_links_found=0 / max_search_result_urls_to_return=100
2025-10-07 15:47:27,977 [MainThread ] [INFO] Requesting URL: https://www.google.com/search?hl=en&lr=lang_en&q=filetype%3Apdf+site%3Abvg.de&num=100&btnG=Google+Search&tbs=li:1&safe=off&cr=&filter=0
2025-10-07 15:47:28,184 [MainThread ] [DEBUG] status_code: 200
2025-10-07 15:47:28,184 [MainThread ] [DEBUG] headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-gb) AppleWebKit/528.10+ (KHTML, like Gecko) Version/4.0dp1 Safari/526.11.2'}
2025-10-07 15:47:28,185 [MainThread ] [DEBUG] cookies: <RequestsCookieJar[]>
2025-10-07 15:47:28,185 [MainThread ] [DEBUG] proxy:
2025-10-07 15:47:28,185 [MainThread ] [DEBUG] verify_ssl: True
2025-10-07 15:47:28,215 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2025-10-07 15:47:28,215 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gae=cb-none
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/technologies/cookies?hl=en&utm_source=ucb
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/technologies/cookies?hl=en&utm_source=ucb
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://consent.google.com/dl?continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gl=DE&hl=en&cm=2&pc=srp&uxe=none&src=1&escs=AZj2P8gh1KuyUugL8jFrkHz3OhPa0EasJc-tr_JgHqjE-4ZZKtBZq0TyOFJZ8oc_ciD5RGdcuTpT5CiWICYrqAduBG48ZSS3eyYl
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://consent.google.com/dl?continue=https://www.google.com/search?hl%3Den%26lr%3Dlang_en%26q%3Dfiletype%253Apdf%2Bsite%253Abvg.de%26num%3D100%26btnG%3DGoogle%2BSearch%26tbs%3Dli:1%26safe%3Doff%26cr%3D%26filter%3D0&gl=DE&hl=en&cm=2&pc=srp&uxe=none&src=1&escs=AZj2P8gh1KuyUugL8jFrkHz3OhPa0EasJc-tr_JgHqjE-4ZZKtBZq0TyOFJZ8oc_ciD5RGdcuTpT5CiWICYrqAduBG48ZSS3eyYl
2025-10-07 15:47:28,216 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/privacy?hl=en&utm_source=ucb
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/privacy?hl=en&utm_source=ucb
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] pre filter_search_result_urls() link: https://policies.google.com/terms?hl=en&utm_source=ucb
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] Excluding URL because it contains "google": https://policies.google.com/terms?hl=en&utm_source=ucb
2025-10-07 15:47:28,217 [MainThread ] [DEBUG] post filter_search_result_urls() link: None
2025-10-07 15:47:28,217 [MainThread ] [INFO] No valid search results found on this page. Moving on...
[*] Results: 0 .pdf files found
[+] Done!
With the legacy google library:
$ metagoofil -d bvg.de -t pdf
[*] Searching for 100 .pdf files and waiting 30.0 seconds between searches
[*] Results: 0 .pdf files found
[+] Done!
Am I missing something?
@makefu Appreciate the additional data point. Yeah, they likely changed something in the HTML returned. I was not able to get any results either. Not sure when I'll be able to dig into it more.
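One hint in the debug log above: every link extracted from the response points at accounts.google.com, policies.google.com, or consent.google.com, which may mean a cookie consent page was served instead of search results. A minimal sketch for flagging that case follows. The looks_like_consent_page helper and the CONSENT_HOSTS set are hypothetical names for illustration, not part of metagoofil or yagooglesearch, and the heuristic is an assumption based only on this one log.

```python
from urllib.parse import urlparse

# Assumption: a link to this host signals a consent interstitial.
CONSENT_HOSTS = {"consent.google.com"}


def looks_like_consent_page(links):
    """Return True if every link is on a google.com host and one is a consent host."""
    hosts = [urlparse(link).hostname or "" for link in links]
    if not hosts:
        return False
    all_google = all(h == "google.com" or h.endswith(".google.com") for h in hosts)
    has_consent = any(h in CONSENT_HOSTS for h in hosts)
    return all_google and has_consent


# Links from the log above; the long query string on the consent URL is omitted.
links_from_log = [
    "https://policies.google.com/technologies/cookies?hl=en&utm_source=ucb",
    "https://consent.google.com/dl",
    "https://policies.google.com/privacy?hl=en&utm_source=ucb",
    "https://policies.google.com/terms?hl=en&utm_source=ucb",
]
print(looks_like_consent_page(links_from_log))
```

A check like this would at least let the tool report "got a consent page" instead of silently printing 0 results.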
Hi @opsdisk, thanks for your response, I really appreciate it! I was trying to package metagoofil for NixOS as part of the initiative to have all the hacker tools that Kali Linux has, and I was just trying out how it works :)
Google being Google, it may simply be out of scope now for us mere humans to have tools that interact with the website automatically.