google-maps-scraper
High memory usage when scraping a lot of queries
Hi, big thanks for the tool. It is wonderful and does the job well, but I have a problem with memory usage.
I am running the scraper with ./google-maps-scraper -input keyword.txt -results results.csv -exit-on-inactivity 3m -c 1 -depth 14
keyword.txt has roughly 400-500 queries. After running for 4+ hours the program consumes all of the available RAM (16 GB) and then crashes.
I tested on both Windows 10 and Xubuntu 22.04 with the same result, except that on Linux it runs a bit longer, it being the more lightweight OS.
As a workaround I made a simple script that divides the keywords into multiple files and then runs them synchronously, one at a time, through bash, sleeping about 30 seconds between commands.
So the question is: is this memory consumption caused by a memory leak, or is it totally normal for that query size?
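As a side note, since the scraper is a Go binary, the runtime's soft memory limit can be capped via the GOMEMLIMIT environment variable (Go 1.19+). It will not cure a genuine leak, but it makes the garbage collector push back before the system OOM killer does. A minimal sketch, assuming the same flags as above:
GOMEMLIMIT=4GiB ./google-maps-scraper -input keyword.txt -results results.csv -exit-on-inactivity 3m -c 1 -depth 14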
Hi @admbyz, I have the same problem with 10 cores and 32 GB RAM. I was doing 10 queries at a time and running it manually again and again with new queries. Would you mind sharing the bash file? Thanks.
@admbyz which Go version do you use?
Do you have the same issue when you run using a docker container?
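For reference, a containerized run might look like the sketch below; the image name gosom/google-maps-scraper and the mount paths are assumptions, not taken from this thread:
mkdir -p results
docker run -v "$PWD/keyword.txt:/keyword.txt" -v "$PWD/results:/results" gosom/google-maps-scraper -input /keyword.txt -results /results/results.csv -exit-on-inactivity 3m -c 1 -depth 14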
@lexciobotariu I am running 30 queries per instance with 1 core (-c 1); this should solve your problem if speed is not important. This is the script I'm using: it reads keywords.txt, generates keywords01.txt through keywordsNN.txt with 30 queries each, and finally creates run.sh, which runs the scraper instances synchronously.
Create a build.sh with the code below, make it executable (chmod +x build.sh), run it (./build.sh or bash build.sh), then run the generated run.sh.
#!/bin/bash

# Write one chunk of keywords to its own file and append the matching
# scraper invocation to run.sh: start it in the background, capture its
# PID, and wait for it to finish before moving on.
create_keywords_file() {
    local file_prefix="$1"
    local start_line=$2
    local end_line=$3
    # Copy the given line range of keywords.txt into its own chunk file.
    sed -n "${start_line},${end_line}p" keywords.txt > "${file_prefix}.txt"
    echo "./google-maps-scraper -input \"${file_prefix}.txt\" -results \"${file_prefix}.csv\" -exit-on-inactivity 3m -c 1 -depth 14 &" >> run.sh
    echo "${file_prefix}=\$!" >> run.sh
    echo "wait \$${file_prefix}" >> run.sh
}

main() {
    if [ ! -f keywords.txt ]; then
        echo "Error: keywords.txt not found!"
        exit 1
    fi

    # Start run.sh fresh so re-running build.sh does not append duplicates.
    echo "#!/bin/bash" > run.sh

    total_lines=$(wc -l < keywords.txt)
    lines_per_file=30
    num_files=$((total_lines / lines_per_file))
    remainder=$((total_lines % lines_per_file))

    # One chunk file per full batch of 30 queries, with a 30s pause
    # between scraper runs.
    for ((i=1; i<=num_files; i++)); do
        start_line=$(( (i - 1) * lines_per_file + 1 ))
        end_line=$(( i * lines_per_file ))
        file_prefix="keywords$(printf "%02d" $i)"
        create_keywords_file "$file_prefix" "$start_line" "$end_line"
        echo "sleep 30" >> run.sh
    done

    # Handle the leftover queries that do not fill a whole batch.
    if [ "$remainder" -gt 0 ]; then
        start_line=$((num_files * lines_per_file + 1))
        end_line=$total_lines
        file_prefix="keywords$(printf "%02d" $((num_files + 1)))"
        create_keywords_file "$file_prefix" "$start_line" "$end_line"
        echo "sleep 30" >> run.sh
    fi

    chmod +x run.sh
    echo "run.sh file created successfully!"
}

main
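For reference, each batch the script appends to run.sh looks like this (first chunk shown). The PID capture and wait make each instance finish, and release its memory, before the next one starts:
./google-maps-scraper -input "keywords01.txt" -results "keywords01.csv" -exit-on-inactivity 3m -c 1 -depth 14 &
keywords01=$!
wait $keywords01
sleep 30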
@gosom I am using the latest Go, 1.22.1, on both platforms.
I haven't used Docker, and I am away for 3 days; after that I will test and comment here again.
@gosom any updates? My 32 GB of RAM isn't enough?
I'm using Go 1.22.1 on Ubuntu 22.
This keyword list should be enough to push memory way beyond 32 GB:
Friseur München, deutschland
Restaurant München, deutschland
Dolmetscher München, deutschland
Tischler München, deutschland
Maler München, deutschland
Sanitär Installateur München, deutschland
Heizungsbauer München, deutschland
Schlosser München, deutschland
Elektriker München, deutschland
Fliesenleger München, deutschland
Zimmermann München, deutschland
Glaser München, deutschland
Dachdecker München, deutschland
Maurer München, deutschland
Metallbauer München, deutschland
Steinmetz München, deutschland
Schreiner München, deutschland
Installateur für Heizung, Lüftung und Sanitär München, deutschland
Bodenleger München, deutschland
Stuckateur München, deutschland
Kaminbauer München, deutschland
Ofenbauer München, deutschland
Parkettleger München, deutschland
Raumausstatter München, deutschland
Bautischler München, deutschland
Restaurator München, deutschland
Bootsbauer München, deutschland
Uhrmacher München, deutschland
Goldschmied München, deutschland
Silberschmied München, deutschland
Graveur München, deutschland
Uhrmacher München, deutschland
Modellbauer München, deutschland
Drechsler München, deutschland
Holzbildhauer München, deutschland
Kunstschmied München, deutschland
Sattler München, deutschland
Tapezierer München, deutschland
Polsterer München, deutschland
Schuhmacher München, deutschland
Immobilienmakler München, deutschland
Reisebüro München, deutschland
Blumenladen München, deutschland
Buchhandlung München, deutschland
Autowerkstatt München, deutschland
Elektronikgeschäft München, deutschland
Schuhgeschäft München, deutschland
Optiker München, deutschland
Fahrradladen München, deutschland
Goldschmied München, deutschland
Juwelier München, deutschland
Tattoostudio München, deutschland
Fotostudio München, deutschland
Hochzeitsphotograph München, deutschland
Rechtsanwaltskanzlei München, deutschland
Steuerberater München, deutschland
Architekturbüro München, deutschland
Innenarchitekt München, deutschland
Restaurant München, deutschland
Café München, deutschland
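For what it's worth, a repro list like this can be scaled up from a plain file of trade names (trades.txt is hypothetical here, one trade per line):
while read -r trade; do
    printf '%s München, deutschland\n' "$trade"
done < trades.txt > keywords.txt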
#7 I have attached a memory graph in this issue; can you help check it? It seems like Playwright didn't close correctly.
@gosom I tried Docker on a Windows 10 PC with -c 1 -depth 14, but memory usage was much higher because of WSL, and I hit the error much faster since Docker + WSL already started at around 8 GB of RAM usage.
The latest release (v1.2.1) has performance enhancements, and memory usage looks stable.
@admbyz can you try this one?
PS: I have only tested on Fedora Linux, but it looks good.
Sure, I will try it without splitting my keywords and let you know. Thanks for the bump.
PS: @gosom I tried with -c 1 for a while and it seems the problem is gone: instances now finish correctly, and I didn't see any instance use more than 250 MB. So I ended that session and am now running with -c 8 on 973 keywords. I'll update this post when it's done.
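In case anyone wants to repeat that check, one way to watch per-instance memory (assuming the binary name used throughout this thread; RSS is reported in KiB):
watch -n 5 'ps -o pid,rss,etime,cmd -C google-maps-scraper'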
The problem is gone. Tested on Xubuntu 22.04 with the latest updates, with the scraper compiled with Go 1.22.2: 973 keywords with -c 8 -depth 14. The scraper finished its job successfully, with no memory leaks, and collected 38 MB worth of data.
Good job @gosom, thank you for the effort!
@gosom Works fine for me too! Thanks, mate!