
High memory usage when scraping a lot of queries

Open admbyz opened this issue 1 year ago • 7 comments

Hi, big thanks for the tool. It is wonderful and does the job well, but I have a problem with memory usage.

I am using scraper with ./google-maps-scraper -input keyword.txt -results results.csv -exit-on-inactivity 3m -c 1 -depth 14

keyword.txt has roughly 400-500 queries. After running for 4+ hours the program consumes all of the available RAM (16 GB) and then crashes.
I tested both Windows 10 and Xubuntu 22.04 with the same result, except on Linux it runs a bit longer because the OS is more lightweight.

As a workaround I made a simple script that divides the keywords into multiple files and then runs them synchronously, one at a time, through bash, sleeping about 30 seconds between commands.

So the question is: is this memory consumption a memory leak, or is it totally normal for that query size?
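One way to tell a leak from a normal working set is to sample the scraper's resident set size over time and see whether it grows without bound across queries. A minimal sketch (the `log_rss` helper and the `mem.log` file name are my own, not part of the tool):

```shell
#!/bin/bash
# Append the resident set size (in MB) of a process to mem.log.
# ps reports RSS in kilobytes, so divide by 1024.
log_rss() {
    local pid=$1
    ps -o rss= -p "$pid" | awk '{printf "%.0f MB\n", $1 / 1024}' >> mem.log
}

# Usage sketch: start the scraper in the background and sample it
# every 60 seconds until it exits.
#   ./google-maps-scraper -input keyword.txt -results results.csv ... &
#   while kill -0 $! 2>/dev/null; do log_rss $!; sleep 60; done
```

A roughly flat curve would suggest normal per-query memory use; steady growth from one query to the next would point to a leak.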

admbyz avatar Mar 07 '24 14:03 admbyz

Hi @admbyz, I have the same problem with 10 cores and 32 GB RAM. I was doing 10 queries at a time and running it manually again and again with new queries. Would you mind sharing the bash file? Thanks.

lexciobotariu avatar Mar 09 '24 11:03 lexciobotariu

@admbyz which Go version are you using?

gosom avatar Mar 09 '24 11:03 gosom

Do you have the same issue when you run it in a Docker container?

gosom avatar Mar 09 '24 11:03 gosom
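If the issue does reproduce under Docker, a hard memory cap at least keeps the host responsive when the container hits its limit. A hedged sketch, assuming the image is built locally from the repo's Dockerfile and reusing the flags from this thread (the image tag and `/data` mount path are placeholders, not the project's documented setup):

```shell
docker build -t google-maps-scraper .
# Cap the container at 4 GB; if the process exceeds that, the kernel
# OOM-kills the container instead of exhausting the whole machine.
docker run --rm -m 4g -v "$PWD:/data" google-maps-scraper \
    -input /data/keyword.txt -results /data/results.csv \
    -exit-on-inactivity 3m -c 1 -depth 14
```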

@lexciobotariu I am running 30 queries per instance with 1 core (-c 1); this should solve your problem if speed is not important. This is the script I'm using: it reads keywords.txt, generates keywords01.txt through keywordsNN.txt with 30 queries each, and finally creates run.sh, which runs the scraper on each file synchronously.

Create a build.sh with the code below, make it executable (chmod +x build.sh), run it (./build.sh or bash build.sh), then run the generated run.sh.

#!/bin/bash
# Split keywords.txt into chunks of 30 queries and generate run.sh,
# which runs the scraper on each chunk sequentially.

create_keywords_file() {
    local file_prefix="$1"
    local start_line=$2
    local end_line=$3

    # Extract the requested line range into its own keyword file.
    sed -n "${start_line},${end_line}p" keywords.txt > "${file_prefix}.txt"
    # Append the scraper invocation for this chunk. Running it in the
    # foreground makes run.sh wait for each instance before starting
    # the next one; the sleep gives the OS time to reclaim memory.
    echo "./google-maps-scraper -input \"${file_prefix}.txt\" -results \"${file_prefix}.csv\" -exit-on-inactivity 3m -c 1 -depth 14" >> run.sh
    echo "sleep 30" >> run.sh
}

main() {
    if [ ! -f keywords.txt ]; then
        echo "Error: keywords.txt not found!"
        exit 1
    fi

    # Start run.sh fresh each time so reruns don't append duplicate commands.
    echo "#!/bin/bash" > run.sh

    total_lines=$(wc -l < keywords.txt)
    lines_per_file=30
    num_files=$((total_lines / lines_per_file))
    remainder=$((total_lines % lines_per_file))

    for ((i = 1; i <= num_files; i++)); do
        start_line=$(( (i - 1) * lines_per_file + 1 ))
        end_line=$(( i * lines_per_file ))
        file_prefix="keywords$(printf "%02d" "$i")"
        create_keywords_file "$file_prefix" "$start_line" "$end_line"
    done

    # Handle the leftover queries that don't fill a whole chunk.
    if [ "$remainder" -gt 0 ]; then
        start_line=$((num_files * lines_per_file + 1))
        end_line=$total_lines
        file_prefix="keywords$(printf "%02d" $((num_files + 1)))"
        create_keywords_file "$file_prefix" "$start_line" "$end_line"
    fi

    chmod +x run.sh
    echo "run.sh file created successfully!"
}

main
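For what it's worth, the chunking part of the script above can also be done in one line with coreutils `split`; a sketch (the `-d` and `--additional-suffix` options are GNU-specific, so this assumes GNU coreutils rather than the BSD/macOS version):

```shell
# Split keywords.txt into numbered 30-line files:
# keywords00.txt, keywords01.txt, keywords02.txt, ...
split -l 30 -d --additional-suffix=.txt keywords.txt keywords
```

You would still need a loop to emit the scraper command for each generated file, but this avoids the manual line arithmetic.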

@gosom I am using the latest Go (1.22.1) on both platforms.

I haven't used Docker, and I am away for 3 days; after that I will test and comment here again.

admbyz avatar Mar 10 '24 23:03 admbyz

@gosom any updates? My 32 GB of RAM isn't enough?

I'm using go 1.22.1 on Ubuntu 22

this query list should be enough to go way beyond 32 GB:

Friseur München, deutschland
Restaurant München, deutschland
Dolmetscher München, deutschland
Tischler München, deutschland
Maler München, deutschland
Sanitär Installateur München, deutschland
Heizungsbauer München, deutschland
Schlosser München, deutschland
Elektriker München, deutschland
Fliesenleger München, deutschland
Zimmermann München, deutschland
Glaser München, deutschland
Dachdecker München, deutschland
Maurer München, deutschland
Metallbauer München, deutschland
Steinmetz München, deutschland
Schreiner München, deutschland
Installateur für Heizung, Lüftung und Sanitär München, deutschland
Bodenleger München, deutschland
Stuckateur München, deutschland
Kaminbauer München, deutschland
Ofenbauer München, deutschland
Parkettleger München, deutschland
Raumausstatter München, deutschland
Bautischler München, deutschland
Restaurator München, deutschland
Bootsbauer München, deutschland
Uhrmacher München, deutschland
Goldschmied München, deutschland
Silberschmied München, deutschland
Graveur München, deutschland
Uhrmacher München, deutschland
Modellbauer München, deutschland
Drechsler München, deutschland
Holzbildhauer München, deutschland
Kunstschmied München, deutschland
Sattler München, deutschland
Tapezierer München, deutschland
Polsterer München, deutschland
Schuhmacher München, deutschland
Immobilienmakler München, deutschland
Reisebüro München, deutschland
Blumenladen München, deutschland
Buchhandlung München, deutschland
Autowerkstatt München, deutschland
Elektronikgeschäft München, deutschland
Schuhgeschäft München, deutschland
Optiker München, deutschland
Fahrradladen München, deutschland
Goldschmied München, deutschland
Juwelier München, deutschland
Tattoostudio München, deutschland
Fotostudio München, deutschland
Hochzeitsphotograph München, deutschland
Rechtsanwaltskanzlei München, deutschland
Steuerberater München, deutschland
Architekturbüro München, deutschland
Innenarchitekt München, deutschland
Restaurant München, deutschland
Café München, deutschland

OSZII avatar Mar 14 '24 19:03 OSZII

#7 I attached a memory graph in that issue; can someone help check it? It looks like Playwright didn't close correctly.

arceushui avatar Mar 15 '24 08:03 arceushui

@gosom I tried Docker on my Windows 10 PC with -c 1 -depth 14, but memory usage was much higher because of WSL, and I got the error much faster since Docker + WSL already started with around 8 GB of RAM in use.

admbyz avatar Mar 18 '24 07:03 admbyz

The latest release (v1.2.1) has performance enhancements, and memory usage looks stable.

@admbyz can you try this one?

PS: I have only tested on Fedora Linux, but it looks good.

gosom avatar May 01 '24 08:05 gosom

Sure, I will try it without splitting my keywords and let you know. Thanks for the bump.

PS: @gosom I tried with -c 1 for a while and it seems the problem is gone: instances now terminate correctly, and I didn't see any instance use more than 250 MB. So I ended that session and am now running with -c 8 and 973 keywords. I'll update this post when it's done.

The problem is gone. Tested on Xubuntu 22.04 with the latest updates and the scraper compiled with Go 1.22.2: 973 keywords with -c 8 -depth 14 finished successfully. No memory leaks, and it collected 38 MB worth of data.

Good job @gosom, thank you for the effort!

admbyz avatar May 01 '24 10:05 admbyz

@gosom Works fine for me too! Thanks Mate!

OSZII avatar May 02 '24 07:05 OSZII