google-search-results-python

[Google Jobs API] Support for Pagination

Open aliayar opened this issue 3 years ago • 1 comments

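The Google Jobs API does not return a serpapi_pagination key; it expects the start parameter for pagination instead. Because the library's pagination stops as soon as serpapi_pagination is missing (see the checks below), the current version of the library cannot paginate Google Jobs results. Pagination support should be added for Google Jobs.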

# stop if the backend did not return serpapi_pagination
if 'serpapi_pagination' not in result:
    raise StopIteration

# stop if there is no next page
if 'next' not in result['serpapi_pagination']:
    raise StopIteration
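
To illustrate why the iteration ends immediately for Google Jobs, the two checks above can be condensed into a single predicate and run against sample responses. This is only a minimal sketch: `has_next_page` and both sample dictionaries are hypothetical and not part of the library.

```python
def has_next_page(result: dict) -> bool:
    """Mirror the two stop conditions quoted above (hypothetical helper)."""
    if "serpapi_pagination" not in result:
        return False
    return "next" in result["serpapi_pagination"]


# Made-up response shapes, for illustration only.
google_like = {"serpapi_pagination": {"next": "https://example.com/page-2"}}
jobs_like = {"jobs_results": [{"title": "Barista"}]}

print(has_next_page(google_like))  # True
print(has_next_page(jobs_like))    # False
```

A response shaped like `jobs_like` carries results but no serpapi_pagination key, so the first check already ends the iteration.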


aliayar avatar May 27 '22 12:05 aliayar

@aliayar A workaround is to use the "start" parameter together with the "error" key of the response.

The "start" parameter is passed to GoogleSearch() with the other parameters and should initially be set to 0 (an int):

params = {
    "api_key": "...",
    "engine": "google_jobs",
    "q": "Barista",
    "start": 0  # pagination offset, starts at 0
}

From there, use a while loop to iterate over all pages, and use the "error" key to exit the loop:

while True:
    results = search.get_dict()

    if "error" in results:
        break


To paginate, add 10 to "start" at the end of each loop iteration:

while True:
    results = search.get_dict()

    if "error" in results:
        print(results["error"])
        break

    params["start"] += 10
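
The same loop can also be wrapped in a generator, which keeps the paging logic separate from the data extraction. The sketch below is one possible shape, not the library's API: `iterate_pages`, its `fetch_page` callable, the `max_pages` safety cap, and the `fake_fetch` stand-in are all assumptions made for illustration. With the real client, `fetch_page` could be something like `lambda p: GoogleSearch(p).get_dict()`.

```python
from typing import Callable, Iterator


def iterate_pages(
    fetch_page: Callable[[dict], dict],
    params: dict,
    page_size: int = 10,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield result pages until a response contains an "error" key."""
    query = dict(params)  # copy so the caller's dict is not mutated
    query.setdefault("start", 0)
    for _ in range(max_pages):  # safety cap (an assumption of this sketch)
        results = fetch_page(query)
        if "error" in results:
            return
        yield results
        query["start"] += page_size


# Stand-in for the real API: three pages, then an error (made-up data).
def fake_fetch(query: dict) -> dict:
    if query["start"] >= 30:
        return {"error": "no more results"}
    first = query["start"]
    return {"jobs_results": [{"title": f"Job {first + i}"} for i in range(10)]}


pages = list(iterate_pages(fake_fetch, {"engine": "google_jobs", "q": "Barista"}))
print(len(pages))  # 3
```

Passing the fetch function in as an argument makes the loop easy to try without an API key, and the page cap guards against a backend that never returns an error.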

Full example:

from serpapi import GoogleSearch
import json

params = {
    "api_key": "...",
    "engine": "google_jobs",
    "google_domain": "google.com",
    "q": "Barista",
    "start": 0
}

search = GoogleSearch(params)

jobs = []

# to show page number
page_num = 0

while True:
    results = search.get_dict()

    if "error" in results:
        print(results["error"])
        break

    page_num += 1
    print(f"Current page: {page_num}")

    # iterate over the job results and extract the data
    for result in results["jobs_results"]:
        jobs.append({
            "title": result["title"],
            "company_name": result["company_name"],
            "location": result["location"]
        })

    params["start"] += 10

print(json.dumps(jobs, indent=2))

Prints:

Current page: 1
Current page: 2
Current page: 3
Current page: 4
Current page: 5
Current page: 6
Current page: 7
Current page: 8
Current page: 9
Current page: 10
...
Current page: 36
Google hasn't returned any results for this query.
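
The page counter and the "start" offset in this run are linked by simple arithmetic: with a step of 10, page 36 was requested with start 350, and the following request (start 360) returned the error. A small sketch of that mapping, where `start_for_page` is a hypothetical helper and the step of 10 is taken from the loop above:

```python
PAGE_SIZE = 10  # step used in the workaround loop


def start_for_page(page_num: int, page_size: int = PAGE_SIZE) -> int:
    """Return the "start" offset for a 1-based page number."""
    if page_num < 1:
        raise ValueError("page_num must be >= 1")
    return (page_num - 1) * page_size


print(start_for_page(1))   # 0
print(start_for_page(36))  # 350
```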

Part of the JSON output:

[
   {
      "title":"Barista",
      "company_name":"Amazonia Cafe",
      "location":"Seattle, WA"
   },
   {
      "title":"barista - Store# 62187, 183 & ESTERS",
      "company_name":"Starbucks Coffee Company",
      "location":"Irving, TX"
   },
   {
      "title":"barista - Store# 48248, ELKHORN & RIO LINDA",
      "company_name":"Starbucks Coffee Company",
      "location":"Rio Linda, CA"
   }
]

dimitryzub avatar Jul 11 '22 09:07 dimitryzub