GooglePatentsPdfDownloader
Download patents as PDF documents from Google Patents
Google Patents PDF Downloader
Installation
You can install the development version from GitHub with:
pip install git+https://github.com/lorenzbr/GooglePatentsPdfDownloader.git
Make sure Google Chrome and the matching chromedriver.exe (see here) are installed; Selenium uses them to access the website.
Run GooglePatentsPdfDownloader
python -m GooglePatentsPdfDownloader
patent Patent number(s) to be downloaded
optional arguments:
--driver Path and file name of the Chrome driver exe
--brave Switch application from Google Chrome to Brave.
--output An output path where documents are saved. Default ./pdf
--time Waiting time in seconds for each request.
--rm-kind A list containing the patent kind codes which should be removed from patent numbers
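To illustrate what removing a kind code from a patent number means, here is a minimal sketch. The helper `strip_kind_code` is hypothetical and not part of the package; it assumes a kind code is simply a suffix of the patent number, which may differ from how the package handles it internally.

```python
def strip_kind_code(patent_number, kind_codes):
    """Return patent_number without a trailing kind code listed in kind_codes.

    Hypothetical helper for illustration only; assumes the kind code is a
    plain suffix such as "A1" or "B1".
    """
    # Check longer codes first so that the most specific suffix wins.
    for code in sorted(kind_codes, key=len, reverse=True):
        if code and patent_number.endswith(code):
            return patent_number[: -len(code)]
    # No listed kind code matched: leave the number unchanged.
    return patent_number


print(strip_kind_code("US4405829A1", ["A1"]))  # US4405829
print(strip_kind_code("EP0551921B1", ["A1"]))  # EP0551921B1
```

Only kind codes in the list are removed, so EP0551921B1 keeps its B1 suffix here.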
Examples
Download a single patent to the current working directory. US4405829A1 is not found with its kind code, so the kind code A1 is removed.
python -m GooglePatentsPdfDownloader US4405829A1 --rm_kind A1
python -m GooglePatentsPdfDownloader EP0551921B1
Download multiple patents, passed as a list of inputs, to the directory ./patents.
python -m GooglePatentsPdfDownloader US4405829 EP0551921B1 --output "./patents"
With the Brave browser, download multiple patents listed in a txt file to the directory ./pdf.
python -m GooglePatentsPdfDownloader docs/data/patents.txt --brave
Examples (modular)
from GooglePatentsPdfDownloader import PatentDownloader
patent_downloader = PatentDownloader(chrome_driver='chromedriver.exe', brave=True)
# Download a single patent to the current working directory
# (US4405829A1 is not found with its kind code, so A1 is removed)
patent_downloader.download(patent="US4405829A1", remove_kind_codes=['A1'])
patent_downloader.download(patent="EP0551921B1")
# Download multiple patents using a list of inputs to ./pdf_files
patent_downloader.download(
patent=["US4405829A1", "EP0551921B1", "EP1304824B1"],
output_path="./pdf_files",
remove_kind_codes=["A1"]
)
# Download multiple patents using a txt file to the current working directory
patent_downloader.download(
patent="docs/data/patents.txt",
output_path="",
remove_kind_codes=["A1"]
)
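The layout of the txt file is not described here. As a sketch, assuming one patent number per line, a file like docs/data/patents.txt could be read into a list as follows. The function `read_patent_numbers` is hypothetical and not part of the package.

```python
from pathlib import Path


def read_patent_numbers(path):
    """Read patent numbers from a text file.

    Assumption: one patent number per line; blank lines and surrounding
    whitespace are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

The resulting list has the same shape as the list passed to `download` in the example above.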
License
This repository is licensed under the MIT license.
See here for further information.