hepsiburada-review-scraper icon indicating copy to clipboard operation
hepsiburada-review-scraper copied to clipboard

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. 📜

Made with Python

Hepsiburada Review Scraper HB Review Scraper Build Status GPLv3

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews with category and product links can be crawled and used for text classification, sentiment analysis, text mining, NLP models etc. Supported by multithreading, written in Python.

Prerequisites

$ pip3 install -r requirements.txt

Installation

$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py

Usage

Program provides an human-friendly interactive shell for users.

Features

  • Shutdown computer after finishing: Optional choice for deep and long scrapings.
  • Threads: Try to give a proper number. (Recommended value is 64.)
  • Timeout: Giving a large number could result in long waiting times, small numbers could lead connection failures. (Recommended time range is 15-30 seconds.)
  • Pagination Depth: Maximum number of paginated review pages for each product.

You can track your progress via progress bars. categories.txt, products.txt, hepsiburada.txt will be saving to your current directory.

Do not scrape aggressively! Otherwise, you will be caught by captcha challenge!

Say thanks.

For educational purposes only.