NEEA-TOEFL-Testseat-Crawler
TOEFL test seat crawler (NEEA TOEFL Testseat Crawler)
Getting Started with NEEA TOEFL Testseat Crawler
This document briefly introduces how to use the NEEA TOEFL test seat crawler, a Selenium-based crawler that runs locally.
Motivation
The NEEA TOEFL test seat website, supported by the Chinese National Education Examinations Authority (NEEA), is inconvenient to use. To find a test seat, we have to search date by date and city by city, which is an intolerable experience for anyone who just wants to find a seat as soon as possible. Why not display all the test seats in a single table?
Requirements
- Mozilla geckodriver v0.26.0
- Firefox ≥ 60
- Selenium (pip install selenium)
Install
- Mozilla geckodriver: the default geckodriver path is "C:\Program Files\Mozilla Firefox\geckodriver.exe". To use a different executable path, start the crawler with --webdriver_path='your path'.
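As a rough illustration of how the default path and the --webdriver_path override could fit together, here is a minimal sketch using Python's argparse. It is not taken from crawler_toefl.py: the function name build_parser, the constant DEFAULT_WEBDRIVER_PATH, and the help texts are assumptions, and only the flag names and the default path come from this README.

```python
import argparse

# Default geckodriver location named in this README.
DEFAULT_WEBDRIVER_PATH = r"C:\Program Files\Mozilla Firefox\geckodriver.exe"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags mentioned in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="NEEA TOEFL test seat crawler (sketch)")
    parser.add_argument("--username", required=True, help="NEEA ID number")
    parser.add_argument("--password", required=True, help="NEEA account password")
    parser.add_argument(
        "--webdriver_path",
        default=DEFAULT_WEBDRIVER_PATH,  # used when the flag is omitted
        help="path to the geckodriver executable",
    )
    return parser


# Parsing an explicit argument list; without --webdriver_path the default applies.
args = build_parser().parse_args(["--username=ID", "--password=secret"])
print("Using geckodriver at:", args.webdriver_path)
```

The parsed path would then be handed to Selenium when the Firefox driver is created. On Linux or macOS the default above will not exist, so passing --webdriver_path explicitly is the safer choice there.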
Get started
Default start
python crawler_toefl.py --username='NEEA ID number' --password='password'
When the crawler finishes, it produces a .csv file containing the test seats.
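To give an idea of how such a .csv file might be produced and then filtered, here is a small sketch with Python's csv module. The column names (date, city, test_center, seat_status), the "available" marker, and the sample rows are all made up for illustration; the file written by crawler_toefl.py may use a different layout.

```python
import csv
import io

# Hypothetical column names; the real file's header may differ.
FIELDS = ["date", "city", "test_center", "seat_status"]


def write_seats(rows, out):
    """Write seat records (dicts keyed by FIELDS) as CSV to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def available_seats(src):
    """Return the rows whose seat_status is the (assumed) 'available' marker."""
    return [row for row in csv.DictReader(src) if row["seat_status"] == "available"]


# Made-up sample data, only to exercise the two functions.
sample = [
    {"date": "2020-05-16", "city": "Beijing", "test_center": "Center A", "seat_status": "available"},
    {"date": "2020-05-16", "city": "Shanghai", "test_center": "Center B", "seat_status": "full"},
]
buffer = io.StringIO()
write_seats(sample, buffer)
buffer.seek(0)
open_rows = available_seats(buffer)
print([row["city"] for row in open_rows])
```

With a real output file, the in-memory buffer would be replaced by open("your_file.csv", newline="", encoding="utf-8").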
Todo:
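- Faster crawling: the crawler is too slow, and a full crawl currently takes about 30 minutes
- Headless mode: how can the anti-crawler checks be bypassed without a browser window?
- Getting past the anti-crawler check triggered when clicking the 'search seats' button
- Online crawler (running on a server)
- Different modes (user-customizable crawling)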
Acknowledgement
The idea originally came from https://www.jianshu.com/p/2541d918869e, thanks!