odds-portal-scraper
Enhancements to full_scraper allowing selection of seasons, sportsbooks, bet types, bet options
First of all thanks for creating this library! I was planning on creating a scraper to get WHL and OHL odds but luckily I found this which saved me a ton of time. I tried to leave as much of the code in an unchanged state, but inevitably I had to make some changes. This is a work in progress and some stuff is rough around the edges.
Might be best to run it with DEBUG set to true in both scraper and op so you can get a sense of what the code is doing.
I added a bunch of config files that I used for testing along with their output files as examples. These do not need to be included.
I've never made a pull request before so sorry if I didn't follow guidelines properly. Let me know what you think!
Proposed Changes
Adding a bunch of functionality to full_scraper. Most of the changes are in scraper.py. Depending on the configuration provided by the user (adding to the previous config found in sports.json), the scraper will either behave exactly as it did previously, or it will go to the game link and scrape odds per sportsbook / bet market / bet option.
This is a breaking change.
Description
Config JSON file
New config options (all optional):
- `seasons`: list of seasons that get scraped
  - added because scraping everything takes a very long time with the new enhancements
  - I didn't change the crawler, which still crawls every available season
- `outcome_headers`: names to use for the keys in the JSON output file
  - added because on OddsPortal the key is just a number
- `bet_type`: the part of the URL referring to the bet type
- `sub_bet_type`: the part of the URL referring to the bet-type sub-option
  - this is always an integer
- `odds_sources`: list of names of the odds sources you want to scrape
  - an odds source can be either the name of a sportsbook, 'average', or 'highest'
  - if the list is empty, scrape all
- `bet_options`: list of options you want to scrape for the bet type, e.g. the range of totals or spreads to scrape
  - if the list is empty, scrape all

Example URL: https://www.oddsportal.com/hockey/canada/ohl/flint-firebirds-windsor-spitfires-40a8ssXS/#home-away;1
- here 'home-away' is the bet type and '1' is the sub bet type
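Put together, a config entry using these options might look like the sketch below. This is a hypothetical example: the key names follow the list above, but the exact schema of sports.json may differ.

```python
# Hypothetical config entry using the new options described above.
# Key names follow this description; the real sports.json schema may differ.
example_config = {
    "seasons": ["2019/2020"],               # only these seasons get scraped
    "outcome_headers": ["home", "away"],    # names for the keys in the JSON output
    "bet_type": "home-away",                # part of the URL naming the bet type
    "sub_bet_type": 1,                      # bet-type sub-option; always an integer
    "odds_sources": ["bet365", "average"],  # sportsbooks, 'average', or 'highest'
    "bet_options": [],                      # empty list means scrape all
}

# The bet type and sub-option map onto the URL fragment, as in the example URL:
url_fragment = "#{};{}".format(example_config["bet_type"],
                               example_config["sub_bet_type"])
# url_fragment is now "#home-away;1"
```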
op.py
New global variables:
- `DEBUG`: when true, the code does not run in parallel and only scrapes the first page
- `DELETE_FILES`: lets you keep multiple different output files in one folder without deleting all the other files in the folder

Attributes added to `working_seasons` elements:
- all the new config options are added as attributes on the working seasons, in the same way as the old config options
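The DEBUG behaviour described above (serial execution, first page only) could be gated roughly as follows. A minimal sketch; the function and variable names are assumptions, not the actual code:

```python
from multiprocessing import Pool

DEBUG = True  # as described: when true, run serially and scrape only the first page

def run_scrape(pages, scrape_page):
    """Apply scrape_page to each page, honouring the DEBUG flag."""
    if DEBUG:
        # no parallelism, and only the first page, to make tracing easier
        return [scrape_page(page) for page in pages[:1]]
    with Pool() as pool:
        return pool.map(scrape_page, pages)
```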
models.py
- add `odds` as an attribute on `Game`
  - I needed some place to put odds for all the different bet options
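A minimal sketch of what the new attribute might look like on the model. The odds layout (bet option, then odds source) is an assumption based on this description, and the real Game class has other fields:

```python
from dataclasses import dataclass, field

@dataclass
class Game:
    # illustrative fields; the real model differs
    home_team: str
    away_team: str
    # odds for all the different bet options, e.g.
    # {"+5.5": {"bet365": 1.91, "average": 1.88}}
    odds: dict = field(default_factory=dict)

game = Game("Flint Firebirds", "Windsor Spitfires")
game.odds["+5.5"] = {"bet365": 1.91, "average": 1.88}
```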
scraper.py
New global variables:
- `DEBUG`: when true, the code does not run in parallel and only scrapes the first page

Functions:
- `get_populate_odds_method`: determines whether the program behaves as it did previously, or goes to the game link to scrape odds
- `default_populate_odds`: previously the last part of `populate_games_into_season`
- `populate_odds_detailed`: goes to the game link and scrapes odds there
  - if there are multiple rows, opens them if necessary using Selenium
  - iterates over the rows, or scrapes the single table element if there is only one
  - adds odds to the `game` variable the way the scraper does in `populate_games_into_season`
- `open_rows`: opens rows of interest using Selenium
- `scrape_odds_detailed`: creates and returns a dictionary of odds
  - which odds are scraped depends on which odds sources the user specified in the config file
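The dispatch in get_populate_odds_method presumably works along these lines. A sketch under assumptions: the condition used to pick the detailed path and the function bodies here are illustrative, not the actual code:

```python
def default_populate_odds(game, row):
    # old behaviour: take the odds straight from the listing row
    game.odds = {"default": row}

def populate_odds_detailed(game, row):
    # new behaviour: would open the game link and scrape per-source odds there
    game.odds = {"detailed": row}

def get_populate_odds_method(config):
    """Pick the odds-populating function based on the user's config."""
    # assumed condition: any of the new options triggers detailed scraping
    if config.get("bet_type") or config.get("odds_sources"):
        return populate_odds_detailed
    return default_populate_odds
```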
To Do
- update README
- update the crawler for the `seasons` config option
- clean up the output JSON file to account for the new odds dict
- `DEBUG` should be present in one file only
Issues
- it's very slow because the scraper goes into each game link and back
- `default_populate_odds` and the new `odds` attribute on the `Game` model don't really work together
  - the output file is messy as a result
Doesn't work on Windows 7, Python 3.8 (x86):
c:\Helper\full_scraper>python op.py
Traceback (most recent call last):
File "op.py", line 8, in
Huh. I can't explain that, but the changes in this pull request should not be what's causing that error. Have you tried running the unforked code?
In my case, I had to manually update all packages to the latest versions (`pip list -o` lists the outdated ones).