odds-portal-scraper

Enhancements to full_scraper allowing selection of seasons, sportsbooks, bet types, bet options

jemorriso opened this issue • 3 comments

First of all, thanks for creating this library! I was planning to write a scraper to get WHL and OHL odds, but luckily I found this project, which saved me a ton of time. I tried to leave as much of the code unchanged as possible, but inevitably I had to make some changes. This is a work in progress, and some parts are still rough around the edges.

It might be best to run it with DEBUG set to True in both scraper.py and op.py so you can get a sense of what the code is doing.

I added a bunch of config files that I used for testing, along with their output files, as examples. These do not need to be included in the merge.

I've never made a pull request before, so apologies if I didn't follow the guidelines properly. Let me know what you think!

Proposed Changes

This adds a number of features to full_scraper; most of the changes are in scraper.py. Depending on the configuration provided by the user (extending the previous config found in sports.json), the scraper will either behave exactly as it did previously, or it will follow each game link and scrape odds for the requested sportsbooks, bet markets, and bet options.

This is a breaking change.

Description

Config JSON file

New config options (all optional):

  • seasons:
    • list of seasons to scrape
    • added because scraping every season takes a very long time with the new enhancements
    • the crawler is unchanged and still crawls every available season.
  • outcome_headers:
    • the names to use for the keys in the JSON output file
    • added because on OddsPortal each outcome column is identified only by a number.
  • bet_type:
    • the part of the URL that refers to the bet type.
  • sub_bet_type:
    • the part of the URL that refers to the bet type's sub-option
    • this is always an integer.
  • odds_sources:
    • list of names of the odds sources to scrape
    • an odds source can be the name of a sportsbook, 'average', or 'highest'
    • if the list is empty, all sources are scraped.
  • bet_options:
    • list of options to scrape for the bet type, e.g. the range of totals or spreads
    • if the list is empty, all options are scraped.

Example URL: https://www.oddsportal.com/hockey/canada/ohl/flint-firebirds-windsor-spitfires-40a8ssXS/#home-away;1

  • here 'home-away' is the bet type, and '1' is the sub bet type.
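Putting the new options together, a season entry in the config file might look something like this. The key names come from the list above; the bet_type/sub_bet_type values match the example URL, but the sportsbook names and season strings are made up for illustration:

```json
{
  "seasons": ["2019-2020", "2020-2021"],
  "outcome_headers": ["home", "away"],
  "bet_type": "home-away",
  "sub_bet_type": 1,
  "odds_sources": ["Pinnacle", "average", "highest"],
  "bet_options": []
}
```

With "bet_options" left empty, all available options for the bet type are scraped.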

op.py

New global variables:

  • DEBUG:
    • when True, the code does not run in parallel and only the first page is scraped.
  • DELETE_FILES:
    • when False, you can keep multiple different output files in one folder without the scraper deleting the other files there.
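A minimal sketch of how a DEBUG flag like this can gate parallelism. The real code uses joblib; here the standard-library multiprocessing pool stands in, and scrape_page is a made-up placeholder for the per-page work:

```python
from multiprocessing import Pool

DEBUG = True  # when True: run sequentially and only scrape the first page


def scrape_page(page):
    # placeholder for the real per-page scraping work
    return f"scraped page {page}"


def scrape_pages(pages):
    if DEBUG:
        # sequential, first page only: easy to step through in a debugger
        return [scrape_page(pages[0])]
    # normal mode: fan the pages out across worker processes
    with Pool() as pool:
        return pool.map(scrape_page, pages)


print(scrape_pages([1, 2, 3]))
```

Keeping the switch at module level means the parallel and debug paths share the same scrape_page code, so debugging exercises the same logic that runs in production.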

Attributes added to working_seasons elements:

  • all of the new config options are added as attributes on the working seasons, in the same way as the old config options.

models.py

  • add odds as an attribute on Game
    • I needed somewhere to put the odds for all the different bet options
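As a sketch, the new attribute can be a nested dict keyed by bet option and odds source. The field names other than odds are illustrative; the real Game model has more fields:

```python
from dataclasses import dataclass, field


@dataclass
class Game:
    # illustrative subset of the real model's fields
    home_team: str
    away_team: str
    # new: nested dict of scraped odds, e.g. odds[bet_option][odds_source]
    odds: dict = field(default_factory=dict)


game = Game("Flint Firebirds", "Windsor Spitfires")
game.odds["home-away"] = {"average": (2.10, 1.75)}
print(game.odds)
```

Using default_factory=dict gives each Game its own empty odds dict instead of a shared mutable default.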

scraper.py

New global variables:

  • DEBUG:
    • when True, the code does not run in parallel and only the first page is scraped.

Functions:

  • get_populate_odds_method:
    • decides whether the program behaves as it did previously or follows each game link to scrape odds.
  • default_populate_odds:
    • the code that was previously the last part of populate_games_into_season.
  • populate_odds_detailed:
    • follows the game link and scrapes the odds there
    • if there are multiple rows, opens them with Selenium as necessary
    • iterates over the rows, or scrapes the single table element if there is only one
    • attaches the odds to the game object, just as the scraper does in populate_games_into_season.
  • open_rows:
    • opens the rows of interest using Selenium.
  • scrape_odds_detailed:
    • creates and returns a dictionary of odds
    • which odds are scraped depends on the odds sources the user specified in the config file.
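The filtering rule described for scrape_odds_detailed (an empty odds_sources list means scrape everything) can be sketched like this. The function name, the row data, and the dict shape are illustrative, not the actual implementation:

```python
def filter_odds_sources(rows, odds_sources):
    """Keep only the rows whose odds-source name was requested.

    rows: dict mapping odds-source name -> scraped odds
    odds_sources: names from the config file; an empty list means keep all
    """
    if not odds_sources:
        return dict(rows)
    return {name: odds for name, odds in rows.items() if name in odds_sources}


rows = {"Pinnacle": (1.90, 1.95), "bet365": (1.85, 2.00), "average": (1.88, 1.97)}
print(filter_odds_sources(rows, ["average"]))  # only the requested source
print(filter_odds_sources(rows, []))           # empty list: keep everything
```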

To Do

  • update README
  • update crawler for seasons config option
  • clean up output JSON file to account for new odds dict
  • DEBUG should be defined in one file only.

Issues

  • it's very slow, because the scraper navigates into each game link and back
  • default_populate_odds and the new odds attribute on the Game model don't really work together
    • the output file is messy as a result.

jemorriso avatar Jan 19 '21 04:01 jemorriso

Doesn't work on Windows 7, Python 3.8 (x86):

    c:\Helper\full_scraper>python op.py
    Traceback (most recent call last):
      File "op.py", line 8, in <module>
        from joblib import delayed
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\__init__.py", line 119, in <module>
        from .parallel import Parallel
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\parallel.py", line 28, in <module>
        from .parallel_backends import (FallbackToBackend, MultiprocessingBackend,
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\_parallel_backends.py", line 22, in <module>
        from .executor import get_memmapping_executor
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\executor.py", line 14, in <module>
        from .externals.loky.reusable_executor import get_reusable_executor
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\externals\loky\__init__.py", line 12, in <module>
        from .backend.reduction import set_loky_pickler
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 125, in <module>
        from joblib.externals import cloudpickle  # noqa: F401
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\externals\cloudpickle\__init__.py", line 3, in <module>
        from .cloudpickle import *
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 152, in <module>
        _cell_set_template_code = _make_cell_set_template_code()
      File "C:\Users\obtim\AppData\Local\Programs\Python\Python38-32\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 133, in _make_cell_set_template_code
        return types.CodeType(
    TypeError: an integer is required (got type bytes)

obtim avatar Jan 19 '21 18:01 obtim

Huh. I can't explain that, but the changes in this pull request should not be what's causing that error. Have you tried running the unforked code?

jemorriso avatar Jan 19 '21 22:01 jemorriso

> Huh. I can't explain that, but the changes in this pull request should not be what's causing that error. Have you tried running the unforked code?

In my case, I had to manually update all of the packages listed as outdated by pip list -o to their latest versions.

obtim avatar Jan 20 '21 07:01 obtim