Big 5 league player stats for fbref

Open andrewRowlinson opened this issue 1 year ago • 4 comments

Added read_big5_season_stats to fbref.py for efficiently reading the data from the Big 5 leagues (England, Italy, France, Germany, Spain).

andrewRowlinson avatar Jul 27 '22 17:07 andrewRowlinson

FBref also has pages for the Big 5 leagues that let you fetch player data more efficiently when you want multiple leagues. I added a method here to get this data, but it doesn't fit neatly into the existing class because it ignores the leagues attribute. I have tried to keep the interface and results similar to the other methods.

andrewRowlinson avatar Jul 27 '22 17:07 andrewRowlinson

Thanks for your PR! The fbref.read_team_season_stats method is indeed inefficient, as it visits the page of each individual team in a league. I've noticed that FBref now has a single page for each league/season where these stats can be obtained (e.g., https://fbref.com/en/comps/9/stats/Premier-League-Stats). Using that page to obtain the data would already reduce the number of requests by a factor of 15 to 20, and it works for each league.

Additionally, you could then use the page for the top-5 leagues if the user requested data from multiple top-5 leagues, but the benefit would be more limited. This should not be a separate (public) function, though. It should be integrated into the fbref.read_team_season_stats function, and the selection of the best source page should happen transparently for the user.

probberechts avatar Jul 28 '22 08:07 probberechts
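The "transparent" source selection described above could look roughly like the sketch below. Everything in it is hypothetical: the `choose_source_pages` helper, the `BIG5` set, and the league identifiers are illustrative names, not part of soccerdata, and the rule "use the combined page only when more than one Big 5 league is requested" is an assumption based on the comment.

```python
# Hypothetical league identifiers, used for illustration only.
BIG5 = frozenset({
    "ENG-Premier League",
    "ESP-La Liga",
    "ITA-Serie A",
    "GER-Bundesliga",
    "FRA-Ligue 1",
})


def choose_source_pages(leagues):
    """Plan which pages to request for the given leagues.

    Returns a list of (page_kind, leagues_covered) tuples, where
    page_kind is "big5" (one combined page) or "league" (one page
    per league).
    """
    # Drop duplicates while preserving the caller's order.
    requested = list(dict.fromkeys(leagues))
    in_big5 = [lg for lg in requested if lg in BIG5]

    plan = []
    if len(in_big5) > 1:
        # Several Big 5 leagues: one combined request covers them all.
        plan.append(("big5", tuple(in_big5)))
        remaining = [lg for lg in requested if lg not in BIG5]
    else:
        # Zero or one Big 5 league: the per-league page is enough.
        remaining = requested
    plan.extend(("league", (lg,)) for lg in remaining)
    return plan
```

A public method such as read_team_season_stats would call a helper like this internally, so the caller never has to know which page the data came from.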

> I've noticed that FBref now has a single page for each league/season where these stats can be obtained (e.g., https://fbref.com/en/comps/9/stats/Premier-League-Stats). Using that page to obtain the data would already reduce the number of requests by a factor of 15 to 20, and it works for each league.

Unfortunately, the player stats on these new pages (e.g. https://fbref.com/en/comps/9/stats/Premier-League-Stats) wouldn't be loaded by the existing functions, as they currently load only the top table containing squad/opponent stats and not the player statistics underneath. I think you can get around this by using Selenium to load the whole page, unless I am missing a simpler way?

> Additionally, you could then use the page for the top-5 leagues if the user requested data from multiple top-5 leagues, but the benefit would be more limited. This should not be a separate (public) function, though. It should be integrated into the fbref.read_team_season_stats function, and the selection of the best source page should happen transparently for the user.

I have amended fbref.read_team_season_stats to use the Big 5 league data. It is significantly faster. The disadvantages are that you lose the aggregated team and opponent statistics, and that players who have not played any minutes are missing.

andrewRowlinson avatar Jul 31 '22 10:07 andrewRowlinson

> Unfortunately, the player stats on these new pages (e.g. https://fbref.com/en/comps/9/stats/Premier-League-Stats) wouldn't be loaded by the existing functions, as they currently load only the top table containing squad/opponent stats and not the player statistics underneath. I think you can get around this by using Selenium to load the whole page, unless I am missing a simpler way?

I've only looked at this quickly in the browser, but it seems that the tables are actually there with all the data. They are just commented out in the HTML, and some JavaScript then makes them visible. I think a simple html.replace('<!--', '') should do the trick.

probberechts avatar Jul 31 '22 12:07 probberechts
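The comment-stripping idea suggested above can be tried out without Selenium. The snippet below is a minimal sketch using only the Python standard library. `SAMPLE_HTML` is made-up markup that only imitates a commented-out table, and `uncomment`, `TableRows`, and `read_rows` are hypothetical helpers, not soccerdata functions. It strips the closing `-->` as well as the opening `<!--`, on the assumption that a leftover `-->` would otherwise remain in the page as stray text.

```python
import re
from html.parser import HTMLParser

# Made-up markup imitating a table that is hidden inside an HTML comment.
SAMPLE_HTML = """
<div id="all_stats_standard">
<!--
<table id="stats_standard">
<tr><th>Player</th><th>Min</th></tr>
<tr><td>Player A</td><td>90</td></tr>
</table>
-->
</div>
"""


def uncomment(html):
    """Remove comment delimiters so commented-out markup becomes live."""
    return re.sub(r"<!--|-->", "", html)


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell counts; whitespace between tags is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def read_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


hidden = read_rows(SAMPLE_HTML)              # table is still inside a comment
visible = read_rows(uncomment(SAMPLE_HTML))  # table is now parsed
```

Parsed as-is, the sample yields no rows because the parser treats the whole table as a comment. After `uncomment`, the header and player rows come through. In a real scraper, the uncommented string would be passed to whatever table parser the library already uses.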