transfermarkt-scraper
Expanding the number of players/clubs/competitions included in the hosted instance
I've been joining the data with the FIFA 23 player database, and only about half of the players in the FIFA database are present in the scraped Transfermarkt data. The rest are on Transfermarkt but have not been scraped. They fall into a few different categories:
- Non-European leagues - teams from Brazil, United States, Argentina, Australia, Chile etc.
- Lower-level European leagues - there are no teams from League One or lower in England, Ligue 2 or lower in France, LaLiga 2 or lower in Spain etc.
This works out to be about 10,000 players, so it would be great to find a way of incorporating them (if it doesn't make it too unwieldy!)
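For anyone wanting to reproduce this kind of coverage check, here is a minimal sketch of an anti-join on a normalised name key. The player names and the `normalize` helper are made up for illustration; a real join would likely need a more robust key (for example name plus date of birth), which is an assumption rather than something established in this thread.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


# Hypothetical sample data standing in for the two datasets.
fifa_players = ["Player One", "Player Two", "player  three", "Player Four"]
scraped_players = ["player one", "Player Three"]

# Anti-join: keep FIFA players whose key is absent from the scraped data.
scraped_keys = {normalize(name) for name in scraped_players}
missing = [name for name in fifa_players if normalize(name) not in scraped_keys]

# Share of FIFA players that are present in the scraped data.
coverage = 1 - len(missing) / len(fifa_players)
```

Grouping the `missing` list by league would then show which competitions account for the gap.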
@tvqt - Thanks for the suggestion.
I haven't really tried to scrape non-EU and lower-level leagues, but I'd assume that's possible without changing the scraper, by providing the appropriate parameters / parent files.
Is your question whether all these leagues could be added to the datasets? If so, perhaps we can discuss it in a new issue in https://github.com/dcaribou/transfermarkt-datasets
@dcaribou, hi! And thanks for the nice product. Why do you scrape only up to 25 competitions per type (first_tier, domestic_cup...) from confederations?
Hey @visheugene.
There's no explicit limit on the number of competitions scraped by the competitions crawler. However, this crawler only scrapes the first page of the competitions list on the confederation page, which contains exactly 25 competitions.
The reason it scrapes only the first page is that this was simple enough and already covered the most relevant competitions (top 25 countries by market cap), so I stopped there.
[Screenshot 2023-01-09 at 19 34 11]
It should not be too hard to modify the competitions scraper so that it recurses through the rest of the pages in the competitions list though, if that's needed.
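As a rough illustration of the idea (not the scraper's actual code), the sketch below follows "next page" links until none is left. The page shape, the URLs, the competition ids and the `fetch_page` callable are all hypothetical stand-ins; the real crawler would need to extract the next-page link from the Transfermarkt markup instead.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_competitions(
    fetch_page: Callable[[str], Dict], start_url: str
) -> Iterator[str]:
    """Yield competitions from every page by following 'next' links."""
    url: Optional[str] = start_url
    seen = set()  # guard against pagination loops
    while url is not None and url not in seen:
        seen.add(url)
        page = fetch_page(url)
        yield from page["competitions"]
        url = page.get("next")  # None on the last page


# Hypothetical stand-in for HTTP requests: three fake pages keyed by URL.
FAKE_SITE = {
    "/list?page=1": {"competitions": ["comp-a", "comp-b"], "next": "/list?page=2"},
    "/list?page=2": {"competitions": ["comp-c", "comp-d"], "next": "/list?page=3"},
    "/list?page=3": {"competitions": ["comp-e"], "next": None},
}

# Current behaviour (first page only) versus following the pagination.
first_page_only = FAKE_SITE["/list?page=1"]["competitions"]
all_competitions = list(iter_competitions(FAKE_SITE.__getitem__, "/list?page=1"))
```

Here `first_page_only` contains two entries, while `all_competitions` collects all five across the three fake pages.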
Hey @dcaribou,
Would you be able to help me with modifying the scraper so it recurses through the rest of the pages in the competitions list? I'm having difficulties setting this up.
Hey @ScottishWolverine. Sure. If you are having problems setting things up, you can raise a new issue describing your problem.