transfermarkt-scraper
Scraping of `market_value_history` is broken
Parent issue → https://github.com/dcaribou/transfermarkt-datasets/issues/215
The attribute `market_value_history` is coming back as null in the latest runs, likely due to an upstream change on the Transfermarkt side.
```
$ scrapy crawl players -a parents=samples/clubs.json -s USER_AGENT="..." | jq '.market_value_history'
null
null
null
...
```
The HTML used for extracting `market_value_history` changed significantly in a recent Transfermarkt update, and the existing extraction logic no longer works: https://github.com/dcaribou/transfermarkt-scraper/blob/3e4ccb8488df1d843d36c8a8cd5d8bea949ae2d8/tfmkt/spiders/players.py#L120
The new HTML uses an SVG graph, which appears quite hard to reverse-engineer at this point.
I found that the page makes the following request, so maybe that helps a bit.
Request URL:
https://www.transfermarkt.com/ceapi/marketValueDevelopment/graph/28003
Ah, and it's an open API 🙌
This is super helpful; even the response format is the same as the scraped `market_value_history` object.
Should we create a new `player_valuations` crawler with this API?
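To make the proposal concrete, here is a minimal sketch of how such a crawler could call the endpoint found above. It is not the project's implementation: the helper names (`player_id_from_url`, `market_value_url`, `fetch_market_value_history`) are hypothetical, it uses the standard library rather than Scrapy, and it assumes the player id is the number after `/spieler/` in a player URL and that the endpoint returns JSON. The response is returned unparsed beyond JSON decoding, since its exact keys are not documented in this thread.

```python
import json
import re
import urllib.request

# Endpoint observed in the browser's network tab (see the request URL above).
API_TEMPLATE = (
    "https://www.transfermarkt.com/ceapi/marketValueDevelopment/graph/{player_id}"
)


def player_id_from_url(url: str) -> str:
    """Extract the numeric player id from a Transfermarkt player URL.

    Assumption: the id is the run of digits following '/spieler/'.
    """
    match = re.search(r"/spieler/(\d+)", url)
    if match is None:
        raise ValueError(f"no player id found in {url!r}")
    return match.group(1)


def market_value_url(player_id) -> str:
    """Build the market value API URL for a given player id."""
    return API_TEMPLATE.format(player_id=player_id)


def fetch_market_value_history(player_id, user_agent: str, timeout: float = 10.0):
    """Request the endpoint and return the decoded JSON body as-is.

    Assumption: the endpoint answers with JSON; no particular keys are assumed.
    """
    request = urllib.request.Request(
        market_value_url(player_id),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_market_value_history(28003, user_agent="...")` would issue the same request as the URL above. In the actual spider, the same URL could instead be yielded as a follow-up Scrapy request per player.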
Seems like it's just in development?
The old graph is still available on this page: https://www.transfermarkt.fr/dimitri-payet/marktwertverlauf/spieler/37647