transfermarkt-scraper

Scraping of `market_value_history` is broken

Open dcaribou opened this issue 9 months ago • 5 comments

Parent issue → https://github.com/dcaribou/transfermarkt-datasets/issues/215

The attribute `market_value_history` is coming back as null in the latest runs, likely due to an upstream change on the Transfermarkt side.

$ scrapy crawl players -a parents=samples/clubs.json -s USER_AGENT="..." | jq '.market_value_history'
null
null
null
...

dcaribou avatar Sep 28 '23 15:09 dcaribou

The HTML used for extracting the market_value_history changed significantly in a recent Transfermarkt update, and the existing extraction logic no longer works: https://github.com/dcaribou/transfermarkt-scraper/blob/3e4ccb8488df1d843d36c8a8cd5d8bea949ae2d8/tfmkt/spiders/players.py#L120

The new HTML uses an SVG graph, which appears quite hard to reverse-engineer at this point.


dcaribou avatar Sep 28 '23 15:09 dcaribou

I found that this page makes a request to the endpoint below, so maybe that helps a bit.

Request URL:
https://www.transfermarkt.com/ceapi/marketValueDevelopment/graph/28003


LarchLiu avatar Sep 30 '23 10:09 LarchLiu

Ah, and it's an open API 🙌 This is super helpful, even the response format is the same as the scraped market_value_history object.
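
A minimal sketch of how the endpoint could be queried, assuming the URL pattern from the request above holds for other player ids. The names `market_value_url` and `fetch_market_value_history` are hypothetical and not part of the scraper, and the structure of the JSON payload is not assumed here; it is returned as decoded.

```python
import json
import urllib.request

# Endpoint pattern copied from the request URL reported in this thread.
BASE_URL = "https://www.transfermarkt.com/ceapi/marketValueDevelopment/graph/{player_id}"


def market_value_url(player_id):
    """Build the endpoint URL for a numeric Transfermarkt player id."""
    return BASE_URL.format(player_id=int(player_id))


def fetch_market_value_history(player_id, user_agent, timeout=10.0):
    """Fetch and decode the JSON payload for one player.

    Performs a network request. Sending a User-Agent header is an
    assumption, mirroring the USER_AGENT setting passed to scrapy above.
    """
    request = urllib.request.Request(
        market_value_url(player_id),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `market_value_url(28003)` rebuilds the URL from the comment above, and `fetch_market_value_history(28003, user_agent="...")` would return whatever JSON the endpoint serves for that player.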

dcaribou avatar Sep 30 '23 10:09 dcaribou

Should we create a new player_valuations crawler with this API? It seems like it is just in development?

LarchLiu avatar Oct 09 '23 04:10 LarchLiu

The old graph is still available on this page: https://www.transfermarkt.fr/dimitri-payet/marktwertverlauf/spieler/37647

n-richaud avatar Oct 09 '23 12:10 n-richaud