
Scrape Player Projection Data from Fangraphs

Open TK2575 opened this issue 2 years ago • 5 comments

Introduces a function, with related tests and documentation, that captures player projection data from Fangraphs. Provides argument options to specify the projection source, position, league, and team. Extends the teamid lookup method with a Fangraphs (fg) team ID lookup, backed by a stored dictionary fixture, which is needed to apply team-level filtering.

TK2575 avatar Mar 10 '23 18:03 TK2575

Sorry, I commented inline as opposed to on the PR:

I think you're doing too much work here; there's an API: pitch_df = pd.DataFrame(json.loads(requests.get('https://www.fangraphs.com/api/projections?stats=pit&type=steamer').content))
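For anyone who wants to try this, here is a slightly fuller sketch of the same call. Only the URL and the `stats` and `type` query parameters come from the one-liner above; the helper names (`build_url`, `parse_records`, `fetch_projections`) are made up for illustration, and it assumes the endpoint returns a JSON array of per-player objects, which is not documented anywhere.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the comment above; it is not a documented public API.
BASE_URL = "https://www.fangraphs.com/api/projections"


def build_url(stats="pit", proj_type="steamer"):
    """Build the request URL. Only `stats` and `type` are known parameters."""
    return f"{BASE_URL}?{urlencode({'stats': stats, 'type': proj_type})}"


def parse_records(raw):
    """Decode the response body, assuming it is a JSON array of objects."""
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of player records")
    return records


def fetch_projections(stats="pit", proj_type="steamer", timeout=30):
    """Hypothetical helper: fetch one projection table as a DataFrame."""
    # Imported lazily so the pure helpers above work without these packages.
    import pandas as pd
    import requests

    response = requests.get(build_url(stats, proj_type), timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return pd.DataFrame(parse_records(response.content))
```

Calling `fetch_projections()` with the defaults requests the same URL as the one-liner. The timeout and the `raise_for_status()` check are additions, so a blocked or changed endpoint fails loudly rather than producing an odd DataFrame.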

blacktj avatar Mar 10 '23 18:03 blacktj

Thanks @blacktj, I didn't know that API endpoint existed in front of a paywall, that's great! I'm assuming there's some rate limit expectation we'll need to respect, like we do with Baseball Reference? I'll need to dig into this a bit.
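To make the rate-limit concern concrete, here is a minimal sketch of a client-side throttle that enforces a minimum gap between requests. The `Throttle` class is hypothetical and is not how this repo handles Baseball Reference; the interval is a placeholder, since no actual Fangraphs limit is known.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the behaviour can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Placeholder value: the real limit, if one exists, is unknown.
throttle = Throttle(min_interval=2.0)
```

Calling `throttle.wait()` immediately before each request would space calls at least two seconds apart. It does not handle retries or a server-side 429 response.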

TK2575 avatar Mar 10 '23 19:03 TK2575

It's non-public and buried in the client-side rendering of the table. I am working on a PR for the prospects endpoint of this as well. I'm not sure if it's rate limited, though; it's wide open. The risk I see is if they do lock it down.

blacktj avatar Mar 10 '23 19:03 blacktj

I think I'll need to defer to this repo's maintainers as to which approach to take. There's precedent for scraping Fangraphs page source for other methods, though I don't know if that's because either a) we weren't aware of the API at the time or b) the API didn't/doesn't support those data. Querying the API would certainly be cleaner, but I'd be hesitant to move forward with a non-public API without some form of developer contract and/or buy-in from this repo's maintainers.

TK2575 avatar Mar 10 '23 21:03 TK2575

This is a web scraping repo, so I'm guessing we don't have a contract to pull the data from their actual website either? Is there a difference between grabbing it there and grabbing it from the API they use to render the table?

blacktj avatar Mar 10 '23 21:03 blacktj