nfldb
nfldb copied to clipboard
Learning Query Interface
HI,
Am learning your query interface. Wondering if you (or someone) can help me achieve the following with a query:
"For each of the last 3 games Tom Brady played in show his passing yards (as individual totals for each game, not summed)"
I can get the value summed over 3 games but not as individual totals per game (without doing 3 queries at least).
Thanks.
Sure, shouldn't be difficult. Can you show what you've tried?
Yeap... The code below gets the total passing yards for each week (in this case week 14), but I cant figure out how to return 3 separate objects, one for each game with the passing yards (granted I'm only about 3 days into messing around with this stuff!). If I throw in a list for the week, it just sums them.
Thx
q = nfldb.Query(self.Db)
q.game(season_year=2015, week=14, season_type='Regular')
q.player(full_name="Tom Brady")
q.play_player(passing_yds__gt=0)
for x in q.sort('passing_yds').as_aggregate():
print x.player, x.passing_yds
I think your problem is with this line:
for x in q.sort('passing_yds').as_aggregate():
You are aggregating all of the data that is in the query when you do that.
This was actually tougher than I would have guessed. Here's one way to do it:
import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular')
q.sort([('start_time', 'asc')])
q.player(full_name='Tom Brady')
for game in q.as_games()[-3:]:
passing_yards = 0
for pp in game.play_players:
if pp.player.full_name == 'Tom Brady':
passing_yards += pp.passing_yds
print game, passing_yards
Regular 2015 week 12 on 11/29 at 07:30PM, NE (24) at DEN (30) 280
Regular 2015 week 13 on 12/06 at 03:25PM, PHI (35) at NE (28) 312
Regular 2015 week 14 on 12/13 at 07:30PM, NE (27) at HOU (6) 226
@burntsushi might be along shortly to give the elegant way to do this.
Thanks for the help... Much appreciated.
Yeah, for me this is the kind of operation I'll be doing regularly. I want to do some short term projections based on trend lines so I need to get this kind of information out frequently so an optimal solution would be ideal.
I'll do some measurements to see how it performs.
Thanks again.
The query API cannot express "give me three distinct aggregate totals over three distinct sets of criteria." You need to run multiple queries. Actually, this really isn't a limit of the query API either, it's a limitation on what you can express to SQL. Aggregates impact the select query, so if you want to aggregate over different sets of criteria, you need to express that using multiple select queries.
"For each of the last 3 games Tom Brady played in show his passing yards (as individual totals for each game, not summed)"
The correct answer is to run three queries that select Tom Brady in each of the three games you're interested in.
The way to make this "fast" is to do aggregates for multiple players at a time. For example:
import nfldb
db = nfldb.connect()
q = nfldb.Query(db).game(season_year=2015, season_type='Regular', week=1)
for p in q.as_aggregate():
print p
That should output the aggregate totals of every player in week 1 of this season. If you want to limit it to a certain subset of players, then you can do q.player(player_id=ids)
, where ids
is a list of player ids that you're interested in.
In most cases, doing multiple queries will be faster than summing up statistics in Python. You should think of the query API as a wrapper around SELECT ...
. Think about how you would solve your problem using only SQL. If it requires more than one query, then chances are each SELECT will correspond to one nfldb query.
Thanks. This is very handy info!
Love the work!!
I was working on something similar just last week. It's basically the same approach as ochawkeye, using multiple queries, but looping on the week number, rather than by games.
import nfldb
db = nfldb.connect()
for i in range (1,15):
q = nfldb.Query(db)
q.game(season_year=2015, season_type='Regular', week=i)
q.player(full_name='Tom Brady')
for p in q.sort('passing_yds').as_aggregate():
print p.player, p.passing_yds
Not the most efficient approach, but it's straightforward to implement.
Edit: Taking another look at ochawkeye's code, he's only querying once and then returning a game worth of yards at a time, which is probably far more efficient than my approach.
Hi. Just to give an alternative solution, this is relatively easy to accomplish using just SQL.
SELECT
week,
sum(passing_yds) yards
FROM play_player
JOIN game USING (gsis_id)
JOIN player USING (player_id)
WHERE season_year = 2015
AND season_type = 'Regular'
AND full_name = 'Tom Brady'
GROUP BY game, week
ORDER BY week DESC
LIMIT 3
Result:
week | yards
------+-------
15 | 267
14 | 226
13 | 312
(3 rows)