py-retrosheet
py-retrosheet copied to clipboard
SQLite errors
Hello,
So, I am using the latest version of py-retrosheet with SQLite3. There are two problems I've run into, namely:
- Queries of the form
'SELECT * FROM rosters WHERE year = %s AND player_id = %s AND team_tx = %s'
are malformed, and - SQLite's use of
rowcount
is botched
Malformed Queries. The initial run, it connects to the database, but when trying populating my database with 2014 data...it throws errors. I saw on line 58-59 of parse.py
the queries look like
sql = 'SELECT * FROM rosters WHERE year = %s AND player_id = %s AND team_tx = %s'
res = conn.execute(sql, [row[0], row[1], row[6]])
But shouldn't the queries be:
sql = 'SELECT * FROM rosters WHERE year = ? AND player_id = ? AND team_tx = ?'
res = conn.execute(sql, [row[0], row[1], row[6]])
Note the change of %s
to ?
. This appears to be how SQLalchemy expects parameters (at least, when I made these changes locally, it worked for SQLite).
SQLite Cannot Count. This bug is really SQLite's fault.
The second run, I try download.py -y 2013
, then parse.py -y 2013
. It blows up. There's a problem with SQLalchemy's SQLite implementation for rowcount
(c.f., this blog post). Consequently, parse.py
thinks the Anaheim Angels are not in the teams
table, so it attempts to insert them in...causing an error.
The solution is, according to SQLalchemy's documentation, we shouldn't be using results.rowcount()
at all (the doc claims it makes sense only after an UPDATE
or DELETE
to call results.rowcount()
). We should instead be using len(results.fetchall())
, which may be dangerous in general (since we don't know how many rows may be fetched)...but since we're expecting at most 1 row, the change appears harmless.