csvkit icon indicating copy to clipboard operation
csvkit copied to clipboard

csvsql: "no such table" error only thrown when force-quitting

Open slhck opened this issue 6 years ago • 5 comments

I am querying a data file:

csvsql --query 'select max(video_session_started_at) from data' tmp_correct_ids.csv

I did this, naively, by adapting the existing code, assuming that data would be a generic name for the table that'd be generated in memory after reading from the CSV.

I was wrong. It seems the table name is inferred from the CSV filename without suffix. I later learned that from carefully reading the manpage.

However, earlier, I only found out after I killed the command:

^C(sqlite3.OperationalError) no such table: data
[SQL: select max(video_session_started_at) from data]
(Background on this error at: http://sqlalche.me/e/e3q8)

Would it be possible to throw this error and stop csvsql once it occurs? The command just seemed to hang for me until I quit it.

Using csvsql 1.0.4 under Ubuntu 18.04 with Python 3.6.

slhck avatar May 29 '19 10:05 slhck

Agreed – it shouldn't hang and should instead error.

jpmckinney avatar Jun 03 '19 19:06 jpmckinney

this one seems fun! i might pick it up next :)

it seems that this could be a symptom of size. i know this is a long time ago, but do you remember how large this file was? when i test on small files, it errs out immediately. but when i test on large ones ( >100k rows), that's when it starts slowing down.

it might load the file into memory before checking. perhaps there's a way to fail faster here.

dannysepler avatar Apr 02 '20 02:04 dannysepler

Thanks for looking into it! Yes, the file that I was working with definitely had more than 100K rows.

Since the table name is inferred from the filename, I guess this can be checked even before loading any record.

slhck avatar Apr 02 '20 07:04 slhck

Thanks in advance, @dannysepler!

jpmckinney avatar Apr 02 '20 16:04 jpmckinney

just an update here, i'm going to bow out of this one. Sorry! It doesn't seem like a bug so much as a speedup feature. While that would still be fun, the only ways I can rationalize doing it are pretty messy. (Like you said, the table name inference is easy. But how do we know what tables are being are being queried for? Seems like we'll need a SQL parser of some sort.)

To me an interesting question as well is around what happens when you Ctrl-C a big table read, and it gets caught in the "StopIteration" here. Somehow, it is still able to successfully continue the command!

Just leaving some food for thought for whoever picks this up next.

dannysepler avatar Apr 05 '20 03:04 dannysepler

I can't reproduce the issue. The message appears immediately.

$ printf 'a,b\n' | csvsql --query 'select max(video_session_started_at) from data'
OperationalError: (sqlite3.OperationalError) no such table: data
[SQL: select max(video_session_started_at) from data]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

jpmckinney avatar Oct 17 '23 23:10 jpmckinney

I can also no longer reproduce this, thanks for checking.

slhck avatar Oct 18 '23 06:10 slhck