Support for big file
When the source file is big, how about providing an option to use a temporary file-based sqlite database? I'm not sure whether this is possible.
Here are the related links.
- https://www.sqlite.org/inmemorydb.html (see the last section there)
- https://stackoverflow.com/a/27298043/1874690
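As a small illustration of what the linked docs describe (a sketch using Python's sqlite3 module, not q's actual code): `":memory:"` keeps the whole database in RAM, while an empty filename opens a private *temporary* database that sqlite backs with a disk file as needed, so it can grow beyond available memory. From SQL's point of view the two behave the same:

```python
import sqlite3

counts = []
# ":memory:" is a pure in-memory database; per the sqlite docs linked
# above, an empty filename opens a private temporary database that
# sqlite spills to a disk file as needed, so it can exceed RAM.
for name in (":memory:", ""):
    conn = sqlite3.connect(name)
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])
    counts.append(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])
    conn.close()
# Both databases answer the same queries identically.
```

The trade-off is speed: the temporary database pays for disk I/O, which is presumably why it would be offered behind a flag rather than as the default.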
Nice, the temporary database option seems pretty interesting.
I'll try to arrange for the next version to provide a flag for it. If it proves stable and fast enough, I'll make it the default in the release after that.
Thanks!
I tried with a 300 MB file; even a simple query with a WHERE condition takes 7 to 8 minutes to execute. I'm surprised that Log Parser manages to return the same result in 3 seconds.
This is interesting, since I have a lot of Windows users who report much faster speeds (am I assuming correctly that you're using Windows, because of the reference to Log Parser?). Can you provide more details? Machine type, hard disk type, and the number of rows and columns in the file; a "demo" of the first 200 lines or so would also help (if possible in terms of privacy, of course; you could also send it to my email so it's not here in public).
I have done some tests in the past with temporary sqlite files, but they are much slower than in-memory databases. By the way, the newest version of q can dump the parsed output into an sqlite db file (-S or --save-db-to-disk), so you can process the data inside sqlite itself if needed.
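To sketch what "process the data inside sqlite itself" can look like afterwards (the db filename and the table name `mydata` below are made up for illustration; here a stand-in db is built in place of one that -S would write, so the example is self-contained):

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for a db written by q's -S / --save-db-to-disk;
# the table name "mydata" is invented for this sketch.
dbpath = os.path.join(tempfile.mkdtemp(), "saved.sqlite")
conn = sqlite3.connect(dbpath)
conn.execute("CREATE TABLE mydata (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)",
                 [("a", 10), ("b", 20), ("c", 30)])
conn.commit()

# Subsequent queries run inside sqlite directly, with no re-parsing
# of the original text file:
total = conn.execute(
    "SELECT SUM(c2) FROM mydata WHERE c2 > 10").fetchone()[0]
conn.close()
```

The same queries could of course also be run from the `sqlite3` command-line shell against the saved file.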
By the way, I've created a file with 1M rows and 48 columns per row (315 MB), and querying it in q takes 1 minute and 5 seconds on my laptop (a MacBook Pro). I'm attaching the file here; it would be interesting if you downloaded it and measured several runs, so we can have a rough comparison. The file is zipped for upload/download convenience; unzip it before testing.
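For anyone who can't grab the attachment, a comparable benchmark file can be generated locally with a sketch like this (the row/column counts are parameters; the cell values are arbitrary integers, not the attachment's actual contents):

```python
import csv

def generate_csv(path, rows, cols):
    """Write a CSV with `rows` rows and `cols` integer columns per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(rows):
            # Arbitrary deterministic values, just to fill the cells.
            writer.writerow([(i * 31 + j) % 1000 for j in range(cols)])

# Roughly the shape of the attached file (1M rows x 48 columns):
# generate_csv("big.csv", 1_000_000, 48)
```

Timing several q runs against a file generated this way should give a rough, if not identical, comparison.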
By the way, I'm getting lots of positive responses from users regarding q's relative speed, but the tool is currently optimized for convenience rather than raw speed. I've started experimenting with replacing q's processing mechanism with Spark, but it's not production-ready yet.
A new version is out (3.1.6), which contains an automatic caching mechanism. This improves query speed by two orders of magnitude for large files (~2 seconds per query instead of ~5 minutes for a 4.8 GB file).
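Conceptually, such a cache can work by reusing a previously built sqlite file whenever it is newer than the source file, so only the first query pays the parsing cost. This is a simplified sketch of that idea, not q's actual implementation; the `.cache` suffix and the `load_csv_into` callback are invented for illustration:

```python
import os
import sqlite3

def open_cached(csv_path, load_csv_into):
    """Return an sqlite connection for csv_path, reusing <csv_path>.cache
    when it exists and is at least as new as the source file; otherwise
    rebuild it. `load_csv_into(conn, path)` is a hypothetical parser
    callback that fills the database from the text file."""
    cache = csv_path + ".cache"
    if (os.path.exists(cache)
            and os.path.getmtime(cache) >= os.path.getmtime(csv_path)):
        # Cache hit: skip parsing entirely and query the db directly.
        return sqlite3.connect(cache)
    if os.path.exists(cache):
        os.remove(cache)  # stale cache: the source file changed
    conn = sqlite3.connect(cache)
    load_csv_into(conn, csv_path)
    conn.commit()
    return conn
```

Once the cache file exists, every later query opens it directly, which is where a parse-bound ~5 minutes can drop to a query-bound ~2 seconds.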