Reuse of previously loaded data
Reusing a file that has already been loaded in the past should be faster. This could be achieved by some form of caching of the loaded data.
On a similar note, I was wondering how one could reuse the generated db. Changing :memory: to q.sqlite and ending with db.conn.commit() instead of table_creator.drop_table() did the trick.
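For reference, here is a minimal sketch of that idea using Python's sqlite3 module directly. It is not q's actual code; the file name q.sqlite and the table layout are just placeholders:

```python
import sqlite3

# db = sqlite3.connect(':memory:')   # original behaviour: the data disappears when the process exits
db = sqlite3.connect('q.sqlite')     # file-backed: the loaded table survives this run

# placeholder table standing in for whatever q builds from the input file
db.execute('CREATE TABLE IF NOT EXISTS data (a TEXT, b TEXT)')
db.executemany('INSERT INTO data VALUES (?, ?)', [('1', 'x'), ('2', 'y')])

db.commit()   # commit instead of dropping the table, so the data is persisted
db.close()

# a later run can reopen q.sqlite and query it without re-parsing the original file
```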
Caching the data is not obvious, as you need to check whether the file is still the same (the file's md5sum could be used for that), and you need some sort of garbage collection.
Yeah, as we all know, the 2 most difficult problems in computer science are cache invalidation, naming things and off by one errors.
Exactly :)
Hi, sorry for the late reply. Been offline for a couple of days.
Thanks a lot, I'll take a deeper look at your tip and see if I can find some trick to make the invalidation fast enough (was planning on cksum, perhaps a sampled cksum or something, with an option to be stricter and slower through a command line parameter).
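A rough sketch of what such a check could look like, with a strict full md5 and a faster sampled checksum. The function names, the sampling scheme and the strict/fast switch are only assumptions for illustration, not anything q actually implements:

```python
import hashlib
import os

def full_md5(path, chunk_size=1 << 20):
    """Strict but slow: hash the whole file."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def sampled_checksum(path, sample_size=64 * 1024):
    """Faster, weaker check: hash the file size plus its first and last blocks."""
    size = os.path.getsize(path)
    h = hashlib.md5(str(size).encode())
    with open(path, 'rb') as f:
        h.update(f.read(sample_size))
        if size > sample_size:
            # jump to the tail of the file without re-reading the first block
            f.seek(max(size - sample_size, sample_size))
            h.update(f.read(sample_size))
    return h.hexdigest()

def cache_is_valid(path, stored_signature, strict=False):
    # 'strict' could be wired to a command line parameter; the option is hypothetical
    current = full_md5(path) if strict else sampled_checksum(path)
    return current == stored_signature
```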
Harel
I've created an API which will allow q to be used from python code as a module. The changes also inherently include the possibility to reuse previously loaded data (e.g. running multiple queries against the same loaded data).
Alpha version of the new API will be committed into the main branch in a couple of days.
Alpha branch of the python api has been committed to https://github.com/harelba/q/tree/expose-as-python-api.
The python api supports reuse of already-loaded data, and this capability is exposed to the command line by allowing the user to write multiple queries in the same q execution, e.g. q "select ..." "select ..." "select ...". Running q like that will load the data only once for each file, even if it's used in multiple queries. In the future, I'll probably add an interactive REPL for this as well.
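To illustrate the reuse pattern itself (this is not the actual API of the branch, which is documented in its readme), here is a plain sqlite3 sketch of loading a file once and then running several queries against the same in-memory table; the file name and columns are hypothetical:

```python
import csv
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (a TEXT, b TEXT)')   # hypothetical two-column layout

with open('data.csv', newline='') as f:              # hypothetical input file
    conn.executemany('INSERT INTO data VALUES (?, ?)', csv.reader(f))

# the expensive load above happens once; every query below reuses the same table
for sql in ['select count(*) from data',
            'select a from data limit 3']:
    print(conn.execute(sql).fetchall())
```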
Any input would be helpful and appreciated.
Harel
Forgot to write - The readme file of the branch contains the required information about the API.
This capability is now fully supported internally, and exposed partially by running multiple queries on the same command line (every invocation of q reuses the loaded data across the multiple queries it runs).
This issue will be closed when the feature is fully exposed.
This could also be done like an interactive SQL client: at the start it loads all the data (into memory, I don't care, I have 32GB of RAM) and then we can execute the queries. My sample file is about 3GB, and waiting another minute for each query isn't good. Like:
$ q --client -H data.csv as data
q > select count(*) from data
------------
| count(*) |
------------
| 10000000 |
------------
q > select my_field from data where condition=true limit 3
------------
| my_field |
------------
| val1     |
------------
| val2     |
------------
| val3     |
------------
Support for multiple files is open for discussion.
Hi @msangel @Fil
I'm going to release a new version of q soon. It's a large change, which includes inherent caching capabilities similar to the ones you're describing, eliminating the need to wait between multiple queries of the same file.
Harel