dittodb icon indicating copy to clipboard operation
dittodb copied to clipboard

Support capturing parameterized queries

Open seasick opened this issue 3 years ago • 4 comments

Brief description of the problem

Currently parameters of e.g. dbSendQuery are not considered when storing the statement and its return value.

When capturing two identical queries with different parameters only the result of the last query persists because the second capturing will overwrite the captured result of the first.

# Simplified code ahead ---v
capture_db_requests({
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
  # capturing this overwrites the result of the previous capture, because only the statement is hashed
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA")) 
})

with_mock_db({
  # Both queries will get "AA" airlines
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
  dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
})
I've build a small test case - click me
# tests/testthat/test-parameterized-query.R
with_mock_path(path = file.path(temp_dir, "parameteterized-query"), {
    capture_db_requests({
        suppressMessages(con <- nycflights13_create_sqlite(verbose = FALSE))

        result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
        dbClearResult(result)

        result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
        dbClearResult(result)

        dbDisconnect(con)
    })

    with_mock_db({
        test_that("fetching queries with parameters", {
            suppressMessages(con <- nycflights13_create_sqlite(verbose = FALSE))

            # make a query
            result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("9E"))
            rows <- DBI::dbFetch(result)
            dbClearResult(result)

            expect_equal(rows$carrier, "9E")

            # make a different one
            result <- dbSendQuery(con, "SELECT * FROM airlines WHERE carrier = $1 LIMIT 1", list("AA"))
            rows <- DBI::dbFetch(result)
            dbClearResult(result)

            expect_equal(rows$carrier, "AA")

            dbDisconnect(con)
        })
    })
})

Is there currently a way to have identical queries with different parameters and different captures?

seasick avatar Jan 25 '22 12:01 seasick

Thanks for this issue, fantastic repro and discussion! Yup, I have not (yet) added other arguments to the hashing algorithm that tells queries apart. It shouldn't be too complicated to add it to the hashing system. Are you interested in sending a pull request doing that (You've already got the hard part done in the test case there!)? I'm more than happy to review something or send some pointers about where to look/what to alter...

jonkeane avatar Jan 27 '22 23:01 jonkeane

@jonkeane, I can give it a shot. I also would be more than happy about some pointers before going into it :smile:

seasick avatar Jan 31 '22 08:01 seasick

Oops, sorry this notification got lost in my inbox 🙈

As a first pass I would try expanding the hash function to include at least one more argument where you can put the parameters. Since we don't really care about the hash doing anything other than being unique, you could even do something like dput(params) to get a string representation of those values at this point.

https://github.com/ropensci/dittodb/blob/75e3ce05d48cec2a23fb0bc535db30fd25bf5845/R/paths.R#L29-L33

and then for recording alter somewhere around:

https://github.com/ropensci/dittodb/blob/75e3ce05d48cec2a23fb0bc535db30fd25bf5845/R/capture-requests.R#L206

The recording bit is a little funny, the expression in dbSendQueryTrace there is executed at the end of a normal dbSendQuery invocation (the trace_dbi() function basically inserts that call at the end). So you'll have access to all of the values that have been written there (like params). You might need to do something to ensure that params exists before adding it to the hash, since not all backends support it (it's optional in the DBI spec).

and for reading alter somewhere around (which should be more straight forward):

https://github.com/ropensci/dittodb/blob/75e3ce05d48cec2a23fb0bc535db30fd25bf5845/R/dbQueries-Results.R#L54

In case you don't already know about this: you can set set_dittodb_debug_level() to an integer >2 to see lots of extra messages which I've found helpful when debugging this kind of thing!

jonkeane avatar Feb 12 '22 16:02 jonkeane

Oops, sorry this notification got lost in my inbox :see_no_evil:

Same here :see_no_evil: :grin:

I will take a stab at it in the coming days or weeks, not sure when I got the time for it. But it doesn't sound to complex from your description - I feared it would be harder.

Thanks for taking the time putting that together!

seasick avatar Feb 23 '22 09:02 seasick