DatabaseConnector
DatabaseConnector copied to clipboard
Use dbplyr native backends instead of relying on SqlRender translation
Hi @schuemie,
@edward-burn, @catalamarti, and I would like get the Darwin tools working with DatabaseConnector as a DBI database driver. I think to do this we would like to change how DatabaseConnector integrates with dbplyr. Currently it works very differently than all other DBI drivers and we would like to do the work to make it work in the same way as other DBI drivers.
Basically the changes would be
- Use a DatabaseConnectorConnection class instead of the "Microsoft sql server" class for the connection.
- Add a DatabaseConnector-backend.R file to either dbplyr or DatabaseConnector so that dbply would generate the SQL for each database platform.
This would help a lot since we are currently having success using this approach with other DBI drivers but have issues with DatabaseConnector which many people would like to use with Darwin tools. This change allows us to use DatabaseConnector as a DBI driver backend without using SqlRender for SQL translation. dbplyr has been quite good for us in that regard so far.
Would you support these changes in principle? Ed, Marti, and I would do all the work and make a PR but we don't want to work on it if you're not supportive of the idea.
Also tagging @PRijnbeek and @rossdwilliams for awareness.
Obviously, having a dedicated OhdsiSql backend in dbplyr would be great, but it would also be a lot of work, implying a lot of future maintenance (have you looked at the existing backend implementations?)
What problem would this solve compared to the current solution (piggy-backing on the SQL Server backend)?
so I think we would indeed piggy back of the SQL server backend but only for Sql server. We would use the postgres backend for postgres Etc. for the other dbms
So I don't think I'll need to implement much. Just make sure the appropriate backend gets used based on the dbms. This is what we do currently with other DBI drivers and we implemented very little. If there's not strong objection to the idea then I'd suggest I'll make fork and try to implement it to see if it works. Then you can review and provide feedback.
I'm not sure all the issues we run into with the current implementaion but the double sql translation means we get different sql depending on if users are connecting with DatabaseConnector or another DBI driver so there is just more places for potential issues. Currently some functionality works and some does not. And we need to tackle each issue one by one.
I think I'm misunderstanding what you're proposing. I thought you wanted to create a backend that generates OhdsiSql, which would then be translated into the various supported dialects using SqlRender. But it seems you want to create a backend that generates different SQL based on the DBMS?
Yea it's hard to explain on github. Here is the PR which might help https://github.com/OHDSI/DatabaseConnector/pull/258. Basically we would like to use DatabaseConnector as a database driver only and use dbplyr to generate SQL. The PR above is one way to do that but there could be other approaches that could work as well. We would like to skip SqlRender.
So instead of dbplyr -> sqlrender -> database
we do dbplyr -> database
DatabaseConnector handles delivering SQL to the database and returning the result but would not modify the dbplyr generated SQL which currently works for us.
But if I read the PR correctly, you don't need a DatabaseConnector backend? You just give the DatabaseConnector connection the class corresponding to the DBMS, so the correct dbplyr backend will be invoked?
Yes that is correct. I'm just extending your initial approach to other dbms and removing the need for the sqlrender::translate step.
I haven't tested this much yet though so I will need to make sure it works the way I expect it to. But I think you get the idea.
I think I understand now. This is what I also was trying to suggest here. We could keep the backend using SqlRender for platforms that don't yet have a dbplyr backend (if there are any left).
This definitely fits with the idea of making DatabaseConnector more lightweight, so I like it. Since this is a pretty big change, I think we should not work in the develop branch for now (which can still be used for quick patches). Should I create a separate branch where you can work on it, or do you want to keep working in the fork?
Ok great! Yes please create a separate branch for me to target with my pull request. I'll do a bit more testing and make sure the fork is working as I expect.
Ok, I've created https://github.com/OHDSI/DatabaseConnector/tree/dbplyr_backends, and merged your PR there.
We already have lots of unit tests for dbplyr, let's see how well they perform with this new approach.
May I have the ability to push directly to the OHDSI/DatabaseConnector:dbplyr_backends branch? This would help becuase then the full set of tests will run on my PR. If I make a PR from an external branch it does not have access to the environment variables on github and only a subset of tests run meaning that I cannot accurately ascertain if tests are actually passing on my PR until you merge to a branch that is in the OHDSI organization.
Also do you have an objection to changing the default value of translate on the DBI methods? If we can we use a global option to set this default? If not it will be a breaking change and the tests will need to be modified. Personally I think it would be better for the DBI methods not to translate but that is only my opinion and as long as I can opt of that with a global option for dplyr stuff it should be fine I think.
You should have write access now.
I would prefer the DBI functions to continue to translate by default when the user directly calls these functions. That fits the idea that DatabaseConnector is a universal DBI driver (code once, run on any platform).
I'm sure we can make it so it doesn't translate if the SQL is coming from one of the dbplyr backends.
Note that the changes to the odbc package Hadley mentioned are now (since 1.4.0) in effect. The various database classes are now formally exported, so we don't have to re-define them ourselves.
@ablack3 : have you had a change to work on this, as discussed? https://github.com/OHDSI/DatabaseConnector/discussions/275
Yes I have. I'm confused about method dispatch though because it isn't working as I expect it to.
I'll try to explain with a simple example.
In DatabaseConnector I added a method for the dbplyr_edition generic function.
#' @export
dbplyr_edition.DatabaseConnectorJdbcConnection <- function(con) {
2L
}
I regenerated package documentation and rebuilt the package. I can see that the method is exported in NAMESPACE.
I connect to a local database and check the class of the connection.
> library(DatabaseConnector)
> connectionDetails <- createConnectionDetails(dbms = "postgresql",
+ server = "localhost/eunomia",
+ port = 5432,
+ user = "postgres",
+ password = "postgres")
> write_schema <- "scratch"
> cdm_schema <- "cdm"
> con <- connect(connectionDetails)
Connecting using PostgreSQL driver
> class(con)
[1] "DatabaseConnectorJdbcConnection"
attr(,"package")
[1] ".GlobalEnv"
When I run dbplyr_edition my method is not being called.
> dbplyr_edition(con)
[1] 1
I used sloop to investigate the dispatch and it looks as if my method should be called.
I used debugonce to follow the execution of dbplyr_edition and I can see that the method I added is not being called.
If I load the method into the global environment it does get called as I expect.
I thought maybe this has do with using S3 methods on S4 classes but the connection object seems similar to the duckdb connection. However the similar method in the duckdb R package does get called correctly.
I think I need to understand why method dispatch isn't working the way I expect in DatabaseConnector. It's probably something simple but I need it to make sure the other dbplyr methods are dispatched correctly. And I also want to understand it. Any ideas? I might try making a small package with just the methods I'm working on.
Ah my problem was that I was not importing the generic into the package.