dbx
Ideas
Please create a new issue to discuss any ideas or share your own.
0.4.0 release
- [ ] Make `adapter` first param of `dbxConnect` (to match `dbConnect`)
- [ ] Prefer RMariaDB over RMySQL when adapter is `mysql`
- [ ] Raise error when bigint value out of numeric range - `bigint3` or `bigint4` branch
Ideas
- [ ] Auto-increment ids for MySQL/MariaDB - can use `select last_insert_id(), row_count()` - https://stackoverflow.com/questions/7333524/how-can-i-insert-many-rows-into-a-mysql-table-and-return-the-new-ids
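The `last_insert_id()` approach from the link above could be sketched with plain DBI. This is only a sketch: it assumes a MySQL/MariaDB connection `con`, a table whose `id` column is AUTO_INCREMENT, and the default `innodb_autoinc_lock_mode`, under which a single multi-row INSERT gets consecutive ids.

```r
library(DBI)

# Hypothetical helper: insert `records` and return the generated ids.
# MySQL/MariaDB report the first id of the batch via LAST_INSERT_ID()
# and the number of inserted rows via ROW_COUNT().
insertWithIds <- function(con, table, records) {
  dbExecute(con, sqlAppendTable(con, table, records, row.names = FALSE))
  res <- dbGetQuery(con, "SELECT LAST_INSERT_ID() AS first_id, ROW_COUNT() AS n")
  seq(res$first_id, length.out = res$n)
}
```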
On hold
- [ ] More efficient update query - use temp table (an alternative approach for Postgres is `update ... from`, but this requires typecasting)
- [ ] Add `retries` option to `dbxSelect` - undo revert of dd9d716651f9321afe74def1fac491de4a4daca0 (users can use a separate package like retry for this)
- [ ] Add `statement_timeout` option to `dbxSelect` and maybe other commands - probably not common
- [ ] Make `conn` optional (use global connection if not passed)
- [ ] Add a way to easily create tables - `dbxCreateTable(db, table, records, types=list(id="bigserial primary key", score="float"))` (only first 3 args required, types can be used to override defaults)
- [ ] Add `cache` option to cache select statements to disk - `cache` branch (need ability to cache for a duration and force refresh)
- [ ] Run tests against BigQuery
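The `dbxCreateTable` idea above could start from something like this hypothetical helper, which builds the CREATE TABLE statement from a data frame. The default type mapping (Postgres-style type names) is an assumption, as is the helper itself - it is not part of dbx.

```r
# Hypothetical helper: build a CREATE TABLE statement from a data frame,
# with `types` overriding the inferred column types.
createTableSql <- function(table, records, types = list()) {
  inferType <- function(col) {
    if (is.integer(col)) "integer"
    else if (is.numeric(col)) "float"
    else if (inherits(col, "Date")) "date"
    else "text"
  }
  cols <- lapply(records, inferType)
  cols[names(types)] <- types  # user-supplied overrides win
  defs <- paste(names(cols), unlist(cols), collapse = ", ")
  paste0("CREATE TABLE ", table, " (", defs, ")")
}

createTableSql("scores", data.frame(id = 1L, score = 0.5),
               types = list(id = "bigserial primary key"))
# "CREATE TABLE scores (id bigserial primary key, score float)"
```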
There could be a dbx_logging option, similar to dbx_verbose, but the logs would go into a db table instead of stderr (as happens with message).
If it's possible to handle these two functionalities in base R, it would keep the dbx package lightweight.
importFrom(urltools,get_credentials)
importFrom(urltools,url_parse)
Examples of using RPostgreSQL would be nice; it is much lighter than RPostgres. Not sure how it is currently, but a few years back I could not achieve a few functionalities in RPostgres that I was using in RPostgreSQL. There was some heavy development on RPostgres in late 2017, so some of those may have been addressed by now. Anyway, RPostgreSQL is a battle-tested, 10-year-old package, so it definitely should not be skipped. Not sure if it helps, but I have a package built on RPostgreSQL doing upsert, etc.: https://github.com/jangorecki/pg
Hey @jangorecki, thanks for the ideas. I added support for RPostgreSQL. I agree with keeping the package lightweight and minimizing dependencies, but don't want to reinvent/maintain a URL parser. As for logging, you can now pass a function to dbx_verbose and do what you'd like with the logs.
writeLogs <- function(sql) {
# your code
}
options(dbx_verbose=writeLogs)
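For the db-table logging idea mentioned earlier, the callback could write to a log table instead of stderr. A sketch, assuming a separate plain DBI connection `log_con` (so the log insert doesn't itself get logged by dbx), a hypothetical query_logs table, and Postgres-style placeholders:

```r
# Log each statement into query_logs(sql, logged_at) via a separate
# DBI connection, so dbx doesn't log the log inserts recursively.
writeLogs <- function(sql) {
  DBI::dbExecute(log_con,
    "INSERT INTO query_logs (sql, logged_at) VALUES ($1, $2)",
    params = list(sql, Sys.time()))
}
options(dbx_verbose = writeLogs)
```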
Quick follow-up: Made urltools optional since it's only needed if you use the url functionality. Didn't realize that it had two dependencies itself.
It would be nice to add a wrapper to load CSV/TSV files - dbxCopy or dbxLoad, for example.
Hey @artemklevtsov, thanks for the suggestion. Since it only takes 1 line to convert a CSV/TSV into a data frame, I'm not sure a separate function is needed.
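For reference, the one-liner plus an insert might look like this (the file and table names are placeholders):

```r
# Read a CSV into a data frame, then bulk-insert it with dbx
records <- read.csv("data.csv", stringsAsFactors = FALSE)
dbxInsert(db, "events", records)
```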
I mean this function should use the efficient COPY command instead of INSERT.
What would be nice is a way to add missing columns when trying to update, insert or upsert.
E.g. I'm getting data from a web service where new columns can crop up now and then, e.g. "label2" (this example is taken from the gitlab API). As far as I know there's currently no easy way to emulate data.table::rbind(fill = TRUE), i.e. to first do an ALTER TABLE to add the missing columns and then use UPDATE to push the data to the database.
@stefanfritsch thanks for the suggestion. This use case doesn't seem very common, so I think it's best accomplished outside of dbx.
Hi Ankane! I think a nice feature would be something like dbxRead(), where it's possible to provide a 'where' data frame, 'return_cols' (default: all), and a batch_size argument. E.g. I have tables with primary keys consisting of two or more columns, so dbxSelect with the 'params' argument does not work, as it can only handle one column / a vector. Also, the batch_size argument would be helpful for large tables. Cheers!
Hey @budfox123, thanks for the suggestions. I've considered adding a more ORM-type function for constructing queries (instead of writing SQL manually), but want to keep things simple for now. You can probably write a function that wraps dbxSelect to do this. If you do, please share.
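A wrapper along those lines might build the WHERE clause from a data frame. This is only a sketch, not a dbx API: quoting goes through DBI, the OR-of-ANDs form assumes every where column is present for every row, and `dbxRead` is a hypothetical name.

```r
library(DBI)

# Sketch: select rows matching a multi-column `where` data frame,
# in batches. Builds (a = 1 AND b = 2) OR (a = 3 AND b = 4) OR ...
dbxRead <- function(db, table, where, batch_size = 1000) {
  rowClause <- function(i) {
    parts <- vapply(names(where), function(col) {
      paste0(dbQuoteIdentifier(db, col), " = ", dbQuoteLiteral(db, where[[col]][i]))
    }, character(1))
    paste0("(", paste(parts, collapse = " AND "), ")")
  }
  batches <- split(seq_len(nrow(where)), ceiling(seq_len(nrow(where)) / batch_size))
  do.call(rbind, lapply(batches, function(idx) {
    clause <- paste(vapply(idx, rowClause, character(1)), collapse = " OR ")
    dbxSelect(db, paste0("SELECT * FROM ", dbQuoteIdentifier(db, table),
                         " WHERE ", clause))
  }))
}
```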
Hi @ankane, would it be possible to let dbxUpdate and dbxUpsert return the number of changed (or sum of changed and added) rows?
Hey @psimm, both run dbExecute behind the scenes (unless returning is passed to upsert), so it could probably sum them. Can you explain the use case a bit more?
Hi @ankane thanks for your reply! Sorry, I think my previous suggestion of summing them was not a good idea. Here is my use case: I am downloading data from an API on a schedule. The API sometimes returns values that are already in the DB, so I use dbxUpsert. I am keeping logs of all interactions with the API. Ideally, dbxUpsert would let me know how many are new and how many are updated.
If you're on Postgres and use the version from GitHub, you can do:
dbxUpsert(db, table, records, where_cols=c("id"), returning=DBI::SQL("id, (xmax = 0) AS inserted"))
I'm not sure if it's possible with the other databases.
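With that returning clause, the counts fall out directly (the table and records here are placeholders):

```r
# Rows with xmax = 0 were inserted by this statement; the rest were updated.
result <- dbxUpsert(db, "events", records, where_cols = c("id"),
                    returning = DBI::SQL("id, (xmax = 0) AS inserted"))
inserted <- sum(result$inserted)
updated  <- nrow(result) - inserted
```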
Awesome! Yes, I am using Postgres. Thank you for adding that.
Error: Failed to fetch row: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
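This error occurs when the same key appears more than once in `records` within a single upsert: Postgres refuses to update the same row twice in one ON CONFLICT statement. A common workaround is to deduplicate on the conflict columns first; this sketch keeps the last occurrence per key, and the table and column names are placeholders:

```r
# ON CONFLICT DO UPDATE can't touch a row twice in one statement,
# so keep only the last occurrence of each id before upserting.
records <- records[!duplicated(records$id, fromLast = TRUE), ]
dbxUpsert(db, "events", records, where_cols = c("id"))
```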