dbx
Ideas
Please create a new issue to discuss any ideas or share your own.
0.4.0 release
- [ ] Make `adapter` first param of `dbxConnect` (to match `dbConnect`)
- [ ] Prefer RMariaDB over RMySQL when adapter is `mysql`
- [ ] Raise error when bigint value out of numeric range - `bigint3` or `bigint4` branch
Ideas
- [ ] Auto-increment ids for MySQL/MariaDB - can use `select last_insert_id(), row_count()` - https://stackoverflow.com/questions/7333524/how-can-i-insert-many-rows-into-a-mysql-table-and-return-the-new-ids
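The `last_insert_id()` approach from the link above could be sketched with plain DBI. This is only a sketch: it assumes a MySQL/MariaDB connection `con`, a table whose `id` column is AUTO_INCREMENT, and the default `innodb_autoinc_lock_mode`, under which a single multi-row INSERT gets consecutive ids.

```r
library(DBI)

# Hypothetical helper: insert `records` and return the generated ids.
# MySQL/MariaDB report the first id of the batch via LAST_INSERT_ID()
# and the number of inserted rows via ROW_COUNT().
insertWithIds <- function(con, table, records) {
  dbExecute(con, sqlAppendTable(con, table, records, row.names = FALSE))
  res <- dbGetQuery(con, "SELECT LAST_INSERT_ID() AS first_id, ROW_COUNT() AS n")
  seq(res$first_id, length.out = res$n)
}
```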
On hold
- [ ] More efficient update query - use temp table (an alternative approach for Postgres is `update ... from`, but this requires typecasting)
- [ ] Add `retries` option to `dbxSelect` - undo revert of dd9d716651f9321afe74def1fac491de4a4daca0 (users can use a separate package like retry for this)
- [ ] Add `statement_timeout` option to `dbxSelect` and maybe other commands - probably not common
- [ ] Make `conn` optional (use global connection if not passed)
- [ ] Add a way to easily create tables - `dbxCreateTable(db, table, records, types=list(id="bigserial primary key", score="float"))` (only first 3 args required, types can be used to override defaults)
- [ ] Add `cache` option to cache select statements to disk - `cache` branch (need ability to cache for a duration and force refresh)
- [ ] Run tests against BigQuery
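The `dbxCreateTable` idea above could start from something like this hypothetical helper, which builds the CREATE TABLE statement from a data frame. The default type mapping (Postgres-style type names) is an assumption, as is the helper itself - it is not part of dbx.

```r
# Hypothetical helper: build a CREATE TABLE statement from a data frame,
# with `types` overriding the inferred column types.
createTableSql <- function(table, records, types = list()) {
  inferType <- function(col) {
    if (is.integer(col)) "integer"
    else if (is.numeric(col)) "float"
    else if (inherits(col, "Date")) "date"
    else "text"
  }
  cols <- lapply(records, inferType)
  cols[names(types)] <- types  # user-supplied overrides win
  defs <- paste(names(cols), unlist(cols), collapse = ", ")
  paste0("CREATE TABLE ", table, " (", defs, ")")
}

createTableSql("scores", data.frame(id = 1L, score = 0.5),
               types = list(id = "bigserial primary key"))
# "CREATE TABLE scores (id bigserial primary key, score float)"
```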
There could be a dbx_logging option, similar to dbx_verbose, but the logs would go into a db table instead of stderr (as happens with message).
If it's possible to handle these two functionalities in base R, it would keep the dbx package lightweight.
importFrom(urltools,get_credentials)
importFrom(urltools,url_parse)
Examples of using RPostgreSQL would be nice; it is much lighter than RPostgres. Not sure how it is currently, but a few years back I could not achieve a few functionalities in RPostgres that I was using in RPostgreSQL. There was some heavy development on RPostgres in late 2017, so some of those may have been addressed by now. Anyway, RPostgreSQL is a battle-tested, 10-year-old package, so it definitely should not be skipped. Not sure if it helps, but I have a package built on RPostgreSQL doing upsert, etc.: https://github.com/jangorecki/pg
Hey @jangorecki, thanks for the ideas. I added support for RPostgreSQL. I agree with keeping the package lightweight and minimizing dependencies, but don't want to reinvent/maintain a URL parser. As for logging, you can now pass a function to dbx_verbose and do what you'd like with the logs.
writeLogs <- function(sql) {
# your code
}
options(dbx_verbose=writeLogs)
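For the db-table logging idea mentioned earlier, the callback could write to a log table instead of stderr. A sketch, assuming a separate plain DBI connection `log_con` (so the log insert doesn't itself get logged by dbx), a hypothetical query_logs table, and Postgres-style placeholders:

```r
# Log each statement into query_logs(sql, logged_at) via a separate
# DBI connection, so dbx doesn't log the log inserts recursively.
writeLogs <- function(sql) {
  DBI::dbExecute(log_con,
    "INSERT INTO query_logs (sql, logged_at) VALUES ($1, $2)",
    params = list(sql, Sys.time()))
}
options(dbx_verbose = writeLogs)
```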
Quick follow-up: Made urltools optional since it's only needed if you use the url functionality. Didn't realize that it had two dependencies itself.
It would be nice to add a wrapper to load CSV/TSV files - dbxCopy or dbxLoad, for example.
Hey @artemklevtsov, thanks for the suggestion. Since it only takes 1 line to convert a CSV/TSV into a data frame, I'm not sure a separate function is needed.
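For reference, the one-liner plus an insert might look like this (the file and table names are placeholders):

```r
# Read a CSV into a data frame, then bulk-insert it with dbx
records <- read.csv("data.csv", stringsAsFactors = FALSE)
dbxInsert(db, "events", records)
```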
I mean this function should use the efficient COPY command instead of INSERT.
What would be nice is a way to add missing columns when trying to update, insert or upsert.
E.g. I'm getting data from a web service where new columns can crop up now and then, e.g. "label2" (this example is taken from the gitlab API). As far as I know there's currently no easy way to emulate data.table::rbind(fill = TRUE), i.e. to first do an ALTER TABLE to add the missing columns and then use UPDATE to push the data to the database.
@stefanfritsch thanks for the suggestion. This use case doesn't seem very common, so I think it's best accomplished outside of dbx.
Hi Ankane! I think a nice feature would be something like dbxRead(), where it's possible to provide a 'where' data frame, 'return_cols' (default: all), and a batch_size argument. E.g. I have tables with primary keys consisting of two or more columns, so dbxSelect with the 'params' argument does not work, as it can only handle one column / a vector. Also, the batch_size argument would be helpful for large tables. Cheers!
Hey @budfox123, thanks for the suggestions. I've considered adding a more ORM-type function for constructing queries (instead of writing SQL manually), but want to keep things simple for now. You can probably write a function that wraps dbxSelect to do this. If you do, please share.
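A wrapper along those lines might build the WHERE clause from a data frame. This is only a sketch, not a dbx API: quoting goes through DBI, the OR-of-ANDs form assumes every where column is present for every row, and `dbxRead` is a hypothetical name.

```r
library(DBI)

# Sketch: select rows matching a multi-column `where` data frame,
# in batches. Builds (a = 1 AND b = 2) OR (a = 3 AND b = 4) OR ...
dbxRead <- function(db, table, where, batch_size = 1000) {
  rowClause <- function(i) {
    parts <- vapply(names(where), function(col) {
      paste0(dbQuoteIdentifier(db, col), " = ", dbQuoteLiteral(db, where[[col]][i]))
    }, character(1))
    paste0("(", paste(parts, collapse = " AND "), ")")
  }
  batches <- split(seq_len(nrow(where)), ceiling(seq_len(nrow(where)) / batch_size))
  do.call(rbind, lapply(batches, function(idx) {
    clause <- paste(vapply(idx, rowClause, character(1)), collapse = " OR ")
    dbxSelect(db, paste0("SELECT * FROM ", dbQuoteIdentifier(db, table),
                         " WHERE ", clause))
  }))
}
```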
Hi @ankane, would it be possible to let dbxUpdate and dbxUpsert return the number of changed (or sum of changed and added) rows?
Hey @psimm, both run dbExecute behind the scenes (unless returning is passed to upsert), so it could probably sum them. Can you explain the use case a bit more?
Hi @ankane thanks for your reply! Sorry, I think my previous suggestion of summing them was not a good idea. Here is my use case: I am downloading data from an API on a schedule. The API sometimes returns values that are already in the DB, so I use dbxUpsert. I am keeping logs of all interactions with the API. Ideally, dbxUpsert would let me know how many are new and how many are updated.
If you're on Postgres and use the version from GitHub, you can do:
dbxUpsert(db, table, records, where_cols=c("id"), returning=DBI::SQL("id, (xmax = 0) AS inserted"))
I'm not sure if it's possible with the other databases.
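With that returning clause, the counts fall out directly (the table and records here are placeholders):

```r
# Rows with xmax = 0 were inserted by this statement; the rest were updated.
result <- dbxUpsert(db, "events", records, where_cols = c("id"),
                    returning = DBI::SQL("id, (xmax = 0) AS inserted"))
inserted <- sum(result$inserted)
updated  <- nrow(result) - inserted
```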
Awesome! Yes, I am using Postgres. Thank you for adding that.
Error: Failed to fetch row: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
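This error occurs when the same key appears more than once in `records` within a single upsert: Postgres refuses to update the same row twice in one ON CONFLICT statement. A common workaround is to deduplicate on the conflict columns first; this sketch keeps the last occurrence per key, and the table and column names are placeholders:

```r
# ON CONFLICT DO UPDATE can't touch a row twice in one statement,
# so keep only the last occurrence of each id before upserting.
records <- records[!duplicated(records$id, fromLast = TRUE), ]
dbxUpsert(db, "events", records, where_cols = c("id"))
```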