nodbi
nodbi copied to clipboard
Document DBI connector for R
nodbi
nodbi
is an R package that provides a single interface for several
NoSQL databases and databases with JSON functionality, with the same
function parameters and return values across all database backends.
Currently, nodbi
supports the following database backends:
- MongoDB
- SQLite
- Elasticsearch
- CouchDB
- PostgreSQL (since v0.6.0)
for an R
object of any of these data types:
- data.frame
- list
- JSON string
- a file name or URL with NDJSON records*
and for executing the following operations:
- List
- Exists
- Create
- Get
- Query**
- Update**
- Delete
across all database backends. Limitations: * Only http(s)
and only
for docdb_create
. **Only simple queries (e.g. equality for a single
field) and updates are supported for Elasticsearch at the moment. Only
root fields can be specified for CouchDB, whereas subitems can be
specified in dot notation for Elasticsearch, MongoDB, SQLite and
PostgreSQL.
For capabilities in terms of operations and parameter combinations across any of the database backends, see section walk-through below and see the main file for package testing, here: core-nodbi.R.
Install
CRAN version
install.packages("nodbi")
Development version
remotes::install_github("ropensci/nodbi")
Load package from library
library("nodbi")
API overview
Parameters for docdb_*()
functions are the same across all database
backends. See walk-through below.
Purpose | Function call |
---|---|
Create database connection (see below) | src <- nodbi::src_{mongo, sqlite, couchdb, elastic}(<see below for parameters>) |
Load my_data (a data frame, a list, a JSON string, or a file name or url with NDJSON records) into database, container my_container |
nodbi::docdb_create(src = src, key = "my_container", value = my_data) |
Get all documents back into a data frame | nodbi::docdb_get(src = src, key = "my_container") |
Get documents selected with query (as MongoDB-compatible JSON) into a data frame | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}') |
Get selected fields (in MongoDB compatible JSON) from documents selected query | nodbi::docdb_query(src = src, key = "my_container", query = '{"age": 20}', fields = '{"name": 1, "_id": 0, "age": 1}') |
Update (patch) selected documents with new data in data frame, list or JSON string from my_data |
nodbi::docdb_update(src = src, key = "my_container", value = my_data, query = '{"age": 20}') |
Check if container exists | nodbi::docdb_exists(src = src, key = "my_container") |
List all containers in database | nodbi::docdb_list(src = src) |
Delete document(s) in container | nodbi::docdb_delete(src = src, key = "my_container", query = '{"age": 20}') |
Delete container | nodbi::docdb_delete(src = src, key = "my_container") |
Close and remove database connection | rm(src) |
Database connections
Overview on parameters and aspects that are specific to the database
backend. These are only needed once: for src_*()
to create a
connection object. The connection object is subsequently used with
docdb_*
functions.
MongoDB
MongoDB but none of the other databases require to specify the container
already in the src_*()
function. “Container” refers to a MongoDB
collection.
nodbi::src_mongo(
collection = "my_container", db = "my_database",
url = "mongodb://localhost", ...)
SQLite
The functionality to process JSON is based on the SQLite extension JSON1, available in RSQLite. “Container” refers to an SQLite table.
nodbi::src_sqlite(dbname = ":memory:", ...)
CouchDB
“Container” refers to a CouchDB database. Function
docdb_update.src_couchdb
uses
jqr to implement patching
JSON, in analogy to functions available for the other databases.
nodbi::src_couchdb(
host = "127.0.0.1", port = 5984L, path = NULL,
transport = "http", user = NULL, pwd = NULL, headers = NULL)
Elasticsearch
“Container” refers to an Elasticsearch index. Only lowercase is accepted for container names. Opensearch can equally be used.
nodbi::src_elastic(
host = "127.0.0.1", port = 9200L, path = NULL,
transport_schema = "http", user = NULL, pwd = NULL, force = FALSE, ...)
PostgreSQL
“Container” refers to an PostgreSQL table. With PostgreSQL, the order of
variables in data frames returned by docdb_get()
and docdb_query()
can differ from the order in which they were in docdb_create()
.
nodbi::src_postgres(
dbname = "my_database", host = "127.0.0.1", port = 5432L, ...)
Walk-through
This example is meant to show how functional nodbi
is at this time.
# load nodbi
library(nodbi)
# connect database backend; this
# example fully works with any of
src <- src_postgres()
src <- src_mongo()
src <- src_sqlite()
#
# parts of the example do not
# yet work these, see *notes*
src <- src_elastic()
src <- src_couchdb()
# load data (here data frame, alternatively list or JSON)
# into the container "my_container" specified in the "key" parameter
docdb_create(src, key = "my_container", value = mtcars)
#> [1] 32
# load additionally 98 NDJSON records
docdb_create(src, key = "my_container", "http://httpbin.org/stream/98")
#> Note: container 'my_container' already exists
#> [1] 98
# load additionally contacts JSON data, from package nodbi
docdb_create(src, key = "my_container", contacts)
#> Note: container 'my_container' already exists
#> [1] 5
# get all documents, irrespective of schema
dplyr::tibble(docdb_get(src, "my_container"))
#> A tibble: 135 × 27
#> `_id` isActive balance age eyeColor name email about registered tags friends url
#> <chr> <lgl> <chr> <int> <chr> <chr> <chr> <chr> <chr> <list> <list> <chr>
#> 1 5cd678530d… TRUE $2,412… 20 blue Kris… kris… Sint… 2017-07-1… <chr> <df> NA
#> 2 5cd678531b… FALSE $3,400… 20 brown Rae … raec… Nisi… 2018-12-1… <chr> <df> NA
#> 3 5cd6785325… TRUE $1,161… 22 brown Pace… pace… Eius… 2018-08-1… <chr> <df> NA
#> 4 5cd6785335… FALSE $2,579… 30 brown Will… will… Null… 2018-02-1… <chr> <df> NA
#> 5 5cd67853f8… FALSE $3,808… 23 green Lacy… lacy… Sunt… 2014-08-0… <chr> <df> NA
#> 6 6529d28a-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
#> 7 6529d2a8-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
#> 8 6529d2b2-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
#> 9 6529d2c6-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
#> 10 6529d2d0-c… NA NA NA NA NA NA NA NA <NULL> <NULL> http…
#> # … with 125 more rows, and 15 more variables: args <named list>, headers <df[,4]>,
#> # origin <chr>, id <int>, mpg <dbl>, cyl <dbl>, disp <dbl>, hp <dbl>, drat <dbl>, wt <dbl>,
#> # qsec <dbl>, vs <dbl>, am <dbl>, gear <dbl>, carb <dbl>
# query some documents
# *note*: such complex queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", query = '{"mpg": {"$gte": 30}}')
#> _id mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 Fiat 128 32 4 79 66 4.1 2.2 19 1 1 4 1
#> 2 Honda Civic 30 4 76 52 4.9 1.6 19 1 1 4 2
#> 3 Lotus Europa 30 4 95 113 3.8 1.5 17 1 1 5 2
#> 4 Toyota Corolla 34 4 71 65 4.2 1.8 20 1 1 4 1
# query some fields from some documents; 'query' is a mandatory
# parameter and is used here in its position in the signature
# *note*: such complex queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", '{"mpg": {"$gte": 30}}', fields = '{"wt": 1, "mpg": 1}')
#> wt mpg
#> 1 2.2 32
#> 2 1.6 30
#> 3 1.5 30
#> 4 1.8 34
# query some subitem fields from some documents
# (only simple queries so far implemented for Elasticsearch)
# (only root, not subitems so far implemented for CouchDB)
# *note*: such complex queries do not yet work with src_elasticsearch()
str(docdb_query(src, "my_container", '
{"$or": [{"age": {"$lt": 21}},
{"friends.name": {"$regex": "^B[a-z]{3,6}.*"}}]}',
fields = '{"age": 1, "friends.name": 1}'))
#> 'data.frame': 4 obs. of 2 variables:
#> $ age : int 20 20 22 23
#> $ friends:'data.frame': 4 obs. of 1 variable:
#> ..$ name:List of 4
#> .. ..$ : chr "Pace Bell"
#> .. ..$ : chr "Yang Yates" "Lacy Chen"
#> .. ..$ : chr "Baird Keller" "Francesca Reese" "Dona Bartlett"
#> .. ..$ : chr "Wooten Goodwin" "Brandie Woodward" "Angelique Britt"
# such queries can also be used for updating (patching) selected documents
# with a new 'value'(s) from a JSON string, a data frame or a list
docdb_update(src, "my_container", value = '{"vs": 9, "xy": [1, 2]}', query = '{"carb": 3}')
#> [1] 3
# *note*: such queries do not yet work with src_elasticsearch()
docdb_query(src, "my_container", '{"carb": {"$in": [1,3]}}', fields = '{"vs": 1}')[[1]]
#> [1] 1 1 1 1 9 9 9 1 1 1
docdb_get(src, "my_container")[126:130, c(1, 27, 28)]
#> _id carb xy
#> 126 Merc 280C 4 NULL
#> 127 Merc 450SE 3 1, 2
#> 128 Merc 450SL 3 1, 2
#> 129 Merc 450SLC 3 1, 2
#> 130 Pontiac Firebird 2 NULL
# use with dplyr
# *note* that dplyr includes a (deprecated) function src_sqlite
# which would mask nodbi's src_sqlite, so it is excluded here
library("dplyr", exclude = c("src_sqlite", "src_postgres"))
#
docdb_get(src, "my_container") %>%
group_by(gear) %>%
summarise(mean_mpg = mean(mpg))
#> # A tibble: 4 × 2
#> gear mean_mpg
#> <dbl> <dbl>
#> 1 3 16.1
#> 2 4 24.5
#> 3 5 21.4
#> 4 NA NA
# delete documents; query is optional parameter and has to be
# specified for deleting documents instead of deleting the container
# *note*: such complex queries does not yet work with src_couchdb() or src_elasticsearch()
docdb_delete(src, "my_container", query = '{"$or": {"gear": 5, "age": {"$gte": 22}}}')
#> TRUE
nrow(docdb_get(src, "my_container"))
#> [1] 127
# delete container from database
docdb_delete(src, "my_container")
#> [1] TRUE
Benchmark
library("nodbi")
srcMongo <- src_mongo()
srcSqlite <- src_sqlite()
srcElastic <- src_elastic()
srcCouchdb <- src_couchdb(user = Sys.getenv("COUCHDB_TEST_USER"), pwd = Sys.getenv("COUCHDB_TEST_PWD"))
srcPostgres <- src_postgres()
key <- "test"
query <- '{"clarity": "SI1"}'
fields <- '{"cut": 1, "_id": 1, "clarity": "1"}'
value <- '{"clarity": "XYZ", "new": ["ABC", "DEF"]}'
data <- as.data.frame(diamonds)[1:12000, ]
testFunction <- function(src, key, value, query, fields) {
docdb_create(src, key, data)
# Elasticsearch needs a delay to process the data
docdb_create(src, key, "http://httpbin.org/stream/89")
# Elasticsearch needs a delay to process the data
if (inherits(src, "src_elastic")) Sys.sleep(1)
head(docdb_get(src, key))
docdb_query(src, key, query = query, fields = fields)
docdb_update(src, key, value = value, query = query)
docdb_delete(src, key)
}
#> 2022-05-04 with 2015 mobile hardware
#> without any database optimisations
rbenchmark::benchmark(
MongoDB = testFunction(src = srcMongo, key, value, query, fields),
RSQLite = testFunction(src = srcSqlite, key, value, query, fields),
Elastic = testFunction(src = srcElastic, key, value, query, fields),
CouchDB = testFunction(src = srcCouchdb, key, value, query, fields),
PostgreSQL = testFunction(src = srcPostgres, key, value, query, fields),
replications = 10L,
columns = c('test', 'replications', 'elapsed')
)
#> test replications elapsed
#> 4 CouchDB 10 1639
#> 3 Elastic 10 183 # 10s to be subtracted
#> 1 MongoDB 10 74
#> 5 PostgreSQL 10 81
#> 2 RSQLite 10 76
Testing
testthat::test_local()
# ✔ | F W S OK | Context
# ✔ | 84 | couchdb [90.8s]
# ✔ | 1 62 | elastic [68.9s]
# ────────────────────────────────────────────────────────────────────────────────────────────────────────────
# Skip (core-nodbi.R:138:3): docdb_query
# Reason: queries need to be translated into elastic syntax
# ────────────────────────────────────────────────────────────────────────────────────────────────────────────
# ✔ | 84 | mongodb [4.4s]
# ✔ | 83 | postgres [7.2s]
# ✔ | 84 | sqlite [4.2s]
#
# ══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════
# Duration: 175.5 s
#
# ── Skipped tests ──────────────────────────────────────────────────────────────────────────────────────────
# • queries need to be translated into elastic syntax (1)
#
# [ FAIL 0 | WARN 0 | SKIP 1 | PASS 397 ]
Notes
- Please report any issues or bugs.
- License: MIT
- Get citation information for
nodbi
in R doingcitation(package = 'nodbi')
- Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
- Support for redis has been removed since version 0.5, because no way was found to query and update specific documents in a container.