dittodb Best approach for custom serializers?

Open jonkeane opened this issue 5 years ago • 0 comments

dputs of objects were chosen as the first serialization for a few reasons:

1. They are plain text, so they are easy to review and understand in git diffs.
2. They can serialize any sort of object.
3. They can be used to reliably return a data.frame with specific column types.
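As a rough illustration of the third point, here is a minimal sketch of a dput round trip in base R. The data.frame and its column names are made up for the example; only `dput()` and `dget()` are the real functions being discussed.

```r
# A small, made-up data.frame with several column types
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  when = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  grp = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".R")
dput(df, file = path)   # write plain-text R code that rebuilds df
restored <- dget(path)  # evaluate that code to get the object back

# Compare the column classes before and after the round trip
classes_before <- sapply(df, class)
classes_after <- sapply(restored, class)
print(classes_after)
```

The file at `path` contains ordinary R code (a `structure(list(...))` call), which is what makes it readable in a diff.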

A few alternatives were not chosen:

CSV: While these are plain text, and arguably easier to read than dput output, they would need some sort of sidecar file to make sure they are parsed correctly into data.frames, and they cannot serialize non-data.frame objects (they miss 2 and 3 above).

RDS: These can serialize anything (and reliably return data.frames), but they don't satisfy (1) above since they are binary and not plain text.
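The CSV limitation can be seen in a short base R round trip. This is only a sketch with made-up data; the exact classes that come back depend on the R version and its `stringsAsFactors` default, but without an explicit `colClasses` the original types are not recovered.

```r
# Made-up data: zero-padded IDs, a factor, and a Date column
df <- data.frame(
  id = c("001", "002"),
  grp = factor(c("x", "y")),
  when = as.Date(c("2020-01-01", "2020-01-02")),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
restored <- read.csv(path)  # no colClasses, so types are guessed

print(sapply(df, class))        # classes before writing
print(sapply(restored, class))  # classes guessed on reading
```

The Date column comes back as text (or as a factor on older R versions), which is why a CSV fixture would need extra type information stored alongside it.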

For most objects the dput output is probably just fine, though for the results of large queries we might want something that is easier to read and reason about (and that ideally would behave better than writing and reading dput output). One possible alternative serialization would be CSVY (e.g. https://cran.r-project.org/web/packages/csvy/index.html), but that depends on data.table, which is a rather hefty dependency for serialization alone.

It should also be pointed out that the limitations of dput objects have a side effect of encouraging best practices when writing and using fixtures: a fixture ought to be as minimal as possible while still testing what you need. dput objects work well (enough) for small objects and only start to fall down when there are large numbers of rows or columns.

There are a few options:

1. Suggest CSVY and optionally use it.
2. Build functionality for people to provide their own custom serializers for data.frame-returning queries (similar to how httptest allows for custom redactors).
3. Leave everything as is.
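To make the second option more concrete, here is one possible shape for a user-supplied serializer: a pair of `write` and `read` functions. This is purely a hypothetical sketch in base R, not an existing dittodb API; the name `csv_sidecar_serializer`, the `.types` sidecar file, and its `name=class` format are all assumptions made for illustration.

```r
# Hypothetical serializer: a list with write() and read() functions.
# Nothing here is part of dittodb; it only illustrates the idea.
csv_sidecar_serializer <- list(
  write = function(df, path) {
    write.csv(df, path, row.names = FALSE)
    # Record each column's first class in a sidecar file next to the CSV
    classes <- vapply(df, function(col) class(col)[1], character(1))
    writeLines(paste(names(classes), classes, sep = "="), paste0(path, ".types"))
    invisible(path)
  },
  read = function(path) {
    # Parse "name=class" lines back into a named colClasses vector
    spec <- strsplit(readLines(paste0(path, ".types")), "=", fixed = TRUE)
    classes <- vapply(spec, `[`, character(1), 2)
    names(classes) <- vapply(spec, `[`, character(1), 1)
    read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
  }
)

# Usage with a small, made-up fixture
df <- data.frame(id = 1:2, when = as.Date(c("2020-01-01", "2020-01-02")))
path <- tempfile(fileext = ".csv")
csv_sidecar_serializer$write(df, path)
restored <- csv_sidecar_serializer$read(path)
print(sapply(restored, class))
```

A sketch like this keeps the fixture reviewable as plain CSV while the sidecar carries the column types, at the cost of a second file per fixture and no support for non-data.frame objects.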

Feb 20 '20 23:02 jonkeane
