dittodb
Best approach for custom serializers?
`dput`s of objects were chosen as the first serialization for a few reasons:
1. they are plain text, so easily reviewable and understandable in git diffs
2. they serialize any sort of object
3. they can be used to reliably return a data.frame with specific column types
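As a rough illustration of these points, here is a minimal sketch of a `dput`/`dget` round trip using only base R. The example data.frame is made up for illustration and is not taken from dittodb's fixtures.

```r
# A small, made-up data.frame with several column types.
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  when = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  stringsAsFactors = FALSE
)

# Serialize to a plain-text file with dput() and read it back with dget().
path <- tempfile(fileext = ".R")
dput(df, file = path)
restored <- dget(path)

# The file is ordinary R source, so it can be reviewed in a diff.
cat(readLines(path), sep = "\n")

# Compare the column classes before and after the round trip.
identical(sapply(df, class), sapply(restored, class))
```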
A few alternatives were not chosen:
- `CSV`: While these are plain text, and arguably easier to read than `dput` output, they would need some sort of sidecar file to make sure they are parsed correctly into data.frames, and they couldn't be used to serialize non-data.frame objects (missing 2 and 3 above).
- `RDS`: These can serialize anything (and reliably return data.frames), but they don't satisfy (1) above since they are binary and not plain text.
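To make the trade-off concrete, the following sketch (base R only, with made-up data) shows a CSV round trip losing column types when no extra type information is supplied, and an RDS round trip preserving them at the cost of a non-plain-text file.

```r
# Made-up data: zero-padded codes stored as character, plus a Date column.
df <- data.frame(
  code = c("001", "002"),
  when = as.Date(c("2020-01-01", "2020-01-02")),
  stringsAsFactors = FALSE
)

# CSV round trip with default type guessing on read.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# The codes are re-parsed as integers and the dates as plain strings.
class(from_csv$code)
class(from_csv$when)

# RDS round trip: types are preserved, but the file is not reviewable text.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)
all.equal(df, from_rds)
```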
For most objects the `dput` output is probably just fine, though for the result of large queries, we might want something that is easier to read and reason about (and ideally would behave better than writing and reading `dput`). One possible alternative serialization would be CSVY (e.g. https://cran.r-project.org/web/packages/csvy/index.html), but that depends on `data.table`, which is a rather hefty dependency for serialization alone.
It should also be pointed out that the limitations of `dput` objects have a side effect of encouraging best practices when writing and using fixtures: one's fixture ought to be as minimal as possible to test what you need. `dput` objects work well (enough) for small objects and only start to fall down when there are large numbers of rows/columns.
There are a few options:
- Suggest `CSVY` and optionally use it
- Build functionality for people to provide their own, custom serializers for data.frame returning queries (similar to how `httptest` allows for custom redactors)
- Leave everything as is
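For discussion, here is one possible shape the second option could take. This is only a sketch: the serializer structure and the names `dput_serializer`, `csv_sidecar_serializer`, `save_fixture`, and `load_fixture` are hypothetical and are not part of dittodb's API. It assumes a serializer is simply a pair of `write` and `read` functions, with `dput` as the default.

```r
# Hypothetical: a serializer is a list with a `write` and a `read` function.
dput_serializer <- list(
  write = function(x, path) dput(x, file = path),
  read = function(path) dget(path)
)

# A user-provided alternative: CSV for the data plus a sidecar file that
# records the first class of each column, so types can be restored on read.
csv_sidecar_serializer <- list(
  write = function(x, path) {
    write.csv(x, path, row.names = FALSE)
    classes <- vapply(x, function(col) class(col)[1], character(1))
    writeLines(classes, paste0(path, ".classes"))
  },
  read = function(path) {
    classes <- readLines(paste0(path, ".classes"))
    read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
  }
)

# Hypothetical helpers that a fixture recording/replaying step might call.
save_fixture <- function(x, path, serializer = dput_serializer) {
  serializer$write(x, path)
  invisible(path)
}
load_fixture <- function(path, serializer = dput_serializer) {
  serializer$read(path)
}

# Usage with made-up data.
df <- data.frame(
  code = c("001", "002"),
  when = as.Date(c("2020-01-01", "2020-01-02")),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
save_fixture(df, path, csv_sidecar_serializer)
restored <- load_fixture(path, csv_sidecar_serializer)
str(restored)
```

Keeping `dput` as the default would leave existing fixtures untouched, while anyone with large query results could opt in to a format of their choosing without dittodb taking on a new dependency.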