
Consider integrating with objectdiff

Open robertzk opened this issue 10 years ago • 2 comments

objectdiff

Wonder if there's anything we can collaborate on?

robertzk, Apr 18 '15 20:04

Dear Robert,

Sorry for my late response: I was on leave. I'd be happy to cooperate or integrate. Do you have suggestions on what to integrate?

Best,

Edwin

2015-04-18 22:43 GMT+02:00 Robert Krzyzanowski:

objectdiff https://github.com/robertzk/objectdiff

Wonder if there's anything we can collaborate on?


edwindj, Apr 24 '15 07:04

Thanks for the response, Edwin!

objectdiff provides a function, also called objectdiff, that computes a closure containing the "diff" between two arbitrary R objects. For example, if we have:

library(objectdiff)

# Make a copy of iris with one extra column, then diff the two.
iris2 <- iris
iris2$new_column <- 1
patch <- objectdiff(iris, iris2)

Then patch stores only new_column rather than duplicating the full data set. This is particularly useful for wide data sets with hundreds or thousands of columns.

# Proof that the patch is smaller
> object.size(patch)
1896 bytes
> object.size(iris)
7088 bytes
> object.size(iris2)
8384 bytes

If you apply several modifications to a data.frame, you only need to keep a copy of the initial data.frame and the succession of patches: applying the patches in order reproduces the final data.frame. This has two advantages: (1) you know what changed in each step, and (2) it occupies much less memory than keeping a full copy after every step.
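
A minimal base-R sketch of that replay idea follows. It does not call objectdiff: the patches are hand-written functions standing in for what objectdiff would compute, and the names patches, apply_patches, and states are made up for illustration.

```r
# Hand-written stand-ins for patches: each one is a function that maps
# one version of the data.frame to the next. (Assumption: a real patch
# from objectdiff can be applied the same way, as patch(old_object).)
patches <- list(
  add_column   = function(df) { df$new_column <- 1; df },
  drop_column  = function(df) { df$Sepal.Width <- NULL; df },
  scale_column = function(df) { df$Petal.Length <- df$Petal.Length * 10; df }
)

# Replay the patches in order. With accumulate = TRUE, Reduce returns
# the initial value followed by the result after each patch.
apply_patches <- function(initial, patches) {
  Reduce(function(df, patch) patch(df), patches, initial, accumulate = TRUE)
}

states <- apply_patches(iris, patches)
final  <- states[[length(states)]]

# The column names of each state show what changed from step to step.
lapply(states, names)
```

Only iris and the three small functions need to be stored; every intermediate version can be rebuilt on demand by replaying a prefix of the list.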

Going further, objectdiff provides a tracked_environment that stores any changes to an R environment object using patches obtained from objectdiff. My question then is whether we can generate a plot of changes to, say, a data frame, by mapping patches obtained from objectdiff to plottable diffs obtained from daff.
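
One possible shape for that mapping, as a rough sketch only: rebuild each intermediate state by replaying the patches, then hand every consecutive pair of states to daff. The helper diff_each_step and the hand-written patches are hypothetical; how to pull the patches out of a tracked_environment is not shown and would depend on objectdiff's internals. daff::diff_data and daff::render_diff are the existing daff functions for computing and displaying a tabular diff.

```r
# Hypothetical helper: one tabular diff per patch in the chain.
# `differ` defaults to daff::diff_data(), but any function taking
# (old, new) can be passed instead, so daff is only needed when the
# default is actually used.
diff_each_step <- function(initial, patches, differ = daff::diff_data) {
  # Rebuild every intermediate state by replaying the patches in order.
  states <- Reduce(function(obj, patch) patch(obj), patches, initial,
                   accumulate = TRUE)
  # Pair each state with its successor and diff the pair.
  Map(differ, states[-length(states)], states[-1])
}

# Hand-written stand-ins for patches (assumption: objectdiff patches
# could be applied the same way, as patch(old_object)).
patches <- list(
  function(df) { df$new_column <- 1; df },
  function(df) { df$Species <- NULL; df }
)

# Render the first step as an HTML diff when daff is available.
if (interactive() && requireNamespace("daff", quietly = TRUE)) {
  diffs <- diff_each_step(iris, patches)
  daff::render_diff(diffs[[1]])
}
```

This recomputes each diff from full before/after states, which is simple but gives up the memory savings during plotting; translating a patch directly into daff's diff format would avoid that, at the cost of coupling the two packages more tightly.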

Do you think this would be an interesting project? I could probably dedicate a weekend to it.

robertzk, Apr 24 '15 16:04