digest icon indicating copy to clipboard operation
digest copied to clipboard

Classes that sha1() should have methods for in digest

Open ThierryO opened this issue 10 years ago • 19 comments

  • Date
  • complex
  • array

ThierryO avatar Jan 04 '16 16:01 ThierryO

Already implement types:

  • complex converted to a 2-column matrix (Re, Im)
  • Date converted to an integer (positive/negative) representing number of seconds since 1970

Any ideas for the array type ?

vsimko avatar May 01 '16 12:05 vsimko

Same as for the other two? There is as.array() though I am not sure what one usually converts from...

eddelbuettel avatar May 01 '16 12:05 eddelbuettel

array is a multi-dimensional matrix. One of the problems is to produce different hashes for differently shaped arrays that contain the same data:

# the same content for differenly shaped arrays
content <- 1:8

# creates a cube 2x2x2
a <- array(content, dim = c(2,2,2))
b <- array(content, dim = c(2,4,1))

# structures to be hashed
a_tohash <- list(
    dim = dim(a),
    data = as.numeric(a)
)

b_tohash <- list(
    dim = dim(b),
    data = as.numeric(b)
)

# should produce different hashes
sha1(a_tohash)
sha1(b_tohash)
> sha1(a_tohash)
[1] "96dbde57a89f37cf8fbbf9e706047a3fc822e705"
> sha1(b_tohash)
[1] "d51b50b0a6414a453e909a5d74686156edb099ef"

vsimko avatar May 01 '16 12:05 vsimko

any other problems that need to be addressed before I create a pull request ?

vsimko avatar May 01 '16 12:05 vsimko

Ah, yes, of course.

Well a 1:8 sequence with dim(2,2,2) is different from dim(2,4,1) so I think this is good enough.

eddelbuettel avatar May 01 '16 13:05 eddelbuettel

Unit tests maybe (in the super-simple format we use here) ?

eddelbuettel avatar May 01 '16 13:05 eddelbuettel

Note: the current implementation assumes that: sha1(2+0i) != sha1(2) (conservative approach, but maybe mathematically incorrect)

vsimko avatar May 01 '16 13:05 vsimko

I've just found a related section in the vignette titled "Creating a sha1 method for other classes": https://github.com/eddelbuettel/digest/blob/master/vignettes/sha1.Rmd#creating-a-sha1-method-for-other-classes

Useful for new contributors.

vsimko avatar May 01 '16 14:05 vsimko

Any types that should also be implemented ?

vsimko avatar May 02 '16 18:05 vsimko

I've started this issue so everyone could add other classes to the wishlist. Keep in mind that @eddelbuettel wants to keep the list of dependencies as minimal as possible. That limits the classes to those available in base R.

ThierryO avatar May 02 '16 18:05 ThierryO

Yes, I think anything besides base R is likely to be out of scope. Unless we all agree that it is worth it.

But minimal dependencies are a Good Thing (TM) for a widely-used utility package such as this.

eddelbuettel avatar May 02 '16 18:05 eddelbuettel

what about zoo class for time series ?

vsimko avatar May 10 '16 07:05 vsimko

I love zoo and and xts -- but this would get a little out of hand.

We cannot possibly follow up to every other S3 class on CRAN. Base types is good.

eddelbuettel avatar May 10 '16 10:05 eddelbuettel

#41 add sha1 methods for the pairlist and name classes. It also adds documentation for the sha1 methods on array, complex and Date

ThierryO avatar May 25 '16 09:05 ThierryO

#54 adds a sha1 method for the "function" class. e.g. sha1(sum)

ThierryO avatar Jan 11 '17 14:01 ThierryO

Thanks for that. Looks like a very clean and straightforward PR. Will merge once Travis completes.

eddelbuettel avatar Jan 11 '17 14:01 eddelbuettel

The current implementation of sha1.function(x) takes environment(x) into account. This can lead to differences in sha1 hash when storing and retrieve the function in an S4 object. I am experiencing that problem now. A solution would be to update sha1.function(x) so that it ignores the environment. Another option is to use an extra argument which allows the user to should whether the environment should be taken into account. In that case sha1() should gain a ... argument. I'll make a PR after feedback from @eddelbuettel

ThierryO avatar Jan 20 '17 13:01 ThierryO

From the top of my head I like the extra argument idea better.

eddelbuettel avatar Jan 20 '17 13:01 eddelbuettel

Another pair of classes that should have a method: formula and environment?

Assuming that formula gets a method, I think that ideally there would be an option to ignore the attached environment (equivalent to sha1.function()).

billdenney avatar Oct 12 '19 21:10 billdenney