M2 Broken hash functions

The function hash seems to be broken for certain classes of objects:

MutableMatrix #997
MethodFunctionWithOptions

i44 : L = select(value \ values Core.Dictionary, x -> class x === MethodFunctionWithOptions);

i45 : unique \\ hash \ L

o45 = {242339741, 242343940}

o45 : List

i46 : #L

o46 = 212

Composed functions

Related:
```m2
i1 : hash(toList @@ hilbertFunction)

o1 = 1091563

i2 : hash(expression @@ net)

o2 = 1091563

This random collection of MethodFunctionSingles:

document, SYNOPSIS, makeDocumentTag, monoid, hooks

Most MethodFunctionBinarys

104 : L = select(value \ values Core.Dictionary, x -> class x === MethodFunctionBinary);

i105 : unique \\ hash \ L

o105 = {243950607, 242343937, 243946629}

o105 : List

i106 : #L

o106 = 8

There are more collisions that are likely just coincidence, but the above are likely bugs.

Sep 02 '22 13:09 mahrud

These collisions are important to prevent only if the functions are used as keys in hashtables. I think they never are.

Sep 03 '22 21:09 DanGrayson

Literally all of the methods above are used as keys in hash tables 🙃

Sep 04 '22 05:09 mahrud

Which hash tables?

Sep 04 '22 11:09 DanGrayson

Types.

Sep 04 '22 14:09 mahrud

Ah, right, I was focusing on the functions in your item #3 above, which are not.

A particularly egregious example is (buckets Module)#149, which has 29 entries in a linked list. Making the hash codes different might be a good way to avoid searching that list every time one of those methods is called. And a good way to do that would be to add a integer field to the line export FunctionClosure := {+ frame:Frame, model:functionCode };, initializing it from the value of nextHash().

Sep 04 '22 16:09 DanGrayson

Good catch!

Sep 04 '22 16:09 DanGrayson

A particularly egregious example is (buckets Module)#149, which has 29 entries in a linked list.

That isn't the worst by far:

i92 : max \\ length \ buckets typicalValues

o92 = 40

i93 : max \\ length \ buckets Attributes

o93 = 157

Even for types alone, there are 32 keys with the same hash in Matrix and 49 on Ideal:


i1 : c = first commonest tally apply(toList methods Ideal, hash)

o1 = 1367672348

i2 : select(methods Ideal, m -> hash m === c)

o2 = {0 => (analyticSpread, Ideal)          }
     {1 => (annihilator, Ideal)             }
     {2 => (ass1, Ideal)                    }
     {3 => (associatedPrimes, Ideal)        }
     {4 => (basis, Ideal)                   }
     {5 => (betti, Ideal)                   }
     {6 => (contractToPolynomialRing, Ideal)}
     {7 => (distinguished, Ideal)           }
     {8 => (doSplitIdeal, Ideal)            }
     {9 => (engineMGBF4, Ideal)             }
     {10 => (factorTower, Ideal)            }
     {11 => (flattenRing, Ideal)            }
     {12 => (gb, Ideal)                     }
     {13 => (generators, Ideal)             }
     {14 => (groebnerBasis, Ideal)          }
     {15 => (hilbertPolynomial, Ideal)      }
     {16 => (hilbertSeries, Ideal)          }
     {17 => (independentSets, Ideal)        }
     {18 => (integralClosure, Ideal)        }
     {19 => (integralClosures, Ideal)       }
     {20 => (inverseSystem, Ideal)          }
     {21 => (isLinearType, Ideal)           }
     {22 => (isPrimary, Ideal)              }
     {23 => (mingens, Ideal)                }
     {24 => (minimalBetti, Ideal)           }
     {25 => (minimalPresentation, Ideal)    }
     {26 => (minimalPrimes, Ideal)          }
     {27 => (minimalReduction, Ideal)       }
     {28 => (minprimesWithStrategy, Ideal)  }
     {29 => (multiplicity, Ideal)           }
     {30 => (normalCone, Ideal)             }
     {31 => (PD, Ideal)                     }
     {32 => (primaryDecomposition, Ideal)   }
     {33 => (prune, Ideal)                  }
     {34 => (quotient, Ideal)               }
     {35 => (radical, Ideal)                }
     {36 => (reesAlgebra, Ideal)            }
     {37 => (reesIdeal, Ideal)              }
     {38 => (regSeqInIdeal, Ideal)          }
     {39 => (regularity, Ideal)             }
     {40 => (resolution, Ideal)             }
     {41 => (saturate, Ideal)               }
     {42 => (specialFiber, Ideal)           }
     {43 => (specialFiberIdeal, Ideal)      }
     {44 => (splitIdeal, Ideal)             }
     {45 => (symmetricAlgebraIdeal, Ideal)  }
     {46 => (tangentCone, Ideal)            }
     {47 => (trim, Ideal)                   }
     {48 => (ZZp, Ideal)                    }

Broken hash on MethodFunctionWithOptions is responsible for all of them.

Sep 04 '22 21:09 mahrud

Can I ask you to rewrite all of evaluate.d and the method related stuff in actors5.d? I think I've done as best as I could optimizing methods.m2 but those are two of the largest files in the interpreter and I bet there are many many other bugs making methods slow that are so hard to find because of how messy the that part of the interpreter is.

hashtables.dd is not nearly as bad, but what's shocking about that file is that you are the only contributor to it.

Sep 04 '22 22:09 mahrud

I'll take a look ...

Sep 05 '22 12:09 DanGrayson

I see no reason to rewrite evaluate.d, actors5.d, and hashtables.dd -- I'd probably introduce more new bugs than I found. When I wrote the files for the first time, plenty of thought went into them -- rewriting them would be more slapdash.

Sep 06 '22 11:09 DanGrayson

You could at least go though and document or even just finish optimizing it: https://github.com/Macaulay2/M2/blob/ccb257c2ad7e909ecc174096079798f5fe062a41/M2/Macaulay2/d/evaluate.d#L912-L913 That was written 19 years ago, when lots of needs and assumptions were different.

Sep 06 '22 14:09 mahrud

I can't figure out what I meant by that comment, written 19 years ago.

Sep 06 '22 15:09 DanGrayson

As discussed in zulip, I think 1bcd8767f71929652979ae7f579c57d6d69536dd should be reverted. I'm not sure it solves any of the issues raised in this thread about methods (whose hashes are AFAICT not affected by this commit), but it certainly breaks Macaulay2 since it makes it impossible to run it for more than a couple of hours without getting

*** hash code serial number counter overflow (too many mutable objects created)

since every function definition will increase the hash counter.

Sep 11 '23 23:09 pzinn

It has definitely resolved the issue with methods:

i1 : max \\ length \ buckets typicalValues

o1 = 4

i2 : max \\ length \ buckets Attributes

o2 = 4

i3 : c = first commonest tally apply(toList methods Ideal, hash)

o3 = 190532608

i4 : # select(methods Ideal, m -> hash m === c)

o4 = 1

Though I agree that we should come up with a different fix than what Dan implemented. I think my preference is for the hash function of a function closure to depend on its code rather than just be sequential.

Sep 12 '23 00:09 mahrud

OK I see -- it has not changed the hash of actual methods -- but MethodFunctionWithOptions isn't a method. Those are now getting identical hashes. edited: after checking, MethodFunction is a descendant of CompiledFunctionClosure which is unaffected by the change. But MethodFunctionWithOptions is a descendant of FunctionClosure which is.

Sep 12 '23 01:09 pzinn

in case other people are wondering, the pre-commit situation was as follows: every FunctionClosure got a new hash number when it was parsed, according to convertr.d

     is a:Arrow do (
     	       p:=treePosition(a.lhs);
	       Code(functionCode(
	       unseq(c:=convert0(a.rhs)),a.desc,nextHash(),
	       combineLocation(p,codeLocation(c),position(a.Operator))
	       )))

(more precisely it's its code that got the hash, but the hash of a function was defined as the hash of its code). This means apply(10,i->hash((()->i))) would give 10 times the same hash number (the code of these 10 functions is the same, only the frame varies), whereas post-commit it will keep increasing, hence the massive hash number inflation I'm complaining about.

the pre-commit situation seems very reasonable to me -- the only problem is with MethodFunctionWithOptions and so on, because of the weird way that they're built. maybe there's a workaround?

the pre-commit situation is also not the same (and probably better) as my suggested radical change on zulip

is m:functionCode do return Expr(FunctionClosure(noRecycle(localFrame),m,123456789)) -- TEMP

which literally gives the same hash to every FunctionClosure.

Sep 12 '23 01:09 pzinn

Though I agree that we should come up with a different fix than what Dan implemented. I think my preference is for the hash function of a function closure to depend on its code rather than just be sequential.

I don't think that would work. the code is the same for all MethodFunctionWithOptions which is why it got the same hash pre-commit.

Sep 12 '23 02:09 pzinn

I'm happy with any solution that doesn't result in every MethodFunctionWithOptions installed for ideals ending up in the same bucket.

Sep 12 '23 04:09 mahrud

can we move the definition of MethodFunctionWithOptions, etc, to the d level? the only way I can think of at the moment.

Sep 12 '23 08:09 pzinn

Here is a general (proposed) strategy: have three different ways of generating hash codes.

hash code is assigned according to the counter
hash code equals (part of) the address in memory
hash code is computed (using the content of the object) Types would be 1. Mutable objects (other than Types) would be 2. (For instance, FunctionClosures) Immutable objects would be 3.

Sep 18 '23 20:09 antonleykin

Doug Torrance has volunteered to implement this idea. It will result in a pull request.

Sep 18 '23 20:09 DanGrayson

Currently, youngest checks the hash for several other types of things, and not just mutable hash tables:

https://github.com/Macaulay2/M2/blob/379c9634000dd87221dc353e1aaaca413f75c81c/M2/Macaulay2/d/actors4.d#L1323-L1326

The bottom 3 lines were added in https://github.com/Macaulay2/M2/commit/5598a530884459699328e225ebcc12e43c38ea13, which also added a few calls to youngest, two in the packageKey0 method (which no longer exists), and one for package(Sequence) (which has since been rewritten without using youngest).

Question: Do we need to worry about hashes playing nice with youngest for anything other than Type objects?

type	`youngest`	`serialNumber`
`CompiledFunction`
`CompiledFunctionClosure`	✓
`Database`
`Dictionary`		✓
`File`	✓
`FunctionClosure`
`FunctionBody`
`MutableHashTable`	✓	✓
`MutableList`		✓
`PythonObject`
`Symbol`	✓	✓
`Task`		✓

M2 M2 copied to clipboard

Broken hash functions

M2
M2 copied to clipboard