gap icon indicating copy to clipboard operation
gap copied to clipboard

Standardise ordering of members of records

Open ChrisJefferson opened this issue 6 years ago • 5 comments

The internal order in which members of records are stored is based on how RNams are ordered. This information varies depending on the order in which prec keys are defined, which is (in my opinion) very subtle, as it will often appear to users to be fixed, until they load some package (or update GAP, or their code) in a way which changes it.

To give example, consider these two GAP sessions, notice I define x and y in different orders in the two sessions:

gap> x := rec(aaa := 1, bbb := 5);
rec( aaa := 1, bbb := 5 )
gap> y := rec(bbb := 4, aaa := 3);
rec( aaa := 3, bbb := 4 )
gap> RecNames(x); RecNames(y);
[ "aaa", "bbb" ]
[ "aaa", "bbb" ]
gap> x < y;
true

(in a new GAP session)
gap> y := rec(bbb := 4, aaa := 3);
rec( aaa := 3, bbb := 4 )
gap> x := rec(aaa := 1, bbb := 5);
rec( aaa := 1, bbb := 5 )
gap> RecNames(x); RecNames(y);
[ "bbb", "aaa" ]
[ "bbb", "aaa" ]
gap> x < y;
false

Methods which are effected by this include:

  • < : Gives different values if keys are defined in a different order.
  • RecNames : Order of results varies depends on caller.

Print (as you can see above) is not effected, as it sorts the members of the record.

I feel this has, over the years, lead to subtle issues for our users, and should be fixed.

Print fixes this by internally sorting. I suggest we do the same in RecNames and <.

We can actually speed this up (at the cost of a small amount of memory) by attaching to each PREC an extra pointer to a PLIST, which would store the sorted keys of the record. We would construct this list whenever RecNames was called, and throw it away whenever it became invalid (so people who never want it will never have it constructed).

ChrisJefferson avatar Feb 06 '19 12:02 ChrisJefferson

Note: There is a similar problem in COMOBJ (as they are implemented in a similar way to PRECs). I'm tempted to ignore the problem there, because I think the ordering is quite hard for the user to see, and there are technical reasons to just handle PRECs (because of the way layouts are done, PRECs already have a free empty pointer, where the COMOBJ type is stored, which I could use for storing my records, meaning I won't need to allocate any more space when the ordering isn't used).

ChrisJefferson avatar Feb 06 '19 16:02 ChrisJefferson

I agree that the ordering of records should not depend on the order in which record names were introduced.

markuspf avatar Feb 07 '19 09:02 markuspf

The plan regarding RecNames sounds fine to me. But that leaves <. Serious question: Should this even be there? In an ideal world, I'd just ditch it... but then, probably something is using it. Might be instructive to see what happens if we turn it into an error, then run the test suite. Just to find out for what kind of stuff people use it.

fingolfin avatar Feb 08 '19 09:02 fingolfin

I use it quite a lot (indirectly, I want to make sets).

For example, cut+paste from a file I'm editing (I have more usefully named variables, I'm just removing my interesting names because they you'd guess what my paper was about)

Size(Set(List(stuff, {x} -> rec(prop1 := prop1(x), prop2 := prop2(x), prop3 := prop3(x)))));

Lets me find how many equivalence classes there are for my things, when I distinguish by 3 properties. Of course, I could have put this in a list, but I prefer a record, as it lets me label what each value is, rather than mapping them to integers and having to remember what they are

ChrisJefferson avatar Feb 08 '19 11:02 ChrisJefferson

Based on the related discussion on #4576, it seems there are strong reasons to leave RecNames untouched.

The leaves < for records. I've just verified that nothing in the testinstall test suite uses it except for tests in tst/testinstall/recordname.tst (of course), and one test in tst/testinstall/tilde.tst (not sure why... perhaps to test the "recursion counter"?!?). I am not sure if anyone beyond you, @ChrisJefferson is "using it a lot" in real world?

So all in all, I don't have serious reservation about making it slower if that helps. Heck, we could even replace the comparison in the kernel by one written in GAP (muuuch slower but also much easier to code).

fingolfin avatar Jul 06 '22 08:07 fingolfin