Larger Finite Fields, take 2
This is a rebased and somewhat reworked version of #2846. The biggest addition is that the lookup tables are computed at build time using etc/ffgen.c, which has been somewhat updated. The decision about what size of internal finite fields to use on which architectures is made entirely in etc/ffgen.c; in principle it should be possible to go to 23 bits or 25 bits just by editing that programme.
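For orientation, here is a rough GAP-level illustration of the kind of data those tables hold and how a generator might write it out. This is purely a sketch: the real generator is the C program etc/ffgen.c run by the build system, and the names (demoSizes, demo_ffdata.h) and the 16-bit limit below are made up.

```gap
# Sketch only: tabulate the sizes of the internal finite fields (all prime
# powers up to a chosen limit) and emit them as C source, roughly what a
# build-time generator does.  Names and the limit are illustrative.
limit := 2^16;;
sizes := Filtered([2 .. limit], IsPrimePowerInt);;   # sorted by construction
str := "/* generated data -- do not edit */\n";;
Append(str, Concatenation("static const unsigned long demoSizes[",
                          String(Length(sizes)), "] = {\n"));
for q in sizes do
    Append(str, Concatenation("  ", String(q), ",\n"));
od;
Append(str, "};\n");
# PrintTo("demo_ffdata.h", str);  # the build system would place this in gen/
```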
I've done quite a bit of refactoring of the finite fields code in the process, mostly introducing inline functions and removing some duplication.
Still to do are updates to tests, documentation, and comments.
I'd welcome @fingolfin looking over this, especially the changes to the makefiles, and @jdebeule confirming that I haven't done anything too stupid to his code.
Codecov Report
:exclamation: No coverage uploaded for pull request base (master@1eb79a6). Click here to learn what that means.
The diff coverage is 100%.
@@            Coverage Diff            @@
##             master    #2997   +/-   ##
=========================================
  Coverage          ?   83.64%
=========================================
  Files             ?      688
  Lines             ?   336639
  Branches          ?        0
=========================================
  Hits              ?   281566
  Misses            ?    55073
  Partials          ?        0
| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/ffe.g | 92.85% <ø> (ø) | |
| lib/ffe.gd | 100% <ø> (ø) | |
| lib/ffeconway.gi | 75.93% <100%> (ø) | |
| hpcgap/lib/ffeconway.gi | 68.92% <100%> (ø) | |
| src/finfield.c | 97.19% <100%> (ø) | |
| src/finfield.h | 98.3% <100%> (ø) | |
Working to clean up the test failures. Mostly straightforward, but there is a problem with tests in hpcgap/tst/testinstall/comprvec.tst like this:
gap> F:=GF(41^3);;
gap> v:=Filtered(F,x -> x in GF(41));;
gap> IS_VECFFE(v);
false
gap> IsFFECollection(v);
true
gap> z:=CopyToVectorRep(v,41);
< mutable compressed vector length 41 over GF(41) >
gap> z=v;
true
gap> IsIdenticalObj(z,v);
false
This fails now because Z(41,3) is internally represented. To avoid that I'd have to increase the degree from 3 to 5, and then the Filtered line takes a long time. I don't understand the purpose of these tests well enough to know what a sensible change would be. Can anyone advise?
Also, there is a problem building NormalizInterface. It uses an m4 script, ac_find_gap.m4, which doesn't take GAP_CPP_FLAGS from sysinfo.gap but instead rolls its own flags and doesn't include the gen directory, resulting in ffdata.h (which is included indirectly by compiled.h) not being found. I'm not sure what the best way to fix this is.
If this is merged, we might want to add a virtual enumerator for Galois Fields instead of making a plain list.
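To make the idea concrete, here is a rough sketch of what such a virtual enumerator could look like for a single field, using EnumeratorByFunctions. The indexing scheme (zero first, then successive powers of a primitive element) is just one possible choice, not a proposal for the actual implementation.

```gap
# Sketch only: a Galois field enumerator that computes elements on demand
# instead of expanding the whole field into a plain list.
F := GF(41);;
z := PrimitiveElement(F);;
enum := EnumeratorByFunctions(F, rec(
    ElementNumber := function(e, n)   # n-th element: zero, then z^0, z^1, ...
        if n = 1 then return Zero(F); fi;
        return z^(n - 2);
    end,
    NumberElement := function(e, x)   # inverse map, via a discrete logarithm
        if IsZero(x) then return 1; fi;
        return LogFFE(x, z) + 2;
    end,
    Length := e -> Size(F) ));;
enum[3];               # Z(41)
Position(enum, z^5);   # 7
```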
@stevelinton:
> Working to clean up the test failures. Mostly straightforward, but there is a problem with tests in hpcgap/tst/testinstall/comprvec.tst like this:
This looks as if it tries making larger GF elements that lie in the small prime field (presumably under the assumption that they are stored differently internally), and checks that compression into the small-field vector format works.
To make the exponent changeable, replace the Filtered with
gap> p:=PrimitiveElement(F)^((41^3-1)/(41-1));;
gap> v:=Concatenation([0*p],List([0..39],x->p^x));;
which writes down the same list without enumerating the large field.
The remaining test failures arise from interactions between factint and HPC-GAP and from the problems building NormalizInterface already mentioned.
Now just down to the normaliz-interface issues. Thinking about that, though, I wondered where ffdata.c and ffdata.h should really go. At the moment they are in gen, but they are architecture-dependent (just on 32 vs 64 bits, nothing else), so should they actually be in bin/$(GAPARCH)?
The only difference I can see it making is that they might not need to be rebuilt if you switch back and forth between architectures and rebuild GAP each time, but I'm not sure of that.
Putting those files into gen is exactly the right place. There is no need for architecture-dependent dirs, because now every build dir corresponds to precisely one build config; if one wants to build another binary with another config "in parallel" to the current one, one needs to use an out-of-tree build or a separate clone.
However, the ffgen make target is not set up properly for that, as it does not correctly track the configuration, and thus may not get rebuilt when it should be. A fix for that might be as simple as changing ffgen: etc/ffgen.c to ffgen: obj/etc/ffgen.lo, but I have not yet tried to do that; a slight variation might be necessary.
I have "looking at NormalizInterface issues with new FF code" on my TODO list, along with some general lovin' for NormalizInterface and possibly a release (CC: @sebasguts ) but it might take me a bit to get there, as I have a bunch of other GAP related tasks on it for today (such as fixing more of the fallout of the method reordering changes...)
@fingolfin Thanks a lot. Take your time. Although, before this PR gets too old, we need to decide whether to include it in 4.11 (forcing the change on NormalizInterface, or adding some workaround), or just keep it on ice (with the attendant risk of bitrot).
I'd like to avoid having it sit around for too long. I hope @sebasguts and I can make a NormalizInterface release soon. Once that has happened and has been picked up, we can hopefully proceed with this PR. It would be nice to get some feedback from @jdebeule first, too (whether he's OK with what was done to his PR ;-).
Since I had no time to work further on the original PR, I am very grateful that this was picked up! So I am ok with what has been done to the original PR.
I have now rebased this atop PR #3137 (which should be merged first, and then we can rebase this PR once again). I also added the code by @jdebeule for regenerating the table of Conway polynomials, though it is not yet fully hooked up (see the commit message of the WIP commit adding it for more details). Also, some of the changes in this PR were squashed by me into a single "WIP" commit -- I plan to break this up further later on, but for now I wanted to make sure my changes so far are backed up and tested on GitHub ;-).
Coverage decreased (-7.2%) to 75.646% when pulling 27b0c49d5a1b74aac4a35d5942b46400422df89b on stevelinton:ff-ext into 7587224d230911bd713d2dcb49b2de51e85834d1 on gap-system:master.
Coverage increased (+1.4%) to 82.844% when pulling f33d435c0eae5adefe99acd4f7f2554fbf3ce4b8 on stevelinton:ff-ext into 5296012c784eb114511419e5f53e222b93918db1 on gap-system:master.
This has now been reduced to a relatively small set of changes (except for the roughly 7000 new generated lines in src/finfield_conway.h, and the removal of src/ffdata.c with another 2000 lines). I still need to clean up the remaining commits. Perhaps also break out the addition of etc/gen_conway.g into its own PR, and hook it into the build system so that regenerating that table becomes trivial.
One thing that concerns me a bit about this PR is that the generated file gen/ffdata.c in 64-bit builds is 26 MB. And the GAP executable grows from 2.3 MB (1.8 MB stripped) to 12 MB (11 MB stripped), i.e. by about a factor of 6. Of course this is still no problem on current machines, but I nevertheless find it a bit troubling.
@fingolfin Given that, as you say, it is not actually a problem on modern machines, I'm not inclined to worry about this. The root problem is that we need to number all the prime powers up to whatever our limit is, in a way that does not introduce race conditions, and allows reasonably quick lookup. It's not essential that the numbering be entirely dense, but we don't want to use more bits than we have to. Calculating it at startup (before we have multiple threads) is a bit slower than we'd like. Doing it on demand (so that the tables would be populated as far as the largest q so far considered) would be possible, but adds complexity. Loading the data at run-time (for instance via mmap) would be possible, but really is just hiding the problem.
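For illustration, here is the numbering-and-lookup idea expressed at GAP level. This is only a sketch under made-up names and a made-up 16-bit limit; in the PR itself the table lives in generated C code and the lookup happens in the kernel.

```gap
# Sketch only: number all prime powers up to a limit once, then map a field
# size q to its "field number" by binary search in the sorted table, with
# nothing built at run time (and hence nothing for threads to race on).
limit := 2^16;;
sizes := Filtered([2 .. limit], IsPrimePowerInt);;   # sorted by construction

FieldNumber := function(q)
    local pos;
    pos := PositionSorted(sizes, q);
    if pos <= Length(sizes) and sizes[pos] = q then
        return pos;
    fi;
    return fail;   # not a prime power below the limit
end;;

FieldNumber(9);    # the index of GF(9) in the table
FieldNumber(6);    # fail: 6 is not a prime power
```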
One thing we might want to do is export this data to GAP level in some way, so that it is available for factoring, primality testing, etc.
In the somewhat unlikely event that someone did want to build a 64-bit version of GAP with a minimal memory footprint, then (a) their main job will be to minimize how much of the library they load, and (b) they can easily configure it to use fewer than 24 bits worth of finite fields.