Remove support for float literals starting with a period [DO NOT MERGE]

Open fingolfin opened this issue 5 years ago • 12 comments

Before this patch, writing .0 instead of 0.0 was allowed, as in C, C++, D, Java, Julia, Go, Perl, Python and many other languages. This required a hack in the form of an additional entry pointer for our scanner. We now get rid of it, matching languages like Ada, Haskell, OCaml, Rust, Swift.

The main motivation for this is to reduce the API surface of the GAP scanner, and to remove the need for code using it to emulate the reader (or at least substantial parts of it) in order to provide a fully compliant tokenizer for GAP code. This should simplify external implementations of code formatters and other similar code which wants to "parse" GAP code somehow.

Usability should not really be impacted: first off, most floats don't get entered by hand anyway, and secondly, usage of floats in GAP still is a relative rare thing, I expect only few people use it. Of course some package might use float constants starting with a dot somewhere, but so far I have not found any. In any case, code using such constants should be trivial to adapt, and once adapted, the code will of course still work in older GAP releases just fine, too.

Right now, I can only think of two serious arguments against this (which of course just means that my imagination is limited):

some code somebody has might rely on reading a large number of floats from a text file, and some of those floats start with a dot. But I think this would be easy to resolve, e.g. by preprocessing the input with a python script (or plain old sed, plus a regex).
breaking changes to language are a major no-go. I'd generally agree with this; but I think floats in GAP are very much a niche feature only, and, as explained above, I expect it to break very little existing code, if any.

I'd be very much interested to hear what others think about this, esp. @laurentbartholdi.

Feb 11 '19 23:02 fingolfin

There's one drawback, which is that mpfr standardizes things by printing all floats with an initial ".": gap> SetFloats(MPFR); gap> 0.1; .1e0 gap> .1e0; .1e0 gap> 0.e0; .0e0 We can change that by always printing an initial "0", but I'd like to check that it doesn't snowball.

Out of curiosity, what is the plans about writing new (external) parsers?

On Tue, Feb 12, 2019 at 9:01 AM Steve Linton [email protected] wrote:

@stevelinton commented on this pull request.

This looks sensible to me. My original idea was to implement all C floating point literals (except the ones that were already integers), but that failed anyway because we wanted 2E6 to be a an identifier rather than a number.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gap-system/gap/pull/3280#pullrequestreview-202524751, or mute the thread https://github.com/notifications/unsubscribe-auth/ACIIUEOR6VS7vCLjgsesDeIFx4U7M1WJks5vMnTogaJpZM4a1PYc .

-- Laurent Bartholdi laurent.bartholdigmailcom Bureau GN1 Nord 472, tel. +33 4 72728722 École Normale Supérieure 46 allée d'Italie, 69364 Lyon Cedex 07

Feb 12 '19 08:02 laurentbartholdi

Coverage decreased (-0.0002%) to 93.729% when pulling 317a1fa56e4d368322df1cdb0fbbeccf55deafb8 on fingolfin:mh/remove-ScanForFloatAfterDotHACK into e62d39555838725badc50c716a29cdbf7cc576c4 on gap-system:master.

Feb 12 '19 09:02 coveralls

Can you give an example of what makes the initial . hard to recognise, or is it just because they were added later in a slightly hacky way?

Feb 12 '19 09:02 ChrisJefferson

The problem is how to tokenise something like

myrecord.17

without a clue from the parser level, this could be either a name followed by the float .17 or a name followed by a dot followed by a record component name 17.

The existing hack supplies that hint.

Feb 12 '19 10:02 stevelinton

@laurentbartholdi I don't see a major problem with that output. Sure, it's not again valid input, but that's the case for many objects (and any dreams to the contrary are doomed). That said, as you say, it's easy to correct, too....

I am not quite sure what you have in mind when you say that you'd "like to check that it doesn't snowball." ?

Feb 12 '19 10:02 fingolfin

I vaguely remember that in some cases the Float library uses the output string, and makes some assumption on the initial dot; I'd rather make sure that it was only for convenience that I did it that way, and that it won't cause lots of other problems that have to be fixed... whence a snowball. In all cases, if my fears are correct, then the new GAP will only be compatible with a new Float.

On Tue, Feb 12, 2019 at 11:32 AM Max Horn [email protected] wrote:

@laurentbartholdi https://github.com/laurentbartholdi I don't see a major problem with that output. Sure, it's not again valid input, but that's the case for many objects (and any dreams to the contrary are doomed). That said, as you say, it's easy to correct, too....

I am not quite sure what you have in mind when you say that you'd "like to check that it doesn't snowball." ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gap-system/gap/pull/3280#issuecomment-462705948, or mute the thread https://github.com/notifications/unsubscribe-auth/ACIIUEsVgfGuTBGTqKIsJQ8hNm13h0IQks5vMpgjgaJpZM4a1PYc .

-- Laurent Bartholdi laurent.bartholdigmailcom Bureau GN1 Nord 472, tel. +33 4 72728722 École Normale Supérieure 46 allée d'Italie, 69364 Lyon Cedex 07

Feb 12 '19 10:02 laurentbartholdi

I'm not super-enthusiastic about this, while I'd be happy to have never accepted .0, I'm a bit worried about breaking it now.

The rules around how ., and floats in general, are parsed are already quite messy and have lots of special cases, so I think they'd be hard to implement for another program.

Feb 12 '19 15:02 ChrisJefferson

@laurentbartholdi that sounds like very fragile code to begin with. In any case, I of course run the float test suite, which still passes -- but it's rather small, and e.g. does not seem to cover printing at all, nor many other functions, so I guess it passing does not mean much. That of course by itself is kind of a problem, too.

@ChrisJefferson the rules are actually not that messy. I refactored the code last year, and it's IMHO pretty clear now, though of course not exactly trivial; but that holds for any code parsing floats, as one has to deal with exponent notation and whatnot.

In any case, as I wrote, part of the motivation is precisely so that code does not have to reimplement the logic, but rather can directly lift the tokenizer (i.e. the code in scanner.c) from GAP and use it. Of course we are not quite there yet with this PR -- but I have another which moves the scanner state into an argument passed to the scanner, so it does no longer access the GAP state at all.

Feb 12 '19 16:02 fingolfin

But it's good that you made me look again -- I actually can simplify the code some more in this PR, by removing seenADigit.

Feb 12 '19 16:02 fingolfin

Codecov Report

Merging #3280 (317a1fa) into master (3af3ebb) will increase coverage by 11.42%. The diff coverage is 95.65%.

:exclamation: Current head 317a1fa differs from pull request most recent head 6dc3d5d. Consider uploading reports for the commit 6dc3d5d to get more accurate results

@@             Coverage Diff             @@
##           master    #3280       +/-   ##
===========================================
+ Coverage   82.07%   93.49%   +11.42%     
===========================================
  Files         684      716       +32     
  Lines      289913   812223   +522310     
  Branches     8615        0     -8615     
===========================================
+ Hits       237933   759358   +521425     
- Misses      50082    52865     +2783     
+ Partials     1898        0     -1898

Impacted Files	Coverage Δ
src/read.c	`97.32% <ø> (+1.14%)`	:arrow_up:
src/scanner.h	`25.00% <ø> (ø)`
src/scanner.c	`99.48% <95.65%> (-0.26%)`	:arrow_down:
lib/memory.gi	`43.12% <0.00%> (-39.92%)`	:arrow_down:
lib/methsel.g	`51.16% <0.00%> (-24.70%)`	:arrow_down:
lib/colorprompt.g	`46.15% <0.00%> (-24.16%)`	:arrow_down:
lib/matobjplist.gi	`42.79% <0.00%> (-19.72%)`	:arrow_down:
lib/gasman.gi	`26.66% <0.00%> (-18.27%)`	:arrow_down:
src/io.c	`59.48% <0.00%> (-17.80%)`	:arrow_down:
lib/mgmfree.gi	`58.53% <0.00%> (-16.47%)`	:arrow_down:
... and 366 more

Feb 12 '19 16:02 codecov[bot]

I regret that I did not remember to bring this up during the last GAP days.

Anyway, I might whip up a PR which does not disable this, just prints a warning message whenever floats starting with a dot are used. That could be used to flush out any related issues in the float package. As I said, i already tried to find issues with this change inside the float package, but apparently just running the test suite is not enough for that (which is a concern of its own).

Frankly, the whole state of float support in GAP is a bit depressing to me. Most things are not documented (including whether we support parsing .123 or not, as far as I can tell); that makes it super hard to debug all kinds of things, as it's often unclear whether some behavior is intentional or a bug. Hrmpf.

Apr 06 '19 13:04 fingolfin

@laurentbartholdi the motivation is discussed at length in this PR (which is marked as "DO NOT MERGE", don't worry at this time). If we ever again think making a bigger and partially breaking release of GAP, this is a candidate for inclusion.

Mar 28 '22 23:03 fingolfin

gap gap copied to clipboard

Remove support for float literals starting with a period [DO NOT MERGE]

Codecov Report

gap
gap copied to clipboard