More complicated numbers?

I'm going to accept PR #8 tonight, which adds scientific notation (1e8, etc) to Emily. However, I have some thoughts on where we should go from there.

Something strikes me as odd about #8: the accepted format is 3.4e-8. This makes sense, as it is standard and matches what the output looks like. However, -8 is not currently accepted as a number; it's ~8, to prevent parser confusion between unary and binary -. We don't, however, accept 3.4e~8.

Meanwhile, there's some more things I want eventually:

I want an octal mode (0o434)
I want a hex mode (0x4d4)
I want a binary mode (0b1001)
Once there are multiple numeric types (e.g. int) I want to specify some standard way of making a constant an int or a float, similar to 5.0f in C
I want octal, hex and binary mode to work with all features. I should be able to say something like 0x4ac.eb3Ea. Or 53.0b1001. Currently I think C doesn't let you put anything but decimal to the right side of the .

These other requests create additional weirdness: you obviously can't use E as an exponent marker in something like 0xeE4, and "f" can't go after a hexadecimal number as a suffix, because "e" and "f" are valid hex digits.

Another thing to consider: something which is not currently well documented is the relationship between the "reader" (tokenizer.ml) and the macro parser (macro.ml). My goal is "as much as possible" should be handed off to the macro parser, since this will eventually be user code and I want it to be customizable. However it seems like some things kind of can't be handled by macros easily. The internals of numbers are a prime example: with 4.5e4, you'll probably (?) misrepresent the number in binary if you try to implement e as a macro. Negation however comes with no loss-of-precision risk, so ~ or an eventual unary - could be done with a user macro.
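To make the loss-of-precision worry concrete, here is a small Python sketch (Python floats are the same IEEE doubles, so the arithmetic should carry over). It compares a literal converted in one step with the same value built by multiplying after the fact, which is roughly what a naive "e" macro would do. The value 4.35e2 is just an illustrative pick, not something from the PR.

```python
# A literal converted in one step versus the same value built by arithmetic.
parsed = float("4.35e2")    # the reader converts the whole literal at once
computed = 4.35 * 10 ** 2   # what a naive "e" macro would evaluate

print(parsed)               # 435.0
print(computed)             # 434.99999999999994
print(parsed == computed)   # False
```

The difference arises because 4.35 is already rounded to the nearest double before the multiplication happens, so the product gets rounded a second time.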

With all these things in mind, here's what I think I want:

IN READER: Numbers begin with a digit. If the number starts with 0x, 0b, or 0o, the reader switches to hex, binary or octal mode. After the numeric part an "e" or a "p" is allowed (p for power), either of which means the part after is an exponent. The "e" is not accessible in hex mode. The exponent may be negative using either a - or a ~.
IN MACRO PROCESSOR: Either ~ or unary -, depending on which macro you loaded, negates the following numeric constant. Numbers are float by default but putting an "i" after, like 4i, makes it integer-typed to start.
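A rough Python sketch of the reader half of this, just to check that the rules hang together. Everything here is hypothetical: the function name `read_number`, the choice to write exponent digits in decimal, and the choice to have the exponent scale by the literal's own base are all assumptions, since the proposal doesn't pin them down. Suffixes like "i" and mixed-base fractions like 53.0b1001 are left out.

```python
import re
from fractions import Fraction

PREFIXES = {"0x": 16, "0o": 8, "0b": 2}
ALL_DIGITS = "0123456789abcdef"

def read_number(text):
    """Return the float value of one numeric literal, or raise ValueError."""
    base, body = 10, text
    if text[:2].lower() in PREFIXES:
        base, body = PREFIXES[text[:2].lower()], text[2:]
    digit = "[" + ALL_DIGITS[:base] + "]"
    # "e" is a hex digit, so in hex mode only "p" may introduce the exponent.
    marker = "p" if base == 16 else "[ep]"
    pattern = rf"({digit}+)(?:\.({digit}+))?(?:{marker}([-~]?)(\d+))?"
    match = re.fullmatch(pattern, body, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a numeric literal: {text!r}")
    whole, frac, sign, exponent = match.groups()
    frac = frac or ""
    # Exact rational arithmetic, rounded to a float only once at the end.
    value = Fraction(int(whole + frac, base), base ** len(frac))
    if exponent is not None:
        power = -int(exponent) if sign else int(exponent)
        # Assumption: the exponent scales by the literal's own base
        # (C hex floats instead scale by powers of two).
        value *= Fraction(base) ** power
    return float(value)

print(read_number("3.4e~8") == read_number("3.4e-8"))  # True
print(read_number("0x4d4"))     # 1236.0
print(read_number("0xeE4"))     # 3812.0 (the E is read as a digit)
print(read_number("0b1001p2"))  # 36.0
```

Doing the conversion with exact fractions and rounding once is one way to sidestep the precision problem that makes "e" awkward as a macro.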

VERY long term goals: Maybe eventually numeric constants are internally strings until some point in the macro processing loop, and that would allow bignum constants. It would be cool to have a way to specify binary strings like 102.244.33.53.1 or something. Oh, and I've always liked the _ in Perl.

Anyone have any thoughts? This is a little complicated but IMO the numeric inlines in most languages are not nearly as nice as they should be, and this has to be thought out in a way other Emily features don't because it partially involves the Reader (i.e.: I can't just get it wrong and then patch it later with a user macro).

May 29 '15 17:05 mcclure

So, my (end-user) thoughts on this are:

I am really excited to get scientific notation support, as well as support for actual integers. (Are we talking 64-bit signed ints? 32-bit signed? Something else?)
I don't think i is a good suffix, in light of possible future complex number support.
Is your proposal opening up the door to numbers like -33 being valid Emily syntax? If not, I think we should require ~ for negative exponents too. I think seeing ~13e-3 is in some ways worse than ~13e~3 if only because I would probably constantly try to use a prefix minus too.
However numbers are printed/displayed by default should be valid Emily syntax IMO. Having to use one notation to read numbers and another to write them seems perverse.
I like the idea of being able to specify floating point constants in hex/octal/binary notation. I think using p makes sense here. Maybe it would be worth switching to p everywhere for consistency?
I think overall I like your plan, although I'm a bit nervous about constraining numeric syntax to something that is easy for a macro to deal with. As you know I care a ton about numbers, so maybe I'm biased, but I think it's important to be able to support things like complex numbers, fractions, etc. I guess I trust you, but just want to make my unease explicit.

May 29 '15 19:05 non

@mcclure I agree with the idea that things should be extensible for the user, but I will note that I could, for the time being, add support directly in the reader for numeric literals in all those formats in probably under an hour since float_of_string already handles them correctly.

I'll try to think more about how I think the extensibility should be handled tonight, but my first thought is that we could allow a really forgiving number syntax and then allow the macro stage to decide what the number should be transformed into. Perhaps anything that starts with a digit will get treated as a number by the reader? This would immediately work for the different literals starting with 0o, 0x, 0b, etc.

Responding to / agreeing with @non's points:

I also really want integers so I can stop performing abominations like storing file handles in floats in my experimental file IO.
We can just go all electrical engineer / Python here and use j for complex numbers (please let's not actually do this).
Changing the macros for operators is possible (I've been writing parsers for operators with precedence and stuff for just a few months now, but I know how it could be done if we don't need to be too flexible with how operators behave) and so I think the longterm goal should be to make unary -.
Yeah. emily -e "println: ~1" prints -1.
I would also accept this. (How can I write my base64 number literals with scientific notation? :smile: )
I also agree that numbers are very important, which is also a good reason that we should try to allow users to make numbers that are as beautiful as things that the language can natively support.

May 29 '15 21:05 porglezomp

Actually, doing - for negative nums should be fairly easy: just make the macro automatically place a zero before the minus if there is nothing there. The problem is that defining minus on an object would then automatically give it a negation too. It would therefore be better to have a .negate method on nums, which the macro produces when there is nothing before the minus, similar to how ~ works now. That leaves room for extensibility, but still allows unary -.

May 30 '15 04:05 FayeAlephNil

Yeah, you can determine whether to add a negate or a subtraction based on the context. If the operator is directly after another operator, or after nothing, then it's a negation instead of a subtraction.
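A minimal Python sketch of that context rule, assuming a flat list of string tokens and a hypothetical operator set (none of this is Emily's actual macro machinery): a "-" with nothing or another operator to its left is renamed to "neg", and every other "-" stays a subtraction.

```python
OPERATORS = {"+", "-", "*", "/", "neg"}

def mark_negations(tokens):
    """Rename each "-" that has no left operand to "neg"."""
    result = []
    for token in tokens:
        previous = result[-1] if result else None
        if token == "-" and (previous is None or previous in OPERATORS):
            result.append("neg")
        else:
            result.append(token)
    return result

print(mark_negations(["-", "3"]))            # ['neg', '3']
print(mark_negations(["4", "*", "-", "3"]))  # ['4', '*', 'neg', '3']
print(mark_negations(["sin", "-", "3"]))     # ['sin', '-', '3']
```

The last case is the "sin -3" situation: since sin is not an operator, the rule reads it as sin minus 3.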

May 30 '15 04:05 porglezomp

"I am really excited to get scientific notation support, as well as support for actual integers. (Are we talking 64-bit signed ints? 32-bit signed? Something else?)"

I don't know yet. I think my redline is we shouldn't have BOTH int and float types until it is possible to type variables to make sure they're not accidentally intermixed. I just put "float" and "int" (rn only tests for floats with no fractional part) into the stdlib so maybe basic types are coming sooner than I thought!

"I don't think i is a good suffix, in light of possible future complex number support."

Ouch good call :(

"Is your proposal opening up the door to numbers like -33 being valid Emily syntax? If not, I think we should require ~ for negative exponents too. I think seeing ~13e-3 is in some ways worse than ~13e~3 if only because I would probably constantly try to use a prefix minus too."

~ is a macro. I think that "- supporting unary minus" should also be a macro. Maybe it should be the default macro! I don't know! What I was intending to do before I suddenly had all this interest in help :) was start by adding user macros, and then have two macro sets, one that looks like the current syntax with ~s and ^s, and another that looks more like ocaml. And then just play with it. I was gonna put unary minus in the latter one. Remember unary minus is really hard in emily because the language approach means - can't easily be unary sometimes and binary sometimes. There is no way for "sin -3" to be treated as a sine followed by a number and not the sine function minus a number.

A problem here is, again, the exponent feature lives in the reader/lexer and not at the macro level. So macros could mean that ~ or - are acceptable for prefix positive/minus. But macros can't control whether ~ and - are used within a numeric constant. The only way to support both depending on mode is to put both in both modes.

If you can think of a way out of this trap I'm happy to hear it.

I do think, if Emily is designed to write software which is used by users rather than being designed by programmers, when we convert numbers to strings, we should use - and not ~.

"I like the idea of being able to specify floating point constants in hex/octal/binary notation. I think using p makes sense here. Maybe it would be worth switching to p everywhere for consistency?"

If everyone's okay with it, I'd tend to prefer it.

"I guess I trust you, but just want to make my unease explicit."

If I can ease your unease somehow I'm happy to but please see my concerns above about how non-macro unary minus would even be implemented :O

" I will note that I could, for the time being, add support directly in the reader for numeric literals in all those formats in probably under an hour since float_of_string already handles them correctly."

That would be cool :O I've been putting it off because I've been meaning to do a big rewrite of the lexer all together to support reader macros and unicode better. If you want to take on some of this stuff feel free!! :O

"Perhaps anything that starts with a digit will get treated as a number by the reader?"

That's sensible, but what about negative numbers, then?

"we could allow a really forgiving number syntax and then allow the macro stage to decide what the number should be transformed into"

One way to do this would be to have there be some kind of intermediate "numberstring" type which macros are responsible for converting into actual numbers. Something I've been considering, once macro triggers can be words and not just numbers, is the possibility of a macro which matches "everything", not just specific key words/symbols. Why this is interesting to me: At the moment there's a thing where standalone symbols are interpreted as "read this atom from scope". They get transformed into a Token.Word in the tokenizer's internal ast. However, imagine if there were something running at a particular macro priority level which swept through and transmuted every stray symbol (anything that hasn't already been swept up as a binding by ^ or an atom by .) into a Token.Word. This would mean you could do much more unusual things with macros, like create a macro set where lone symbols are treated as atoms with no . needed. This is… kinda scary! But it would create space for macros to also decide how number constants are parsed.

"I also really want integers so I can stop performing abominations like storing file handles in floats in my experimental file IO."

Lua and JS people do this kinda stuff all the time! They're doubles, so integers are exact up to 53 bits ^_^

"We can just go all electrical engineer / Python here and use j for complex numbers (please let's not actually do this)."

Maybe I wouldn't mind :D

"Yeah. emily -e "println: ~1" prints -1."

I hate that dot, by the way. >_>

"I would also accept this. (How can I write my base64 number literals with scientific notation? :smile: )"

I wanna be as inclusive as possible! I think an inline syntax for byte strings would be cool. I really like how some older versions of C let you specify uint32 constants like 'FACE'.

"Yeah, you can determine whether to add a negate or a subtraction based on the context. If the operator is directly after another operator, or after nothing, then it's a negation instead of a subtraction."

Yeah! That's easy and could be done with the current system. But wow am I worried about making it the default D: I guess I'm the only one who fears the "sin -3" problem more than "gosh durn it, what is ~" tho ^_^;

May 30 '15 20:05 mcclure

Something which might interest you is how my language, Firth, handles negation. I don't like the hack that many C-like languages have, which is to actually have no means to specify a number's sign, and make - be both negation and subtraction. This means that when you do want a negative number, it must be represented as a positive number immediately negated, which causes problems with operator precedence. It also means you need hacks to deal with number formats where the maximum and minimum values have differing magnitudes.

Firth, instead, makes negation, subtraction and sign completely separate:

12 is an integer literal
-12 is an integer literal that is negative, the - is detected by the lexer
0 12 sub. subtracts twelve from zero
12 neg. negates twelve

What Firth does maybe can't be done so cleanly in Emily, though. For starters, Firth doesn't have operator precedence, because it works on a stack.

Could you separate them in Emily? AIUI, your current set in Emily is:

binary - (subtraction)
unary ~ (negation)

I think that might be a bit misleading, because ~ traditionally represents bitwise NOT, which isn't quite the same as negation, at least not on modern CPUs.

You don't have sign in number literals at all, you just immediately negate them. This works OK for floats, but if you added integers, you'd be screwed: the (negative) lower bound of a two's-complement integer has a different magnitude from the upper bound. For example, the largest signed 32-bit integer is 2147483647 in two's-complement, but the smallest is -2147483648... so you couldn't type the smallest 32-bit integer without a workaround! Of course, this would only matter if Emily's integers were of a fixed size.
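To illustrate the bounds problem, a tiny Python sketch. Python's own integers are unbounded, so the 32-bit limits are simulated with a hypothetical `fits_int32` helper.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_int32(value):
    """True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX

magnitude = int("2147483648")  # the literal as read, before any negation
print(fits_int32(magnitude))   # False
print(fits_int32(-magnitude))  # True
```

So a reader that must first produce the positive value and only then negate it has no valid intermediate for the smallest integer.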

The other thing for me is ~3 just looks weird. I like -3.

What I'd prefer would be:

12 is a number literal
-12 is a negative number literal
0 - 12 subtracts twelve from zero
neg 2 negates two

This sacrifices unary minus, so now you can't do -a, which is unfortunate. However, I reckon most of the use of negation is for number literals, and now the sign is part of the literal itself. Also, this might solve some precedence issues, maybe. And of course, it means subtraction, sign and negation are separate.

The other casualty is that 1-2 will now not parse correctly, you'd have to add spaces so it's interpreted as 1 - 2 and not 1 -2. Though maybe there's a workaround.
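A quick way to see this casualty is a toy lexer in Python, where the sign is part of the integer token. The token pattern is an assumption for illustration only, not a proposal for Emily's real reader.

```python
import re

# A number may carry its own leading "-"; otherwise "-" is an operator.
TOKEN = re.compile(r"-?\d+|[-+*/]")

def lex(source):
    """Split source into number and operator tokens, skipping whitespace."""
    return TOKEN.findall(source)

print(lex("1 - 2"))  # ['1', '-', '2']
print(lex("1 -2"))   # ['1', '-2']
print(lex("1-2"))    # ['1', '-2']
```

One possible workaround would be for the lexer to attach the sign only when the previous token is not a number, though that brings back some of the context sensitivity this scheme was trying to avoid.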

How do you feel about this idea?

May 30 '15 21:05 hikari-no-yume

Oh yeah, as for scientific notation, I don't think it needs parser support. In maths you don't write 1.5e20, you write 1.5✕10²⁰. It's not a special kind of number notation, it's just a multiplication and exponentiation. So, if Emily were to add an exponentiation operator – perhaps ** since ^ is taken? – you could express it the same way maths does: 1.5 * 10**20.

May 30 '15 21:05 hikari-no-yume

As for expressing whether something is an integer/float/whatever, it's also something I've thought about. I think the best way might be what Haskell – and also Go, albeit to a more limited extent – do: don't give number literals a type straight away. That is, don't make 20 a float or an integer or anything else at the syntactical level, it's just an integral number. Similarly, 20.5 isn't a float or a decimal, it's just a fractional number. Then when it's actually stuck into a variable or expression, give it an actual type you infer from context. If you can't infer one, use a default type (for 20 an integral, for 20.5 float?), and provide a way to specify the type you want explicitly.

An example from Haskell:

$ ghci
Prelude> :t 2
2 :: Num a => a
Prelude> :t 2.5
2.5 :: Fractional a => a
Prelude> :t 2 + 2.5
2 + 2.5 :: Fractional a => a
Prelude> let foobar = (2.5) :: Double
Prelude> :t foobar
foobar :: Double
Prelude> :t foobar + 2
foobar + 2 :: Double

For those unfamiliar with Haskell syntax, Num a => a means "where a is some type of the typeclass Num, the type is a". So, it's decided there that 2 is just something numeric, but it doesn't know what yet.

May 30 '15 21:05 hikari-no-yume

@mcclure most functional languages which have - as unary negation (like Haskell) expect people to put parentheses around the negation, and in a case like sin -n I would probably do it just for explicitness even if I didn't have to.

Scary stuff on unrecognized symbols is not without precedent, Ruby's method_missing for instance. (Method names with spaces in them anyone?)

@TazeTSchnitzel if we put numbers in an intermediate string representation, then a unaryMinus immediately followed by a numberString could be directly transformed into a negative number by the macro system before we get to the internal representation, with its range issues and so on.
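Here is a hedged Python sketch of that folding step. The token kinds are written as (kind, text) pairs, the function names are made up, and it assumes an earlier pass has already told unary minus apart from binary minus.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fold_signs(tokens):
    """Merge a unaryMinus token into the numberString that directly follows it."""
    folded = []
    for kind, text in tokens:
        if kind == "numberString" and folded and folded[-1][0] == "unaryMinus":
            folded[-1] = ("numberString", "-" + text)
        else:
            folded.append((kind, text))
    return folded

def to_int32(text):
    """Convert a signed numberString, rejecting values outside 32 bits."""
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{text} does not fit in 32 bits")
    return value

tokens = fold_signs([("unaryMinus", "-"), ("numberString", "2147483648")])
print(tokens)                  # [('numberString', '-2147483648')]
print(to_int32(tokens[0][1]))  # -2147483648
```

Because the sign is attached while the number is still a string, the out-of-range positive value never has to exist.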

For the scientific notation, we do want a shortcut syntax since it's nice and convenient, and also common in other languages. An exponentiation operator would be nice as well though!

May 31 '15 05:05 porglezomp

Also, I just made a little thing that handles the binary, octal, and hex literals, but only without a fractional part for now.

$ ./install/bin/emily -i
Emily language interpreter: Version 0.2, interactive mode
Type "help" for help, "quit" to quit
Type "last" to get the previous line's value

>>> 0xDEADBEEF
3735928559
>>> 0b1 + 0b1
2
>>> 0o777
511

May 31 '15 06:05 porglezomp

Okay well I'm focusing on things other than the macro system for now, but if anyone wanted to implement a unary -, I'll accept it for 0.3. It seems like it will make the number representation thing sensible at least D: And if we do that, then ~s internal to number constants become unnecessary.

My suggestion: make the "-" macro evaluate to future[0].negate if macro past is empty, and (past).minus(future) if macro past is nonempty.
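Sketched in Python, with past and future as plain lists of token strings and the result as a string in the notation used above. The function name `minus_macro` is hypothetical, and keeping the rest of future after the negated token is an assumption.

```python
def minus_macro(past, future):
    """Rewrite one "-" given the tokens before (past) and after (future) it."""
    if not past:
        # Nothing to the left: negate the first token that follows.
        return " ".join([future[0] + ".negate"] + future[1:])
    # Something to the left: ordinary subtraction.
    return f"({' '.join(past)}).minus({' '.join(future)})"

print(minus_macro([], ["3"]))       # 3.negate
print(minus_macro(["4"], ["3"]))    # (4).minus(3)
print(minus_macro(["sin"], ["3"]))  # (sin).minus(3)
```

Under this rule a "-" in the middle of a line is always subtraction, so "sin -3" still needs parentheses around the -3.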

May 31 '15 16:05 mcclure
