Dotted symbols should probably be parsed as expressions
Currently, `foo.bar` parses as `(Symbol "foo.bar")`, and the compiler is responsible for interpreting the dot. Macros often have to specifically consider dotted identifiers, too. Things would be simpler if `foo.bar` parsed the same way as `(. foo bar)`. Fully excising dots from identifiers (other than float literals) may require a new model type for dots, in constructs like `(. foo bar)` as well as `(.foo bar)`.
I think this is worth exploring for 1.0, since it would be a breaking change for macros, but the upside seems pretty nice. It should probably be one of the last things we do, though, after the less parser-based changes are done.
This could be implemented in the reader! At read time we can read identifiers like `foo.bar` as `(. foo bar)` and expressions like `(.bar foo arg)` as `((. foo bar) arg)`.
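A minimal Python sketch of that read-time rewrite, assuming a toy representation where symbols are plain strings and expressions are tuples (not Hy's real `hy.models` API). `read_form` and its helpers are hypothetical names, and only heads with a single leading dot are rewritten:

```python
def read_symbol(name):
    """Rewrite a symbol with interior dots, e.g. 'foo.bar' -> ('.', 'foo', 'bar').

    Assumes numeric literals such as 1.5 were already handled by the tokenizer.
    """
    if isinstance(name, str) and "." in name and not name.startswith("."):
        return (".", *name.split("."))
    return name


def read_expression(form):
    """Rewrite (.bar foo arg ...) -> ((. foo bar) arg ...)."""
    if not form:
        return form
    head, *rest = form
    is_method_head = (
        isinstance(head, str)
        and head.startswith(".")
        and not head.startswith("..")
        and len(head) > 1
    )
    if is_method_head and rest:
        obj, *args = rest
        attrs = head[1:].split(".")
        return ((".", read_form(obj), *attrs), *(read_form(a) for a in args))
    return tuple(read_form(x) for x in form)


def read_form(form):
    """Apply the rewrite recursively to a toy form."""
    if isinstance(form, tuple):
        return read_expression(form)
    return read_symbol(form)


print(read_form("foo.bar"))               # ('.', 'foo', 'bar')
print(read_form((".bar", "foo", "arg")))  # (('.', 'foo', 'bar'), 'arg')
```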
So, poking around at this, I don't think we'll be able to excise dots from the beginning of symbols, i.e., `(.bar foo arg)`, because the conversion changes the order of the arguments in a way I don't think threading macros (or any argument-modifying macro) can understand (`foo` moving in front of `bar` in this case).
How would the following be parsed?
(-> foo
.bar)
I don't see the same issue for dots in the middle of symbols, but I can't see a way out of this one.
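The argument-order problem can be seen in a toy model. This sketch assumes symbols are strings and expressions are tuples; `thread_first` is a simplified, hypothetical stand-in for `->` (it inserts the threaded value as the second element of each form), and `read_method_call` is the hypothetical read-time rewrite:

```python
def thread_first(x, *forms):
    """Simplified `->`: insert x as the second element of each form."""
    for form in forms:
        if not isinstance(form, tuple):
            form = (form,)
        x = (form[0], x, *form[1:])
    return x


def read_method_call(form):
    """Hypothetical read-time rewrite: (.bar obj args...) -> ((. obj bar) args...)."""
    head, obj, *args = form
    return ((".", obj, head[1:]), *args)


# If `->` could run first, foo would land where the object belongs:
before = thread_first("foo", (".bar", "arg"))
print(before)                    # ('.bar', 'foo', 'arg')
print(read_method_call(before))  # (('.', 'foo', 'bar'), 'arg'), i.e. foo.bar(arg)

# But the reader runs first, so `->` only sees the rewritten form:
after = read_method_call((".bar", "arg"))
print(thread_first("foo", after))  # (('.', 'arg', 'bar'), 'foo'), i.e. arg.bar(foo)
```

Because the reader runs before any macro, `->` only ever sees the second shape, so `foo` ends up as an argument to `arg.bar` instead of as the object.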
Something like `(-> foo .bar)`, or just plain `(do .bar)`, presumably would be illegal at the reader level, since there's no ~~attribute to get~~ object to get the attribute `bar` from. A leading period would only be allowed right after `(`. Reader macros would still be an option if you wanted a macro that could do something else with leading periods in it, I guess. But it might be better to just have the macro treat some other character, like `,`, as a sort of escaped period.
Another option would be to choose a built-in default object or attribute for these cases, such as `it` or `X`, so `(do .bar)` would be understood as `(do (. it bar))` or `(do (. bar X))`. I'm not sure if that would help at all.
Alternatively, we could special-case a leading-dot form as a "method getter", kind of like how we special-case bare pyops; e.g., `.bar` on its own would be equivalent to something like
(fn [this #*args] (.bar this #*args))
I could see that. `.bar` could be compiled to `operator.attrgetter('bar')`, or rather `hy.attrgetter('bar')`, so as not to rely on the value of `operator` in the surrounding scope.
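One thing worth keeping in mind when comparing these: `operator.attrgetter('bar')` only fetches the attribute, whereas the method-getter reading above also calls it. A small Python sketch of the difference, where `Widget` and `method_getter` are hypothetical names for illustration:

```python
import operator


class Widget:
    """Hypothetical example object with one method."""

    def bar(self, *args):
        return ("bar", args)


def method_getter(name):
    """`.bar` as a method getter: (fn [this #*args] (.bar this #*args))."""
    return lambda this, *args: getattr(this, name)(*args)


w = Widget()
print(method_getter("bar")(w, 1, 2))  # ('bar', (1, 2))

get_bar = operator.attrgetter("bar")
print(get_bar(w) == w.bar)  # True: attrgetter returns the bound method uncalled
print(get_bar(w)(1, 2))     # ('bar', (1, 2))
```

Python's `operator.methodcaller('bar', 1, 2)` is a closer match for the calling behavior, but it fixes the arguments when the caller is created rather than when it's called.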
I don't think `hy.attrgetter` would work, at least at read time, unfortunately:
(-> foo
.bar)
;; would read as
(-> foo
(. hy attrgetter "bar"))
;; would macro expand as
(. foo hy attrgetter "bar")
Can we maybe push it to compile time and read dot-prefixed symbols as a new `FieldAccessor` model type, then compile that to `hy.attrgetter('bar')` like kodi suggested? That would still excise dots from all symbols and make dotted `FieldAccessor` types distinct from bare symbols (keeping them separate in our parsers) without changing argument order in weird ways. So this, for example, would fail our parser automatically:
=> (defmacro .bar [])
Traceback (most recent call last):
File "stdin-a56869b22e69471f3fef499237e9985b572215de", line 1, in <module>
(defmacro .bar [])
File "<stdin>", line 1
(defmacro .bar [])
^
hy.errors.HySyntaxError: parse error for pattern macro 'defmacro': got unexpected token: hy.models.FieldAccessor("bar"), expected: (some) or (some)
There's also this case:
(.bar.baz foo)
;; would have to expand to (using current dot expansion)
(. (. foo bar) (baz))
;; again unclear how this works in threading macros
;; reading into a `FieldAccessor` type and expanding to
(((. hy attrgetter) "bar.baz") foo)
;; *after* macro expansion at compile time does seem promising
;; so the previous threading case would be
(-> foo .bar)
;; read time
(Expression [(Symbol "->") (Symbol "foo") (FieldAccessor "bar")])
;; macro expansion time
(.bar foo)
;; compile time
(((. hy attrgetter) "bar") foo)
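That read/expand/compile pipeline can be simulated in Python. This is a sketch under several assumptions: `Symbol` and `FieldAccessor` are hypothetical dataclasses rather than the real `hy.models` classes, expressions are plain lists, `thread_first` is a simplified `->`, and `evaluate` stands in for compilation by evaluating directly, using `operator.attrgetter` in place of the proposed `hy.attrgetter`:

```python
import operator
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class FieldAccessor:
    path: str  # dotted attribute path such as "bar" or "bar.baz"


def read_atom(token):
    """Read time: a leading-dot token becomes a FieldAccessor, anything else a Symbol."""
    if token.startswith(".") and len(token) > 1:
        return FieldAccessor(token[1:])
    return Symbol(token)


def thread_first(x, *forms):
    """Macro-expansion time: a simplified `->` that wraps bare atoms in a list."""
    for form in forms:
        form = form if isinstance(form, list) else [form]
        x = [form[0], x, *form[1:]]
    return x


def evaluate(form, env):
    """'Compile' time, simulated by evaluating the form directly."""
    if isinstance(form, FieldAccessor):
        return operator.attrgetter(form.path)
    if isinstance(form, Symbol):
        return env[form.name]
    head, *args = form
    return evaluate(head, env)(*(evaluate(a, env) for a in args))


foo = SimpleNamespace(bar=SimpleNamespace(baz=42))
expanded = thread_first(read_atom("foo"), read_atom(".bar.baz"))
print(expanded)                          # [FieldAccessor(path='bar.baz'), Symbol(name='foo')]
print(evaluate(expanded, {"foo": foo}))  # 42
```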
Well, like I said earlier, we do probably need a model type for dots, however exactly we approach this, and `FieldAccessor` could be that model type. Perhaps its argument, instead of a string, should be a list of `Symbol`s, so e.g. `foo.bar.baz` would read as `(FieldAccessor [(Symbol "foo") (Symbol "bar") (Symbol "baz")])`. Then code that recursively replaces symbols will indeed still see symbols. The first symbol could be `None` if it's a leading-dot form (one with no object before the first dot), or maybe `FieldAccessor` should have two arguments: one an optional object to get the attributes from, and one a list of attributes.
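A sketch of the two-argument variant, assuming hypothetical `Symbol` and `FieldAccessor` dataclasses (not Hy's real models), with Python `None` marking a leading-dot form. It shows that a generic symbol-rewriting walk still reaches every name inside the accessor:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class FieldAccessor:
    obj: Optional[Symbol]      # None for leading-dot forms such as .bar.baz
    attrs: Tuple[Symbol, ...]  # attribute names, each still a Symbol


def read_dotted(token):
    """Read 'foo.bar.baz' or '.bar.baz' into a FieldAccessor."""
    head, *attrs = token.split(".")
    obj = Symbol(head) if head else None
    return FieldAccessor(obj, tuple(Symbol(a) for a in attrs))


def replace_symbols(form, mapping):
    """Rename symbols throughout a form, descending into FieldAccessor parts.

    Whether attribute names should be renamed too is a policy question for a
    real macro; this sketch renames whatever appears in `mapping`.
    """
    if isinstance(form, Symbol):
        return Symbol(mapping.get(form.name, form.name))
    if isinstance(form, FieldAccessor):
        return FieldAccessor(
            replace_symbols(form.obj, mapping),
            tuple(replace_symbols(a, mapping) for a in form.attrs),
        )
    if isinstance(form, list):
        return [replace_symbols(x, mapping) for x in form]
    return form  # None and other literals pass through unchanged


renamed = replace_symbols(read_dotted("foo.bar.baz"), {"foo": "g_foo"})
print(renamed.obj, [a.name for a in renamed.attrs])  # Symbol(name='g_foo') ['bar', 'baz']
```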
I like that idea. I think we can just leave `FieldAccessor` to contain the symbols to access, since `attrgetter` is a higher-order function that returns a function, and we don't need any knowledge of the accessed object for that. We can have mid-symbol dots expand at read time like we were hoping to, too. So `foo.bar` expands at read time to `(. foo bar)`, and `.bar.baz` reads as `(FieldAccessor [(Symbol "bar") (Symbol "baz")])` and can compile directly to an `ast.Call` or something like:
Call(
func=Attribute(value=Name(id="hy", ctx=Load()), attr="attrgetter", ctx=Load()),
args=[Constant(value="bar.baz")],
keywords=[],
)
I'll give this a go and see if it works.
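The compile step can be tried out with Python's `ast` module. This sketch builds the call for `.bar.baz` applied to `foo`; since `hy.attrgetter` is only a proposed helper, it substitutes `operator.attrgetter`, and `compile_field_accessor` is a hypothetical name:

```python
import ast
import operator
from types import SimpleNamespace


def compile_field_accessor(path, target):
    """Build the expression operator.attrgetter(path)(target) as a Python AST.

    The thread proposes `hy.attrgetter`; `operator.attrgetter` stands in for it
    here so that the result can actually be evaluated.
    """
    getter = ast.Call(
        func=ast.Attribute(
            value=ast.Name(id="operator", ctx=ast.Load()),
            attr="attrgetter",
            ctx=ast.Load(),
        ),
        args=[ast.Constant(value=path)],
        keywords=[],
    )
    call = ast.Call(func=getter, args=[ast.Name(id=target, ctx=ast.Load())], keywords=[])
    return ast.fix_missing_locations(ast.Expression(body=call))


tree = compile_field_accessor("bar.baz", "foo")
print(ast.unparse(tree.body))  # operator.attrgetter('bar.baz')(foo)  (unparse needs Python 3.9+)

foo = SimpleNamespace(bar=SimpleNamespace(baz="hello"))
code = compile(tree, "<sketch>", "eval")
print(eval(code, {"operator": operator, "foo": foo}))  # hello
```

Because `attrgetter` accepts a dotted path, the whole `.bar.baz` chain collapses into a single string constant.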
So, having gotten `.foo.bar` and `foo.bar` parsing as `(Interop ['foo 'bar])` and `(. foo bar)` respectively, there are some tradeoffs I've noticed. Notably, `import` and `require` parsing of relative imports like `(require ..resources.macros [test-macro])` becomes more difficult. Instead of testing whether a symbol contains `.`, you now need to test whether it's an `Interop` of a certain shape, which can be tricky. I'll keep looking into it and see if things can be simplified, but this might end up introducing more complexity than it removes.
If we need to change the syntax for relative imports (and requires), I don't think that's a big deal.
Having played around with this for a while, I'm gonna say I don't think this is a good idea anymore. It adds a whole host of extra complexity to basic tasks involving things that are expected to be symbols but can contain dots. Parsing into `(. ...)` forms turns a symbol into an expression, which is much harder to validate, and the same goes for parsing into some interop form. It turns one `(in "." sym)` into one or two more checks plus additional parsing logic that's much more complicated than an `in`. If someone else has a better route forward that doesn't add as much complexity, I'm all ears.
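To make the trade-off concrete, here is a toy comparison, assuming symbols are strings and `foo.bar` reads as the tuple `('.', 'foo', 'bar')`; both function names are hypothetical. With dotted names as symbols, splitting a module name is one string operation; with dotted names as expressions, a macro has to validate the shape first:

```python
def module_parts_from_symbol(sym):
    """Dotted names are single symbols: one string operation."""
    return sym.split(".")


def module_parts_from_form(form):
    """Dotted names are (. a b ...) expressions: check the shape, then unpack."""
    if isinstance(form, str):
        return [form]
    if (
        isinstance(form, tuple)
        and len(form) >= 3
        and form[0] == "."
        and all(isinstance(part, str) for part in form[1:])
    ):
        return list(form[1:])
    raise ValueError(f"expected a module name, got {form!r}")


print(module_parts_from_symbol("foo.bar.baz"))             # ['foo', 'bar', 'baz']
print(module_parts_from_form((".", "foo", "bar", "baz")))  # ['foo', 'bar', 'baz']
```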
Hmm. Tricky. Well, does the added complexity only happen in Hy's internals, like the standard result macros, or does it seem to cause trouble for user-defined macros, too? Because I think some more internal complexity is a reasonable price to pay in order for user-written macros to be easier and less error-prone.
It's anywhere you can take a symbol that might have dots, which, unfortunately, happens a lot when writing macros in Hy, given how prevalent dots are in Python.
What I would expect is that getting `foo.bar` as an expression would be helpful for a macro, because a macro that's expecting a symbol specifically probably wouldn't be able to cope with a dot; essentially, we save the macro from having to do this check itself. In places where an arbitrary form is expected, on the other hand, an expression should be fine.
I guess I'd like to at least take a look before we give up. Have you pushed your work to a branch in your fork?
Is this why I didn't find a section in the manual about infix `.` as syntax sugar? (I don't really remember seeing an explanation at all, though I didn't quite read cover-to-cover.)
Yes, it is. `foo.bar` is mentioned in the tutorial and used elsewhere, but I haven't yet put a real explanation of it in `syntax.rst` because I expect it to change.
My current thinking is to use my original idea of transforming dotted syntax to calls to the `.` macro. When there's no object to take an attribute from, as in `(print .foo)`, the object will implicitly be `None`. This is okay for importing (as in `(import .parent-module [item])`) because Python already syntactically forbids importing a module named `None`; the fact that `(import None)` is currently legal Hy is arguably a bug. If the form begins with more than one dot, then we instead create an expression where the head has more than one dot. More than one dot between two names is a parse error.
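The reading rules for standalone dotted identifiers can be sketched in Python. Assumptions: symbols are plain strings, expressions are tuples, the string `"None"` stands for Hy's `None`, tokens made only of dots (such as `.`, `..`, `...`) stay ordinary symbols, and numeric literals like `1.5` are tokenized before this runs; `read_dotted` is a hypothetical name:

```python
import re

_NAME = r"[^.\s()]+"
# Optional leading dots, then one or more names separated by single dots.
_DOTTED = re.compile(rf"(\.*)({_NAME}(?:\.{_NAME})*)")


def read_dotted(token):
    """Read a dotted identifier into an expression, or raise SyntaxError."""
    if "." not in token or set(token) == {"."}:
        return token  # plain symbol, or a symbol made only of dots
    match = _DOTTED.fullmatch(token)
    if match is None:
        # Trailing dots, or more than one dot between two names.
        raise SyntaxError(f"illegal dotted identifier: {token!r}")
    dots, body = match.groups()
    names = tuple(body.split("."))
    if dots:
        # Leading dots: the head is the run of dots and the object is None.
        return (dots, "None", *names)
    return (".", *names)


print(read_dotted("obj.a1.a2.a3"))  # ('.', 'obj', 'a1', 'a2', 'a3')
print(read_dotted("..a1.a2"))       # ('..', 'None', 'a1', 'a2')
```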
| Syntax | New equivalent |
|---|---|
| `obj.a1` | `(. obj a1)` |
| `obj.a1.a2.a3` | `(. obj a1 a2 a3)` |
| `(.a1 obj)` | `((. obj a1))` |
| `(.a1.a2 obj x y)` | `((. obj a1 a2) x y)` |
| `(..a1 obj)` | `(.. obj a1)` |
| `(..a1.a2 obj x y)` | `((.. obj a1 a2) x y)` |
| `(..a1..a2 obj x y)` | illegal |
| `.a1` | `(. None a1)` |
| `obj.` | illegal |
| `..a1.a2` | `(.. None a1 a2)` |
| `a..b` | illegal |
| `..a..b` | illegal |
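The rows where a leading-dot identifier heads an expression, such as `(.a1.a2 obj x y)` and `(..a1.a2 obj x y)`, can be sketched the same way. Assumptions: symbols are strings, expressions are tuples, only the head is rewritten (the object and arguments are left as they are), and an object argument is required; `read_method_form` is a hypothetical name:

```python
import re

_HEAD = re.compile(r"(\.+)([^.\s()]+(?:\.[^.\s()]+)*)")


def read_method_form(form):
    """Rewrite (.a1.a2 obj x y) -> ((. obj a1 a2) x y), keeping the dot count."""
    if not form or not isinstance(form[0], str):
        return form
    head = form[0]
    if not head.startswith(".") or set(head) == {"."}:
        return form  # ordinary head, or a bare dot symbol such as `.` or `..`
    match = _HEAD.fullmatch(head)
    if match is None:
        raise SyntaxError(f"illegal dotted head: {head!r}")  # e.g. ..a1..a2
    if len(form) < 2:
        raise SyntaxError(f"{head!r} needs an object to act on")  # this sketch's choice
    dots, body = match.groups()
    obj, *args = form[1:]
    return ((dots, obj, *body.split(".")), *args)


print(read_method_form((".a1.a2", "obj", "x", "y")))  # (('.', 'obj', 'a1', 'a2'), 'x', 'y')
```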
My plan is not to define any objects or macros named `..`, `...`, etc. So, they'll be open to users who find uses for them.