hy Dotted symbols should probably be parsed as expressions

Open Kodiologist opened this issue 3 years ago • 15 comments

Currently, foo.bar parses as (Symbol "foo.bar"), and the compiler is responsible for interpreting the dot. Macros often have to specifically consider dotted identifiers, too. Things would be simpler if foo.bar parsed the same way as (. foo bar). Fully excising dots from identifiers (other than float literals) may require using a new model type for dots, in constructs like (. foo bar) as well as (.foo bar).

Jun 13 '21 13:06 Kodiologist

I think this is worth exploring for 1.0 since it would be a breaking change for macros, but the upside seems pretty nice. It should probably be one of the last things we do, though, after the less parser-based changes are done.

Jun 13 '21 14:06 allison-casey

This could be implemented in the reader! At read time we can read identifiers like foo.bar as (. foo bar) and expressions like (.bar foo arg) as ((. foo bar) arg).
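To make the proposed rewrite concrete, here is a minimal Python sketch of the two read-time rules. It is not Hy's reader: forms are modeled as plain Python lists, and `read_symbol` and `read_call` are hypothetical names used only for illustration. It does not validate malformed tokens or treat float literals specially.

```python
def read_symbol(token):
    """Read 'foo.bar' as the list form ['.', 'foo', 'bar'].

    Tokens without an inner dot (including leading-dot tokens) pass through.
    """
    if "." in token and not token.startswith(".") and not token.endswith("."):
        return [".", *token.split(".")]
    return token


def read_call(tokens):
    """Read a parenthesized form given as a list of token strings."""
    head, *rest = tokens
    if head.startswith(".") and len(head) > 1 and rest:
        # (.bar foo arg) -> ((. foo bar) arg): the object moves in front
        # of the attribute name, and the access becomes the call head.
        obj, *args = rest
        attrs = head[1:].split(".")
        access = [".", read_symbol(obj), *attrs]
        return [access, *[read_symbol(a) for a in args]]
    return [read_symbol(t) for t in tokens]
```

For example, `read_call([".bar", "foo", "arg"])` yields `[[".", "foo", "bar"], "arg"]`, the list equivalent of `((. foo bar) arg)`.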

Jun 16 '21 05:06 scauligi

So, poking around at this, I don't think we'll be able to excise dots from the beginning of symbols, as in (.bar foo arg), because the conversion changes the order of the arguments in a way that I don't think threading macros (or any argument-modifying macro) can understand: foo moves in front of bar in this case.

How would the following be parsed?

(-> foo
  .bar)

I don't see the same issue for dots in the middle of symbols, but I can't see a way out of this one.

May 01 '22 17:05 allison-casey

Something like (-> foo .bar), or just plain (do .bar), presumably would be illegal at the reader level, since there's no object to get the attribute bar from. A leading period would only be allowed right after `(`. Reader macros would still be an option if you wanted a macro that could do something else with leading periods in it, I guess. But it might be better to just have the macro treat some other character, like `,`, as a sort of escaped period.

Another option would be to choose a built-in default object or attribute for these cases, such as it or X, so (do .bar) would be understood as (do (. it bar)) or (do (. bar X)). I'm not sure if that would help at all.

May 01 '22 17:05 Kodiologist

Alternatively, we could special-case a leading-dot form as a "method getter", kind of like how we special-case bare pyops; e.g., .bar on its own would be equivalent to something like

(fn [this #*args] (.bar this #*args))

May 01 '22 17:05 scauligi

I could see that. .bar could be compiled to operator.attrgetter('bar'), or rather hy.attrgetter('bar') so as not to rely on the value of operator in the surrounding scope.
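One difference between the two suggestions is worth spelling out: the "method getter" calls the attribute, while `operator.attrgetter` only fetches it. The Python sketch below shows this with the standard library; `method_getter` is a hypothetical helper that mirrors the `fn` form above, and `hy.attrgetter` is a name proposed in this thread, so `operator` is used in its place.

```python
import operator


def method_getter(name):
    """Rough Python analogue of (fn [this #*args] (.bar this #*args))."""
    return lambda this, *args: getattr(this, name)(*args)


get_upper = operator.attrgetter("upper")     # fetches the attribute only
call_upper = operator.methodcaller("upper")  # fetches the attribute and calls it

bound = get_upper("abc")      # a bound method that has not been called yet
shouted = call_upper("abc")   # the result of "abc".upper()
also_shouted = method_getter("upper")("abc")
```

So a bare `.bar` compiled to an attribute getter would hand back the method itself, and a caller would still have to invoke it.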

May 01 '22 17:05 Kodiologist

I don't think hy.attrgetter would work, at least at read time, unfortunately.

(-> foo
    .bar)

;; would read as
(-> foo
    (. hy attrgetter "bar"))

;; would macro expand as
(. foo hy attrgetter "bar")
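The mis-expansion can be reproduced with a toy thread-first function in Python. This is only a sketch: forms are plain lists, `thread_first` is a hypothetical stand-in for the `->` macro, and `hy.attrgetter` is the name proposed in this thread.

```python
def thread_first(value, *forms):
    """Insert the running value as the first argument of each form."""
    for form in forms:
        if isinstance(form, list):
            # An expression: the value becomes its second element.
            value = [form[0], value, *form[1:]]
        else:
            # A bare symbol: wrap it into a call on the value.
            value = [form, value]
    return value


# .bar kept as a symbol: threading produces (.bar foo), as it does today.
as_symbol = thread_first("foo", ".bar")

# .bar already rewritten to (. hy attrgetter "bar") by the reader:
# threading pushes foo into the dot form instead of calling the getter.
as_expression = thread_first("foo", [".", "hy", "attrgetter", '"bar"'])
```

The second result is the list form of `(. foo hy attrgetter "bar")`, which looks up `hy` on `foo` rather than applying a getter to it.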

Can we maybe push it to compile time and read dot-prefixed symbols as a new FieldAccessor model type, then compile that to hy.attrgetter('bar') like Kodi suggested? That would still excise dots from all symbols and make dotted FieldAccessor types distinct from bare symbols (and keep them separate in our parsers) without changing argument order in weird ways. So this would fail our parser automatically:

=> (defmacro .bar [])
Traceback (most recent call last):
  File "stdin-a56869b22e69471f3fef499237e9985b572215de", line 1, in <module>
    (defmacro .bar [])
  File "<stdin>", line 1
    (defmacro .bar [])
              ^
hy.errors.HySyntaxError: parse error for pattern macro 'defmacro': got unexpected token: hy.models.FieldAccessor("bar"), expected: (some) or (some)

There's also this case:

(.bar.baz foo)
;; would have to expand to (using current dot expansion)
(. (. foo bar) (baz))
;; again unclear how this works in threading macros

;; reading into a `FieldAccessor` type and expanding to
(((. hy attrgetter) "bar.baz") foo) 
;; *after* macro expansion at compile time does seem promising 

;; so the previous threading case would be
(-> foo .bar)
;; read time
(Expression [(Symbol "->") (Symbol "foo") (FieldAccessor "bar")])
;; macro expansion time
(.bar foo)
;; compile time
(((. hy attrgetter) "bar") foo)
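The three stages can be imitated in Python to check that the idea hangs together. Everything here is an assumption made for illustration: `FieldAccessor` is modeled as a small dataclass holding a dotted path, `evaluate` is a toy evaluator for list-based forms, and `operator.attrgetter` (which accepts dotted names) stands in for the proposed `hy.attrgetter`.

```python
import operator
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class FieldAccessor:
    """Hypothetical model for a leading-dot form such as .bar or .bar.baz."""
    path: str


def evaluate(form, env):
    """Tiny evaluator that covers only the forms used in this example."""
    if isinstance(form, FieldAccessor):
        # "Compile time": the accessor becomes an attribute getter.
        return operator.attrgetter(form.path)
    if isinstance(form, list):
        func, *args = [evaluate(part, env) for part in form]
        return func(*args)
    return env[form]


foo = SimpleNamespace(bar=SimpleNamespace(baz=42))

# What (-> foo .bar.baz) would look like after macro expansion: the
# accessor is still a single atom, so threading did not reorder it.
expanded = [FieldAccessor("bar.baz"), "foo"]
result = evaluate(expanded, {"foo": foo})
```

Because the accessor stays opaque until the last step, a threading macro only ever sees one atom in head position.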

May 03 '22 17:05 allison-casey

Well, like I said earlier, we do probably need a model type for dots, however exactly we approach this, and FieldAccessor could be that model type. Perhaps its argument, instead of a string, should be a list of Symbols, so e.g. foo.bar.baz would read as (FieldAccessor [(Symbol "foo") (Symbol "bar") (Symbol "baz")]). Then code that recursively replaces symbols will indeed still see symbols. The first symbol could be "None" if it's a leading-dot form like .bar, or maybe FieldAccessor should have two arguments, one an optional object to call and one a list of attributes.

May 03 '22 17:05 Kodiologist

I like that idea. I think we can just leave FieldAccessor to contain the symbols to access, since attrgetter is a higher-order function that returns a function and we don't need any knowledge of the accessing object for that. We can have mid-symbol dots expand at read time like we were hoping to, too. So foo.bar expands at read time to (. foo bar), and .bar.baz reads as (FieldAccessor [(Symbol "bar") (Symbol "baz")]) and can compile directly to an ast.Call or something:

Call(
    func=Attribute(value=Name(id="hy", ctx=Load()), attr="attrgetter", ctx=Load()),
    args=[Constant(value="bar.baz")],
    keywords=[],
)

I'll give this a go and see if it works
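For reference, a node of that shape can be built and run with Python's `ast` module. This sketch swaps in `operator` for `hy`, since `hy.attrgetter` is only a proposed name in this thread; the object at the end is made up for the demonstration.

```python
import ast
import operator
from types import SimpleNamespace

call = ast.Call(
    func=ast.Attribute(
        value=ast.Name(id="operator", ctx=ast.Load()),
        attr="attrgetter",
        ctx=ast.Load(),
    ),
    args=[ast.Constant(value="bar.baz")],
    keywords=[],
)

# Wrap the call in an Expression node and fill in line/column info so
# that compile() accepts the hand-built tree.
tree = ast.fix_missing_locations(ast.Expression(body=call))
getter = eval(compile(tree, "<sketch>", "eval"), {"operator": operator})

# Made-up object with a nested attribute to exercise the getter.
foo = SimpleNamespace(bar=SimpleNamespace(baz="found"))
value = getter(foo)
```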

May 03 '22 18:05 allison-casey

So, having gotten .foo.bar and foo.bar parsing as (Interop ['foo 'bar]) and (. foo bar) respectively, I've noticed some tradeoffs. Notably, parsing relative imports in import and require, like (require ..resources.macros [test-macro]), becomes more difficult. Instead of testing whether a symbol contains ., you now need to test whether it's an Interop type of a certain shape, which can be tricky. I'll keep looking into it and see if things can be simplified, but this might end up introducing more complexity than it removes.

May 15 '22 22:05 allison-casey

If we need to change the syntax for relative imports (and requires), I don't think that's a big deal.

May 15 '22 23:05 Kodiologist

Having played around with this for a while, I'm going to say I don't think this is a good idea anymore. It adds a whole host of extra complexity to basic tasks around things that are expected to be symbols but can contain dots. Parsing into (. .....) interop forms turns a symbol into an expression, which is much harder to validate, and the same goes for parsing into some interop form. It turns one (in "." sym) check into one or two more checks plus additional parsing logic that's much more complicated than an in. If someone else has a better route forward that doesn't add as much complexity, I'm all ears.
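The difference in effort can be illustrated with a small Python comparison. This is a sketch under assumptions, not code from the branch: symbols are modeled as strings, expressions as lists, and both function names are hypothetical.

```python
def has_dot(sym):
    """The current check: one substring test on a symbol."""
    return "." in sym


def is_dot_form(form):
    """The check needed once foo.bar is read as (. foo bar)."""
    return (
        isinstance(form, list)                                # an expression
        and len(form) >= 3                                    # head, object, 1+ attrs
        and form[0] == "."                                    # headed by the dot
        and all(isinstance(part, str) for part in form[1:])   # only plain names
    )
```

The second function has to rule out other expressions, short forms, and nested non-symbol parts, which is the extra validation being described.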

May 30 '22 17:05 allison-casey

Hmm. Tricky. Well, does the added complexity only happen in Hy's internals, like the standard result macros, or does it seem to cause trouble for user-defined macros, too? Because I think some more internal complexity is a reasonable price to pay in order for user-written macros to be easier and less error-prone.

May 30 '22 17:05 Kodiologist

It's anywhere you can take a symbol that might have dots, which unfortunately happens a lot when writing macros in Hy, given how prevalent dots are in Python.

May 30 '22 20:05 allison-casey

What I would expect is that getting foo.bar as an expression would be helpful for a macro, because a macro that's expecting a symbol specifically probably wouldn't be able to cope with a dot; essentially, we save the macro from having to do this check itself. In places where an arbitrary form is expected, on the other hand, an expression should be fine.

I guess I'd like to at least take a look before we give up. Have you pushed your work to a branch in your fork?

May 30 '22 20:05 Kodiologist

Is this why I didn't find a section in the manual about infix . as syntax sugar?

(I don't really remember seeing an explanation at all, though I didn't quite read cover-to-cover.)

Oct 04 '22 19:10 SamB

Yes, it is. foo.bar is mentioned in the tutorial and used elsewhere, but I haven't yet put a real explanation of it in syntax.rst because I expect it to change.

Oct 04 '22 19:10 Kodiologist

My current thinking is to use my original idea of transforming dotted syntax to calls to the . macro. When there's no object to take an attribute from, as in (print .foo), the object will implicitly be None. This is okay for importing (as in (import .parent-module [item])) because Python already syntactically forbids importing a module named None; the fact that (import None) is currently legal Hy is arguably a bug. If the form begins with more than one dot, then we instead create an expression where the head has more than one dot. More than one dot between two names is a parse error.

| Syntax | New equivalent |
| --- | --- |
| `obj.a1` | `(. obj a1)` |
| `obj.a1.a2.a3` | `(. obj a1 a2 a3)` |
| `(.a1 obj)` | `((. obj a1))` |
| `(.a1.a2 obj x y)` | `((. obj a1 a2) x y)` |
| `(..a1 obj)` | `(.. obj a1)` |
| `(..a1.a2 obj x y)` | `((.. obj a1 a2) x y)` |
| `(..a1..a2 obj x y)` | illegal |
| `.a1` | `(. None a1)` |
| `obj.` | illegal |
| `..a1.a2` | `(.. None a1 a2)` |
| `a..b` | illegal |
| `..a..b` | illegal |

My plan is not to define any objects or macros named .., ..., etc. So, they'll be open to users who find uses for them.
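The rules for bare tokens (the rows without parentheses) can be sketched in Python as follows. This is an illustration under assumptions, not the reader's implementation: forms are plain lists, the symbol None is written as the string "None", and `expand_dotted` is a hypothetical name. Numeric literals, tokens made only of dots, and the head-position rows are outside the sketch.

```python
import re

# A name is any run of characters without dots, whitespace, or parentheses.
_NAME = r"[^.\s()]+"
# Optional leading dots, then names separated by exactly one dot each.
_DOTTED = re.compile(rf"^(\.*)({_NAME}(?:\.{_NAME})*)$")


def expand_dotted(token):
    """Return the list form for a bare token, or raise ValueError if illegal."""
    match = _DOTTED.match(token)
    if match is None:
        # Covers trailing dots (obj.) and doubled inner dots (a..b, ..a..b).
        raise ValueError(f"illegal dotted token: {token!r}")
    dots, body = match.groups()
    names = body.split(".")
    if not dots:
        # No leading dot: a plain symbol, or obj.a1.a2 -> (. obj a1 a2).
        return token if len(names) == 1 else [".", *names]
    # Leading dots: the head keeps all of them and the object is None,
    # e.g. .a1 -> (. None a1) and ..a1.a2 -> (.. None a1 a2).
    return [dots, "None", *names]
```

For instance, `expand_dotted("..a1.a2")` gives `["..", "None", "a1", "a2"]`, while `expand_dotted("a..b")` raises `ValueError`.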

Dec 30 '22 19:12 Kodiologist
