
Parsing quirks

Open • gilch opened this issue 4 years ago • 6 comments

\\. should be an alias of QzBSOL_., i.e. a module literal. Instead it's read as QzFULLxSTOP_. Not sure how that happened, but full stops are handled separately, and this seems wrong.
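
A minimal Python sketch of the reading the issue expects, assuming a simplified munger: a backslash escapes the next character, an escaped character is munged even if it is a dot, and only unescaped dots are kept as module/attribute separators. The names `munge_char`, `read_symbol`, and `SHORT_NAMES` are hypothetical, and this is not Hissp's actual reader; only the spellings `QzBSOL_` and `QzFULLxSTOP_` come from this thread.

```python
import unicodedata

# Hypothetical short names; only these spellings appear in this thread.
SHORT_NAMES = {"\\": "BSOL", "#": "HASH"}


def munge_char(c: str) -> str:
    """Return c if it can appear in an identifier, else a Qz...-style name."""
    if ("x" + c).isidentifier():
        return c
    # Otherwise use a short name, or the Unicode name with spaces as "x".
    name = SHORT_NAMES.get(c) or unicodedata.name(c).replace(" ", "x")
    return f"Qz{name}_"


def read_symbol(token: str) -> str:
    """Munge a symbol token; only unescaped dots are kept as separators."""
    out = []
    i = 0
    while i < len(token):
        c = token[i]
        if c == "\\" and i + 1 < len(token):
            out.append(munge_char(token[i + 1]))  # escaped char, even a dot
            i += 2
        elif c == ".":
            out.append(".")  # unescaped dot: module/attribute separator
            i += 1
        else:
            out.append(munge_char(c))
            i += 1
    return "".join(out)


print(read_symbol("\\\\."))  # Lissp token \\.  -> QzBSOL_.
print(read_symbol("\\."))    # Lissp token \.   -> QzFULLxSTOP_
```

Under these assumptions the two tokens stay distinct: the escaped backslash plus trailing dot is a module handle, while the escaped dot alone is a munged symbol.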

It is currently possible for a reader macro name to start with :, e.g. (defmacro :QzHASH_ (..., which doesn't seem like the ideal way to say it. This adds a non-identifier string to the _macro_ namespace, which prevents attribute-access symbols like _macro_.:QzHASH_ from working. I'm not sure what \:\# should be. Making it :QzHASH_ would work for the defmacro, but it doesn't make a lot of sense. Maybe the \: should suppress the control word interpretation, making it QzCOLON_QzHASH_? But then the reader macro usage would have to be spelled \:#foo, which isn't as nice. Should control words be disallowed as reader macro names? Should they always be treated as if the first character were escaped? Maybe just colons? I will have to think about this some more.

Reader macros with extras could be more compact if symbols weren't allowed to contain !. Then you could say foo#!1!2!3 bar instead of foo# !1 !2 !3 bar or foo#!!! 1 2 3 bar. You could still escape them, like any other character. But there are conventions where mutating/dangerous functions end in a !, and now they'd have to end in a \!, which isn't as nice. Maybe there's some way to avoid that. Extras could maybe use some other character instead of !. Lisp's symbols can contain more characters than Python's identifiers, but that also means you need spaces to separate things in situations Python wouldn't. I'm not really satisfied with the whole Extra system, but I don't have a better idea yet.
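
To make the trade-off concrete, here is a hypothetical Python sketch of a tokenizer rule in which an unescaped ! ends a symbol and starts an extra. The function split_extras is invented for illustration; Hissp's reader does not currently work this way.

```python
def split_extras(token: str, sep: str = "!") -> list:
    """Split a token on unescaped sep characters, keeping escapes intact."""
    parts, buf = [], []
    i = 0
    while i < len(token):
        c = token[i]
        if c == "\\" and i + 1 < len(token):
            buf.append(token[i : i + 2])  # keep the escape for the munger
            i += 2
        elif c == sep:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(c)
            i += 1
    parts.append("".join(buf))
    return parts


print(split_extras("foo#!1!2!3"))  # ['foo#', '1', '2', '3']
print(split_extras("reset\\!"))    # ['reset\\!']  (escaped, stays one symbol)
print(split_extras("reset!"))      # ['reset', '']  (the naming convention breaks)
```

The last line is the cost mentioned above: an unescaped trailing ! would no longer be part of the name.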

gilch, Feb 12 '22

If we get rid of collection atoms (#130), that frees up the [] and {} characters for other things. We could write extras in square brackets, like foo#[1 2 3]bar, foo[1 2 3]#bar, or even foo[1 2 3]bar, in which case non-extra macros might be written foo[]bar? I'm not sure how easy this is to parse yet, or how it would interact with tooling like Emacs or Parinfer.
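
As a rough feel for the parsing question, here is a hypothetical regex for only the first spelling, foo#[1 2 3]bar, with whitespace-separated extras. It ignores nested brackets, strings, and the other two spellings, so it says little about how hard a real reader would be; parse_bracket_tag is an invented name.

```python
import re

# Hypothetical grammar: tag#[extras]primary, extras separated by whitespace.
BRACKET_TAG = re.compile(r"(?P<tag>[^\s\[\]#]+)#\[(?P<extras>[^\]]*)\](?P<rest>\S*)")


def parse_bracket_tag(text: str):
    """Return (tag, extras, primary) for a token like foo#[1 2 3]bar."""
    m = BRACKET_TAG.fullmatch(text)
    if m is None:
        raise ValueError(f"not a bracketed tag: {text!r}")
    return m["tag"], m["extras"].split(), m["rest"]


print(parse_bracket_tag("foo#[1 2 3]bar"))  # ('foo', ['1', '2', '3'], 'bar')
print(parse_bracket_tag("foo#[]bar"))       # ('foo', [], 'bar')
```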

With bracketed extras, a comment string would look like this:

(exec
  <<#[
  ;for i in 'abc':
  ;    for j in 'xyz':
  ;        print(i+j, end=" ")
  ;print('.')
  ;
  ]#"\n")

The single leading ; is bad, though: Emacs would indent those comments to the margin (which would still work). You could avoid that with ;;, and the macro could easily strip the first character of each line:

(exec
  <<#[
  ;;for i in 'abc':
  ;;    for j in 'xyz':
  ;;        print(i+j, end=" ")
  ;;print('.')
  ;;
  ]#"\n")

But now Parinfer wouldn't like the brackets here and would have to indent them at least this much:

(exec
  <<#[
      ;;for i in 'abc':
      ;;    for j in 'xyz':
      ;;        print(i+j, end=" ")
      ;;print('.')
      ;;
      #_/]#"\n")

It wouldn't care about the comments' indentation, but if we want them aligned with the other elements, that's where they go. Parinfer also wouldn't allow a closing bracket to start a line like this, so it needs the final discarded item. Square brackets are maybe nice for inline extras, but this seems worse if they span multiple lines.
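
Whatever happens with the brackets, the ;; stripping itself is simple. A sketch in Python, assuming the comment lines arrive as raw strings with their indentation and ;; prefix intact, and treating the "\n" above as the separator the lines are joined with. comment_string is a hypothetical helper, not the real tag's implementation.

```python
import contextlib
import io


def comment_string(lines, prefix=";;", sep="\n"):
    """Strip indentation and a fixed comment prefix from each line, then join."""
    body = []
    for line in lines:
        stripped = line.lstrip()  # editor indentation is not content
        if not stripped.startswith(prefix):
            raise ValueError(f"expected {prefix!r}: {line!r}")
        body.append(stripped[len(prefix):])
    return sep.join(body)


block = [
    "  ;;for i in 'abc':",
    "  ;;    for j in 'xyz':",
    '  ;;        print(i+j, end=" ")',
    "  ;;print('.')",
    "  ;;",
]
source = comment_string(block)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(source, {})  # run the recovered Python in a fresh namespace
print(buf.getvalue(), end="")  # ax ay az bx by bz cx cy cz .
```

Because only indentation before the prefix is dropped, the Python indentation after ;; survives, which is what makes the nested loop valid.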

gilch, Feb 12 '22

# is now the macro for sets. But _macro_.# is still read as a reader macro. _macro_.\# works, but probably reader macro names should not be allowed to end in a dot.

gilch, Feb 16 '22

## should probably be a single-character reader macro. It's currently a symbol. \## works though.

gilch, Mar 13 '22

. is a module handle for the empty-named module, which is weird. That would totally be a syntax error in a Python import statement: the empty string is not a valid identifier, and it can't be munged to one. This should probably just be a symbol: QzFULLxSTOP_.

gilch, Apr 25 '22

.. is a SyntaxError. ... is Ellipsis. Four or more dots are likewise a SyntaxError. This is related to . being the empty-named module. Maybe these errors should be symbols too?
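
One possible reading of this proposal, sketched in Python: ... stays Ellipsis, and every other run of dots becomes a munged symbol instead of a module handle or a SyntaxError. Only . becoming QzFULLxSTOP_ is suggested above; munging each dot of the longer runs separately is an assumption, and read_dots is a hypothetical name.

```python
def read_dots(token: str) -> str:
    """Hypothetical reading of an all-dot token under the proposal above."""
    if set(token) != {"."}:
        raise ValueError(f"not an all-dot token: {token!r}")
    if token == "...":
        return "..."  # keep Python's Ellipsis literal
    return "QzFULLxSTOP_" * len(token)  # assumption: munge every dot


for t in [".", "..", "...", "...."]:
    print(repr(t), "->", read_dots(t))
```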

gilch, Apr 25 '22

[# is now a macro for the subscript operation, particularly good for slices. So that's kind of claimed again.

gilch, Sep 07 '22

The EDN Hissps have claimed . to represent :, since that's not allowed in EDN, and Hissp needs it.

The X# series is often combined with inject to turn a Python expression into a function, e.g. XY#.#"Y-X". It would be slightly nicer to do XY.# instead. This saves all of one character in Lissp, which is why I didn't bother, but in the EDN Hissps, where inject currently has a longer spelling (#XY #hissp/."Y-X"), this is a much bigger win: #XY."Y-X".

The dot is the natural choice here (though not the only one), but it would have to munge for the attr in _macro_ to be a valid identifier. It can currently be defined using X\.\#, but the tag still has to be spelled X\.#, negating the 1-character saving, and probably requiring the full spelling of the munged name in EDN Hissps. The munger needs to leave dots alone for attr access and module handles, so tags with a trailing (or leading?) dot would have to be special-cased in the respective readers.

gilch, Apr 12 '23

I'm noticing that the desired interpretations of XY.# as an XYQzFULLxSTOP_ tag and of _macro_.# as a _macro_.QzHASH_ attr are incompatible. Of course, XY\.# and _macro_.\# would disambiguate, but one or the other shouldn't have to.

Tags have no particular need for a leading dot either, so .XY# should maybe be allowed (as a QzFULLxSTOP_XY tag), freeing up the trailing dot for the attr interpretation. Are dots even allowed in EDN tags though? If so, where? I was paying attention to those details when writing garden-of-edn, but I need to check that again.
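
A hypothetical Python sketch of that split: a dot just before the # means attribute access on a munged #, while a leading dot belongs to the tag name and is munged. classify_hash_token is an invented name, and escapes are ignored entirely for brevity.

```python
def classify_hash_token(token: str):
    """Hypothetical reading of a token ending in an unescaped '#'."""
    if not token.endswith("#"):
        raise ValueError(f"not a tag-like token: {token!r}")
    stem = token[:-1]
    if stem.endswith("."):
        # Trailing dot: attribute access, e.g. _macro_.# -> _macro_.QzHASH_
        return ("attr", stem + "QzHASH_")
    if stem.startswith("."):
        # Leading dot: part of the tag name, munged to a full stop.
        return ("tag", "QzFULLxSTOP_" + stem[1:])
    return ("tag", stem)


for t in ["_macro_.#", ".XY#", "XY#"]:
    print(t, classify_hash_token(t))
```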

gilch, Jun 16 '23

Looks like EDN tags can't start with a dot (must be alphabetic after the #), but the part after the prefix/ (the name) could, as long as the next character is not a digit.
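
The rule as summarized here can be approximated with a regex. This is a rough sketch, not the full EDN grammar: it ignores other symbol restrictions (such as those on + and -) and accepts a bare dot as a name. The namespaced example #hissp/.XY is hypothetical.

```python
import re

# '#' must be followed by an alphabetic character; after an optional
# "prefix/", the name may not start with a digit, and may start with '.'
# only if the next character is not a digit.
EDN_TAG = re.compile(r"#[A-Za-z][^\s/]*(?:/(?:\.(?![0-9])|[^\s/.0-9])[^\s/]*)?")


def is_edn_tag(text: str) -> bool:
    return EDN_TAG.fullmatch(text) is not None


for candidate in ["#XY", "#.XY", "#hissp/.XY", "#hissp/.1"]:
    print(candidate, is_edn_tag(candidate))
```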

gilch, Jun 17 '23

Found another one:

#> (.hex : self 42.0)
>>> self=(42.0).hex()

This happens to compile to a valid assignment, which is interesting. Only at the top level though. It's a little too situational to be very useful and it's nonsense when nested. Not worth keeping. But is it worth fixing? The fix would probably be to make it an error, but it's already that (except at the top level, where it's not, but not useful either).
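
In Python terms, the compiled text is an ordinary assignment statement at the top level, but it fails to compile where an expression is required. The lambda-body placement below is an illustrative assumption, not a case taken from this thread.

```python
# At the top level, the compiled text is a plain assignment statement.
ns = {}
exec("self=(42.0).hex()", ns)
print(ns["self"])  # 0x1.5000000000000p+5

# Where an expression is required, e.g. a lambda body, it is a SyntaxError.
try:
    compile("(lambda: self=(42.0).hex())", "<demo>", "eval")
except SyntaxError:
    print("SyntaxError as a lambda body")
```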

gilch, Jun 25 '23

Actually, it does work nested. Sometimes.

>>> print(
...   end=(42.0).hex())
0x1.5000000000000p+5#>

Still seems too situational to be useful. I still kind of think this should be an error, and seeing how it's not just a top-level problem makes me more inclined to fix it.
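
The transcript above follows from ordinary Python semantics: as a call argument, the same text is read as a keyword argument. print() with no positional arguments writes only end, so the hex string replaces the usual newline, and the next #> prompt lands on the same line.

```python
import io

out = io.StringIO()
print(file=out, end=(42.0).hex())  # nothing to print, so only `end` is written
print(repr(out.getvalue()))  # '0x1.5000000000000p+5' (no trailing newline)

# Any call accepting a `self` keyword "works" the same way.
print(dict(self=(42.0).hex()))  # {'self': '0x1.5000000000000p+5'}
```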

gilch, Jun 25 '23