M2 icon indicating copy to clipboard operation
M2 copied to clipboard

User-defined keywords

Open pzinn opened this issue 1 year ago • 35 comments

Work in progress to implement user-defined keywords: (cf #3593)

i1 : new Keyword from "×"

o1 = ×

o1 : Keyword

i2 : ZZ × ZZ := times

o2 = times

o2 : CompiledFunction

i3 : 1×2

o3 = 2

The most important change is to have a default operation (which applies to 99% of existing keywords, and to the newly created user-defined keywords) which is to look up the corresponding method. so no need to define all these identical functions one-by-one.

TODO:

  • ~~treat all the different cases (unary, binary, precedence) by adding options to the m2 level new Keyword from~~
  • ~~simplify drastically actors5.d by removing all unnecessary functions~~
  • transfer the creation of keywords to m2 level (core)
  • get rid of opsWithBinaryMethod etc
  • get rid of getBinopName etc and rewrite pseudocode accordingly
  • ~~simplify the tex routines~~
  • possibly remove entirely the fictitious Keyword class, instead providing some information in the Symbol Afterprint; or rename it to something more meaningful
  • ~~fix the other minor issues mentioned in #3593~~
  • document new Keyword

pzinn avatar Nov 28 '24 08:11 pzinn

I might stop here, to keep this PR relatively small and to avoid the more controversial items on my list.

pzinn avatar Dec 01 '24 10:12 pzinn

Since lots of things are being refactored here, what do you think about also changing the terminology? My understanding is that the classes Symbol and Keyword are identical, except that keywords are symbols which are reserved by the language and can't be assigned values, etc. This means things like and, then, but also operators like +, etc. But with this PR users can define new keywords, which implies they are no longer reserved.

So I think we should create a new class Operator for only non-alphanumeric math operators which are distinct from reserved alphanumeric keywords.

mahrud avatar Dec 03 '24 04:12 mahrud

sure.

pzinn avatar Dec 03 '24 04:12 pzinn

though it would be nice to have a common ancestor, since I don't want to have two different methods makeKeyword and makeOperator which would do the exact same thing.

pzinn avatar Dec 03 '24 04:12 pzinn

I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the preserved symbols.

edit: to clarify, a makeKeyword function to simplify declaring alphanumeric only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine, I'm saying there shouldn't be a top level one. On the other hand, a makeOperator function to simplify things in the interpreter and also exporting it to allow users to define new operators is fine.

mahrud avatar Dec 03 '24 04:12 mahrud

I guess what I'm saying is we should not have makeKeyword at all, since by definition keywords are the preserved symbols.

edit: to clarify, a makeKeyword function to simplify declaring alphanumeric only reserved symbols (e.g. and, or etc.) in the interpreter is totally fine, I'm saying there shouldn't be a top level one. On the other hand, a makeOperator function to simplify things in the interpreter and also exporting it to allow users to define new operators is fine.

err? that's exactly what this PR is doing... Example

makeKeyword "implies"
Boolean implies Boolean := (a,b) -> not a or b
false implies true

pzinn avatar Dec 03 '24 07:12 pzinn

to recap: two types of Keyword exist:

  1. the ones that look like normal (alphanumeric) symbols, like and
  2. the ones made of special characters, like **. These are the ones that require install for special parsing.

both are (and should be) definable using makeKeyword. The only confusing bit is that mathematical operators like × are classified as normal symbols (so in 1. above). This is what I was complaining about before, and should probably be reverted one day, though not in this PR since it would require a significant rewrite (not to mention the backward compatibility problem).

pzinn avatar Dec 03 '24 07:12 pzinn

oh yeah this PR also fixes this silly bug: getGlobalSymbol ""

pzinn avatar Dec 03 '24 07:12 pzinn

If a user can define implies as in your example, then it is by definition not a reserved symbol, therefore it shouldn't be called a keyword. It's just a new type of method attached to a symbol, which is very useful, but is different.

mahrud avatar Dec 03 '24 08:12 mahrud

sure, I'm happy to name it something else. but the point is, for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

pzinn avatar Dec 03 '24 08:12 pzinn

for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

mahrud avatar Dec 03 '24 08:12 mahrud

for all practical purposes, in that example, and and implies behave in the exact same way, and therefore should probably have the same class.

Then users should not be able to define them in the top level. They should be defined in the interpreter, compiled, and fixed in the language.

well, I disagree. It certainly doesn't harm to give users that power.

pzinn avatar Dec 03 '24 08:12 pzinn

It certainly does harm if it's hastily implemented without much longer discussion and evaluation. What happens if a package makes foo into a keyword and another makes it a symbol for a method or variable? What happens if there's a race condition in which one is defined first?

Again, I'm all in favor of allowing users to define non-alphanumeric math operators in the top level, because those can't be used as symbols for methods or variables, and we already know how to deal with different packages overriding the same method, but keywords are by definition preserved words which can't be altered by the user.

mahrud avatar Dec 03 '24 08:12 mahrud

actually, it makes no difference whether the keyword is alphanumeric or not. there will be a conflict if two packages define the same keyword. Right now it's a non issue because no package does so, but eventually we need a mechanism to resolve if say two different packages want to use × or ~~~ for different purposes. (The biggest problem I can see is, we can't reconcile them if they have different Syntax or Precedence) edited to give an example that's actually non-alphanumeric (sigh)

pzinn avatar Dec 03 '24 08:12 pzinn

To clarify, Symbols can be in different dictionaries, but the underlying Words, which carry the parsing information, cannot (and shouldn't precisely because they affect parsing).

pzinn avatar Dec 03 '24 08:12 pzinn

Let me try to summarise some of the changes that have been requested during the M2internals meeting:

  • not let users define alphanumeric keywords. though I don't fully agree with this, allowing it does open a can of worms (as the hi example showed) and maybe for now it's simpler to prevent this.
  • if we do that, that definitely forces another change that was not mentioned: currently math symbols are considered as alphanumeric. This is a hack I introduced a while ago due to the backwards compatibility issue mentioned above in the discussion. This will have to go, otherwise we won't be able to define any math symbol as a keyword.
  • have more meaningful type names. Keyword is not really an appropriate name for symbols like . One could create a new type called Operator (which, just like Keyword, would be a fake type anyway... they're all Symbols internally).
  • Even for these operators, there'd still be an issue if two different packages try to define the same say cow symbol. There needs to be a way of dealing with this, and this remains the most problematic issue. One could make the symbols local (right now all Keyword are really global symbols), but that wouldn't really the solve the parsing issue -- if for some reason one package wants to use a symbol as unary and the other as binary, it will be hard to reconcile.
  • Only have makeKeyword in core, not packages: this one I strongly disagree with, and will not implement. Users are not babies. If they want they can already easily break M2 in all kinds of ways (but why would they?). Keywords (or operators, or whatever they will be called) should be definable in packages.

pzinn avatar Dec 16 '24 22:12 pzinn

Limiting makeKeyword to the interpreter or the Core was mainly a solution to the problematic issue that you mentioned. Ultimately this PR introduces some useful features which can be added now, but jumping directly from idea to production for major changes is a recipe for ending up with compatibility nightmares and packages that only work with a specific version of M2 or indecipherable issues that take valuable time to resolve.

Perhaps a good place to start is a wiki or discussion page where you lay out a proposal (similar to Python Enhancement Proposals) with concise technical specifications of how you believe keywords/operators/etc should be handled, from parsing and binding in the interpreter all the way to the packages. This can simultaneously serve as a documentation of your contribution to this part of the interpreter. This can be discussed and amended, until there's agreement.

mahrud avatar Dec 17 '24 01:12 mahrud

I think I've dealt with all the issues that were raised: in particular, now alphanumeric keywords are no longer allowed. One pleasant side-effect is the removal of the "math operator hack" which I'd introduced a while ago. It forced me to slightly modify MultiprojectiveVarieties so that their symbols "⋂","⋃","∏" are now keywords. The only thing left in the list is to possibly rename Keyword, but I'm leaving this one for later, since this is purely cosmetic.

pzinn avatar May 02 '25 13:05 pzinn

the checks were successful before, I don't think the ThreadedGB stuff has anything to do with this PR.

pzinn avatar May 09 '25 20:05 pzinn

I agree, but I restarted that job anyway. My preference would be to merge this PR right after the release to give some time for daily testing.

mahrud avatar May 09 '25 21:05 mahrud

Okay. Note that this PR contradicts #3737, and conflicts (not fatally, just some work will be needed) with #3748.

pzinn avatar May 11 '25 15:05 pzinn

@pzinn @d-torrance once the conflict is resolves, I think this can be merged?

mahrud avatar Jul 11 '25 20:07 mahrud

@pzinn and @mahrud , what is the status of this PR? Are you both happy with it now, or do we need to look at it closer?

mikestillman avatar Jul 15 '25 15:07 mikestillman

I'm waiting for @pzinn to resolve the conflcts before merging it.

mahrud avatar Jul 15 '25 16:07 mahrud

@pzinn do you want someone else to fix the conflict in lex.d?

mahrud avatar Jul 18 '25 21:07 mahrud

I'm having a little trouble using this, wondering what I'm doing wrong:

i1 : needsPackage "A"

o1 = A

o1 : Package

i2 : 2 ⊙  2
stdio:2:2:(3): error: undefined symbol ⊙

i7 : get A#"source file"

o7 = newPackage "A"

     new Keyword from "⊙"
     ZZ ⊙ ZZ := times

But if I paste the same directly in M2 it works.

mahrud avatar Jul 21 '25 05:07 mahrud

Interesting, this also fails:

newPackage "A"

export { "⊙" }

new Keyword from "⊙"

ZZ ⊙ ZZ := times

With the error:

i1 : needsPackage "A"
../../Macaulay2/packages/A.m2:3:6:(2):[11]: error: invalid symbol

But this one works:

newPackage "B"

new Keyword from "⊙"

export { "⊙" }

ZZ ⊙ ZZ := times

mahrud avatar Jul 21 '25 05:07 mahrud

I don't understand any of the comments. seems to be working as intended. you always need to define a keyword before using it.

I think this is better than individual packages exporting these symbols, because for instance I would like to use ⊠ in my own package, but I don't want to depend on the CpMackeyFunctors package for it, and I also don't want the two packages not to work together.

You obviously don't need to load CpMackeyFunctors to use ⊠, you just define it in your own package, that's the whole point. Just like for ordinary symbols, you will get an annoying warning message when both are loaded. There's not much I can do about that, it's how Macaulay2 works.

pzinn avatar Jul 21 '25 21:07 pzinn

I was about to merge this, but I realized that all unicode keywords that used to be exported by Core and overloaded by packages are now broken. I suggest the following:

what exactly is broken? except the minor export oversight which I can fix.

pzinn avatar Jul 21 '25 22:07 pzinn

The only reasonable alternative I can think of is that mathematical symbols are "auto-defined", in the same way as ordinary symbols: the first time they appear, a keyword is created. The only issue with this is that it means they can no longer be used as ordinary symbols, e.g., ⊠ can only be a Keyword, not a Symbol.

pzinn avatar Jul 21 '25 22:07 pzinn