DelphiAST

Better dialect management

Open quartexNOR opened this issue 10 years ago • 7 comments

Introduce parser/lexer compiler switches for finer control of language features, first and foremost those features that counteract platform and compiler independence.

Being able to easily adjust the parser to support "Delphi 7" dialect, or "Delphi XE" dialect -- and also turn off pointer support, ASM support and so on - broadens the uses of the AST profoundly.

-Pointers enable/disable
-ASM section enable/disable
-external library references enable/disable
-symbol export (export keyword) enable/disable
-WinAPI messages (message keyword and mapping)
-Generics enable/disable
-Records enable/disable

Also introduce switches for allowed value types:

-Integer
-int64
-Boolean
-Currency
-ShortString
-Variant
-... and all other intrinsic datatypes

The purpose of such customization is to make it easier to use the AST as a drop-in module for existing code generators. Smart Mobile Studio, for instance, presently uses DWScript as its primary parser/lexer/tokenizer and then generates JavaScript from the AST produced by DWS. The downside is that any change to the codegen affects the existing DWS dialect, and any change to DWScript affects Smart Pascal.

DelphiAST could in many ways replace the use of DWS, allowing for a much richer translation using XML transformation. A draft for this was actually written by me (Jon Lennart Aasenden) for the quartex pascal IDE (quartexpascal.wordpress.com).

Since DelphiAST is more or less 90% compatible with my own architecture for a portable, source-to-source transformation format, it makes more sense to improve DelphiAST than to write yet another module.

If DelphiAST could introduce finer control of the dialect, where we could toggle support for various language features (pointers and so on), it would be more than capable of becoming the de-facto AST generator for my projects.

As of writing, JavaScript, C# and C++ are my primary planned targets, although JavaScript is already covered by Smart Mobile Studio, which is so efficient that no new compilers are required. C#, on the other hand, is very interesting, since it would give us access to Mono and its frameworks.

quartexNOR avatar Jan 19 '15 07:01 quartexNOR

As far as I understand, you want the parser to fail on language elements that the selected dialect does not support, right?

That is not difficult to implement. I think the easiest way is to inherit from the TPasSyntaxTreeBuilder class.

For example:

TPasSyntaxTreeBuilderD7 = class(TPasSyntaxTreeBuilder)
protected
  procedure TypeArgs; override;
  // etc..
end;

procedure TPasSyntaxTreeBuilderD7.TypeArgs;
begin
  raise ENotSupported.Create('Not supported in D7: TypeArgs');
end;

// etc...

So it's doable. Is this what you need?

RomanYankovsky avatar Jan 19 '15 10:01 RomanYankovsky

This is a good idea, especially given that we have at our disposal a list of all language features, when they were added, when they were improved, and anything that was removed.

To have these as an enumerated set would be very handy; a set of constants defined for specific versions would then make it trivial for an implementing developer to use.

It would also enable IDE extensions to be able to warn third-party tools/components/library developers when they're using features that are only available within a particular range of Delphi versions (can't tell you how much time that would've saved me over the last 5+ years).
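The enumerated-set idea could look roughly like the sketch below. Every name in it (TDialectFeature, the df* values, the Dialect* presets, IsAllowed) is hypothetical and not part of DelphiAST, and the presets are illustrative rather than a complete version history; the only version fact relied on is that generics arrived after Delphi 7.

```pascal
program DialectSets;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical feature flags mirroring the switches requested in the issue.
  TDialectFeature = (dfPointers, dfAsm, dfExternalLibs, dfExports,
    dfMessages, dfGenerics, dfRecords);
  TDialectFeatures = set of TDialectFeature;

const
  // Illustrative presets only; a real list would need checking per version.
  DialectDelphi7: TDialectFeatures = [dfPointers, dfAsm, dfExternalLibs,
    dfExports, dfMessages, dfRecords];
  DialectDelphiXE: TDialectFeatures = [dfPointers, dfAsm, dfExternalLibs,
    dfExports, dfMessages, dfGenerics, dfRecords];
  // A restricted "safe" dialect, e.g. for a JavaScript code generator.
  DialectSafe: TDialectFeatures = [dfGenerics, dfRecords];

// Returns True when the given feature is part of the dialect.
function IsAllowed(const Dialect: TDialectFeatures;
  Feature: TDialectFeature): Boolean;
begin
  Result := Feature in Dialect;
end;

begin
  Writeln('Generics in D7 preset: ', IsAllowed(DialectDelphi7, dfGenerics));
  Writeln('Generics in XE preset: ', IsAllowed(DialectDelphiXE, dfGenerics));
  Writeln('Pointers in safe preset: ', IsAllowed(DialectSafe, dfPointers));
end.
```

A set keeps the check to a single `in` test, and a custom dialect is just a preset with features added or removed, for example `DialectDelphiXE - [dfPointers, dfAsm]`.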

LaKraven avatar Jan 19 '15 14:01 LaKraven

Yes, that was what I had in mind.

For instance, let's say I want to use DAST to generate JS compatible code, where a code-generator is written to handle the node-tree directly (or through an XML import).

I would then disable pointer-support, library-support, message support etc., forcing the user to write plain, vanilla "safe" object pascal. DelphiAST would then function as a parser and method of making sure the syntax is correct. The codegen's job would be to reflect everything in the target language as best it can.

If anyone tries to use those features while these options are enabled, we could either throw an exception or (perhaps) signal an event.

Something like: raise EDASTDialectError.Create('Pointers are not allowed for the present dialect');
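A minimal sketch of how the exception and the event could coexist, assuming a small guard object that a parser subclass would consult before accepting a construct. TDialectGuard, TDialectViolationEvent and CheckPointers are made-up names for illustration; only EDASTDialectError comes from the suggestion above, and none of this exists in DelphiAST.

```pascal
program DialectGuard;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Exception name as suggested in the thread; not defined by DelphiAST.
  EDASTDialectError = class(Exception);

  // Hypothetical callback: set Handled to True to suppress the exception.
  TDialectViolationEvent = procedure(const FeatureName: string;
    var Handled: Boolean) of object;

  TDialectGuard = class
  private
    FPointersAllowed: Boolean;
    FOnViolation: TDialectViolationEvent;
  public
    // Raises EDASTDialectError unless pointers are allowed or the
    // OnViolation handler marks the violation as handled.
    procedure CheckPointers;
    property PointersAllowed: Boolean
      read FPointersAllowed write FPointersAllowed;
    property OnViolation: TDialectViolationEvent
      read FOnViolation write FOnViolation;
  end;

procedure TDialectGuard.CheckPointers;
var
  Handled: Boolean;
begin
  if FPointersAllowed then
    Exit;
  Handled := False;
  if Assigned(FOnViolation) then
    FOnViolation('Pointers', Handled);
  if not Handled then
    raise EDASTDialectError.Create(
      'Pointers are not allowed for the present dialect');
end;

var
  Guard: TDialectGuard;
begin
  Guard := TDialectGuard.Create;
  try
    Guard.PointersAllowed := False;
    try
      Guard.CheckPointers; // no handler assigned, so this raises
    except
      on E: EDASTDialectError do
        Writeln('Dialect error: ', E.Message);
    end;
  finally
    Guard.Free;
  end;
end.
```

With this shape an IDE could assign OnViolation to collect warnings and keep parsing, while a code generator could leave it unassigned and let the exception stop the run.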

As LaKraven says, this makes it possible to have "default" settings which mimic the evolution of Delphi, from the older Delphi 7 dialect through the more modern phases up to XE7.

Hopefully we could also add support for some of the Smart Pascal dialect features, like lambdas, shortcut properties, ++ / -- operators and array helpers. That way we would have the proverbial Sauron's ring in terms of AST support.

SMS has a few interesting dialect changes, for instance:

type
  TMyObject = class(TObject)
  published
    property Name: String;
    property Value: Variant;
  end;

And shorthand operators like:

FValue ++;   // inc by 1
FValue --;   // dec by 1
FValue += 12; // add 12

Also, all arrays have more or less all the methods you would expect from a TList.

var
  mBuffer: Array of Integer;
Begin
  mBuffer.add(12);
  mbuffer.insert(0,13);
  mBuffer.delete(0,1);
  mBuffer.sort;
  mBuffer.clear;
end;

I would add this myself, but I have to spend some time with pasParse before I jump in. The ++/--/+= operators require two reads as opposed to the standard single-char operator; do we have a "peek next token" somewhere?

e.g.:
-previous token was a variable
-current token is "+"
-next token is "+"
-next next token is ";"

At that point we know it's an inc operation. I had a round with Castalia a couple of years back, but have focused completely on DWScript since then, as that's where 90% of my free time goes these days.

quartexNOR avatar Jan 19 '15 17:01 quartexNOR

If we are going to support alternative dialects (like FreePascal, or SMS, or DWScript), we have to consider some form of class inheritance or dependency injection. Supporting all dialects in a single class will make it overcomplicated...

quartexNOR wrote: "I would add this myself, but I have to spend some time with pasParse before I jump in. The ++/--/+= operators require two reads as opposed to the standard single-char operator; do we have a 'peek next token' somewhere?"

You can take a look at how the >=, <= and <> operators are implemented. They require two reads too. Just search for the "ptGreaterEqual" substring. But adding new operators is not as easy as it could be: you have to add support for the new operator to the lexer, then to the parser, and finally to the AST builder, so all three layers of logic are affected.
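To illustrate the lexer part only, here is a standalone sketch of one-character lookahead for ++, -- and +=. It is not DelphiAST's lexer code: TOpKind, the ok* values and ScanOperator are invented for this example, and the real lexer uses its own pt* token kinds and scanning routines.

```pascal
program PeekOperator;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical token kinds for the sketch.
  TOpKind = (okPlus, okMinus, okInc, okDec, okPlusAssign, okUnknown);

// Classifies the operator starting at Src[Index] (1-based) and reports
// how many characters it spans. Returns okUnknown with Len = 0 when
// the character is not one of the operators handled here.
function ScanOperator(const Src: string; Index: Integer;
  out Len: Integer): TOpKind;
var
  Next: Char;
begin
  Result := okUnknown;
  Len := 0;
  if (Index < 1) or (Index > Length(Src)) then
    Exit;

  // Peek one character ahead; #0 stands for "end of input".
  if Index < Length(Src) then
    Next := Src[Index + 1]
  else
    Next := #0;

  if Src[Index] = '+' then
  begin
    Len := 2;
    if Next = '+' then
      Result := okInc
    else if Next = '=' then
      Result := okPlusAssign
    else
    begin
      Result := okPlus;
      Len := 1;
    end;
  end
  else if Src[Index] = '-' then
  begin
    Len := 2;
    if Next = '-' then
      Result := okDec
    else
    begin
      Result := okMinus;
      Len := 1;
    end;
  end;
end;

var
  Kind: TOpKind;
  OpLen: Integer;
begin
  // The first '+' of 'FValue++;' sits at position 7.
  Kind := ScanOperator('FValue++;', 7, OpLen);
  Writeln('Is increment: ', Kind = okInc, ', length: ', OpLen);
end.
```

Because the decision is made on characters inside the lexer, the parser never needs a "peek next token" for these operators; it simply receives one token for the whole operator, which is the same approach a two-character operator like >= relies on.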

RomanYankovsky avatar Jan 20 '15 10:01 RomanYankovsky

var
  mBuffer: Array of Integer;
Begin
  mBuffer.add(12);
  mbuffer.insert(0,13);
  mBuffer.delete(0,1);
  mBuffer.sort;
  mBuffer.clear;
end;

This will be parsed without any problems :) The parser doesn't do type checking.

RomanYankovsky avatar Jan 20 '15 10:01 RomanYankovsky

Those methods are "magic" methods, meaning they should be known to the parser, as they are not defined anywhere. Does Castalia do symbol checks, meaning checking that a class actually has a given member, or that a record/datatype has the members being used?

quartexNOR avatar Jan 21 '15 09:01 quartexNOR

Syntactically, the code above is absolutely correct. The parser checks syntax only, nothing else. @LaKraven was going to start developing a symbol table builder on top of the existing AST. That would make such checks possible.

RomanYankovsky avatar Jan 21 '15 13:01 RomanYankovsky