semantic-php Support braceless namespaces

Hi Joris,

I thought I'd give the first milestone in the README a try, so here is a draft for supporting braceless namespaces, so that the tags following one are considered to be its members.

It is implemented in the commit stevenremot@04e98a0 of my fork, in the branch braceless-namespaces.

As the grammar adds the attribute :braceless to this kind of namespace, my idea is to post-process the toplevel parsing to manually integrate the following tags into the namespace. This keeps the grammar simple, but I'm not totally sure there is a proper hook or mode-local overridable function to do that. For now I have overridden semantic-parse-region and process the tags when there is a full reparse, but this does not seem to work in all cases.
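The post-processing step can be sketched roughly as follows. This is Python rather than Emacs Lisp, purely to keep it self-contained, and the tag layout (dicts with `name`, `type`, `braceless` and `members` keys) is an assumption made for illustration, not Semantic's real tag structure.

```python
def nest_braceless_namespaces(tags):
    """Move every tag that follows a braceless namespace into that namespace.

    `tags` is a flat, ordered list of dicts (hypothetical layout).
    The input list and its dicts are left unmodified.
    """
    result = []
    current = None  # braceless namespace currently collecting members
    for tag in tags:
        if tag.get("type") == "namespace":
            if tag.get("braceless"):
                # A new braceless namespace closes the previous one.
                current = {**tag, "members": []}
                result.append(current)
            else:
                # Braced namespaces already carry their own members.
                current = None
                result.append(tag)
        elif current is not None:
            current["members"].append(tag)
        else:
            result.append(tag)
    return result


flat = [
    {"name": "MyApp\\Model", "type": "namespace", "braceless": True},
    {"name": "Customer", "type": "class"},
    {"name": "Address", "type": "class"},
]
nested = nest_braceless_namespaces(flat)
print([member["name"] for member in nested[0]["members"]])
```

The open question from the paragraph above remains where to hook such a pass in, so that it also runs on partial reparses.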

Previously, in my CEDET fork, I handled this case in the grammar, by gathering all tags after a braceless namespace in a kind of "container" tag before putting them in the namespace:

compilation_units
  : T_NAMESPACE namespaced_identifier T_SEMI compilation_units
    (TYPE-TAG $2 $1 (EXPANDTAG $4) nil)
  | compilation_unit compilation_units
    (wisent-php-create-container-tag $2 $1)
  | ;; EMPTY
    (wisent-php-create-container-tag)
  ;

This may be easier, but as you were using the PHP interpreter's grammar as a reference, I thought it may be wiser to leave it unchanged and handle this case in Emacs Lisp.

So I will keep going in that direction and improve my new implementation, but I would also like to hear people's opinions on all of this. Does anyone have any remarks on this subject?

Nov 21 '15 17:11 stevenremot

I wonder if namespaces should rather emit package tags? There are some overridable functions in tag-ls that find the proper package for a tag, though they don't seem to be used much. They are supplied with a tag and something buffer-like, so it should be straightforward to resolve the proper namespace. The default implementation just grabs the first package tag it encounters.
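To illustrate the difference from "grab the first package tag", here is a minimal Python sketch that picks the nearest package tag preceding a given tag. The dict-based tag layout (`class`, `name`, `start`) is hypothetical and only stands in for whatever the tag-ls overrides actually receive.

```python
def package_for(tag, buffer_tags):
    """Return the name of the nearest package tag that starts before `tag`.

    Returns None when no package tag precedes `tag`.
    """
    package = None
    for candidate in sorted(buffer_tags, key=lambda t: t["start"]):
        if candidate["start"] >= tag["start"]:
            break
        if candidate["class"] == "package":
            package = candidate["name"]
    return package


buffer_tags = [
    {"class": "package", "name": "MyApp\\Model", "start": 7},
    {"class": "type", "name": "Customer", "start": 30},
]
print(package_for(buffer_tags[1], buffer_tags))
```

With this rule, a file that declares several braceless namespaces still assigns each class to the namespace declared most recently above it.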

Probably means that those functions should be called somewhere, though.

Dec 04 '15 22:12 xendk

I never considered package tags personally. Could you identify the way semantic handles them in its core?

As a general notice, I broke my personal desktop so I won't be able to do more than talking this month 😢

Dec 06 '15 00:12 stevenremot

Well, apparently it doesn't. semantic-tag-full-package and semantic-tag-full-name don't seem to be used by anything in Semantic core. I was thinking some overridden function in semantic-php might use them.

Maybe overriding semantic-analyze-dereference-metatype to use it, which could then pass it to ede-php-autoload, to pull in the real class for completion. But that's just a random thought.

Reading the docstring of semantic-analyze-dereference-metatype, it does seem like the missing link in turning a random type tag discovered in a type hint into a real type with all the proper members. The override in c-mode also looks like it might be doing something like that. It certainly has a lot of code dealing with namespaces.

It would mean not having to deal with namespaces until they're needed, but I really can't tell whether that means trouble or not. The package tag does seem to make a bit more sense than a type tag, as namespaces don't really have members, parents or interfaces. And namespaces can be spread over multiple files (most likely are), which confuses matters with type tags.

Well, don't sweat it, with Christmas coming up, I don't expect to get much time to stare into a screen either.

Dec 06 '15 22:12 xendk

I have been working with semantic-php for 3 months and I must admit that it does well what it supports for now. I am currently maintaining a 450k LOC PHP 5.6 Symfony2/Doctrine2 code base with this tool. Apparently I couldn't get any benefit from your braceless namespace implementation yet. I suppose it is one of the three means of milestone 1. Right?

Jan 15 '16 20:01 periklis

You are right @periklis, the benefits of this feature are not currently visible because only file-local completion is implemented. Semantic doesn't need to know the namespace of the class you are working in to provide completion for it. However, this information is important to get proper project-wide completion. For example:

namespace MyApp\Model;

class Customer {

    // ...

    public function setAddress(Address $address) {
        $this->address = $address;
    }

    // ....
}

The fully-qualified name of Address is MyApp\Model\Address. The cleanest way to determine it is to say that this type is referenced in a method that belongs to a class defined in MyApp\Model, so in this case being able to link namespace and class is important.
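As a minimal illustration of that link, the Python sketch below qualifies a class name against the enclosing namespace. It deliberately ignores use statements and only covers the case from the example above; the function name is made up for this sketch.

```python
def qualify(name, current_namespace):
    """Return the fully qualified class name, without a leading backslash.

    Only handles the namespace-relative case; `use` imports are ignored here.
    """
    if name.startswith("\\"):
        # Already fully qualified, e.g. \DateTime.
        return name[1:]
    if not current_namespace:
        # Code in the global namespace.
        return name
    return current_namespace + "\\" + name


print(qualify("Address", "MyApp\\Model"))  # MyApp\Model\Address
```

Without knowing that Customer lives in MyApp\Model, the second argument is simply not available, which is the point made above.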

This explains why it is one of the three means of milestone 1 :-)

Jan 15 '16 20:01 stevenremot

Ok, I am slowly catching up with you guys. Since I am interested in helping out, even though I am an elisp-minor, I tried to recap the whole situation between the several implementations and experiments (ede-php-autoload, @trashofmasters' semantic-php, edep). As far as I could make a picture of the current state (please correct me), we are losing ground on semantic because php-land has myriads of approaches for class-to-file resolution.

So let's recap:

ede-php-autoload: It pushes the composer.json based resolutions into semanticdb. It seems to be parser-agnostic, as @trashofmasters demonstrated. With any of the currently available semantic parsers (contrib/wisent-php, cedet-fork/wisent-php, trashofmasters/semantic-php or jorissteyn/semantic-php) it seems to me to be a nice helper for jumping and type completion, though the latter only where type inference is possible (e.g. @var comment-based type inference is missing).
trashofmasters/semantic-php: According to grammar-setup.el it tries to separate namespaces and use-statements in include tags. @stevenremot suggested that this isn't going to work (e.g. namespace Zend\Db\ vs. Sql\Select usage).
jorissteyn/semantic-php: It is the strongest parser as far as I could test them all in my code base mentioned above (the code went through each version upgrade from 5.4 to, currently, 7.0; it is mainly class-based, but uses PHP templating too). It results in excellent local file semantic support.

Ergo: We have to find a way to interpret namespaces and use-statements in their various usages in a manner that lets semantic/semanticdb search for tags in files other than our local file. If include tags aren't the way to go ([1]), what about the package tags as @xendk suggests? The built-in Java support in semantic seems to use them for the member encapsulation check between a class, its parent and its package. At least it seems so, if I understood (define-mode-local-override semantic-tag-protection...) right.

If neither include tags nor package tags come to our help, this would be the point to extend semantic's parser API with 'namespace or 'use tags or something like that, wouldn't it? Otherwise, we are stuck with simple local-file support, or we may have to make some trade-offs like targeting composer.json-style projects only at first (this neglects old code bases, but maybe it should ;-)).

Update: [1] As mentioned here: https://github.com/jorissteyn/edep/issues/4#issuecomment-100372861

Jan 16 '16 22:01 periklis

I would add phptags to the list. It is coupled with edep, but it plays on the same layer as ede-php-autoload and it provides FQN resolution by indexing the whole project instead of simulating the autoload system. This is a different approach that can be useful in some projects, I think.

In my opinion, representing namespaces and use statements is not the real problem. The real problem is that in a PHP file, there can be a reference to an external class at any point, not only in use statements. By contrast, in a lot of languages, scanning import-like statements is sufficient to get all external dependencies (Python, C / C++, Java, ES6, etc.). CEDET does not seem to support this runtime dependency fetching by itself, but it provides hooks to implement it (semantic-ctxt-scoped-types for example).

I looked at the CEDET documentation for the package tag. Here are the points that came to my mind:

It perfectly represents ~99.9999% of braceless namespace cases
I don't know if it will be easy to make it represent namespaces with braces
It may not be compatible with the (discouraged but legal) case of having multiple namespaces in the same file. I don't know if it's dramatic though.

Jan 31 '16 13:01 stevenremot

I almost fully agree with what Steven says; however, I don't understand what the deal is with package and braceless v. braced namespaces. I don't remember why and at what point I dropped it, but I now wonder if it makes sense to reintroduce it?

It can't be because EXPANDFULL doesn't let us produce the package tag in the grammar, can it? Even if it were, there'd still be a chance of doing so in the expansion function.

Anyway, as far as braceless namespaces are still on topic, I've used a lexical analyser that matches the next namespace or the end of file, to produce a S_NS_TOKEN which finally allowed for this (more or less) in the grammar:

    namespace_body:
        S_NS_SCOPE
         (MY-EXPANDFULL $1 namespace_subparts)
      | BRACE_BLOCK
         (EXPANDFULL $1 namespace_subparts)
      ;

The hardest bit was that I had to work around a "bug" in EXPANDFULL which wouldn't otherwise let other tags expand. Perhaps I've only misconfigured the block token, who knows.
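The "match the next namespace or the end of file" idea can be approximated outside the lexer with a small Python sketch. It is regex-based and therefore naive (it does not skip strings or comments, and it ignores braced namespaces); the function and pattern names are made up for illustration.

```python
import re

# Braceless declarations only: "namespace Foo\Bar;" at the start of a line.
NAMESPACE_RE = re.compile(r"^[ \t]*namespace\s+([\w\\]+)\s*;", re.MULTILINE)


def namespace_scopes(source):
    """Return (name, body_start, body_end) for each braceless namespace.

    A body runs from just after the declaration's semicolon up to the next
    namespace declaration, or to the end of the file for the last one.
    """
    matches = list(NAMESPACE_RE.finditer(source))
    scopes = []
    for index, match in enumerate(matches):
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(source)
        scopes.append((match.group(1), match.end(), end))
    return scopes


source = "<?php\nnamespace A;\nclass Foo {}\nnamespace B;\nclass Bar {}\n"
for name, start, end in namespace_scopes(source):
    print(name, repr(source[start:end]))
```

Each returned span plays the role of the block that the scope token hands to the expansion step in the grammar above.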

As far as type names resolution and type inference are concerned, here are some words of notice first.

I was initially working with CEDET from Emacs 25, but because of a problem at work with Emacs 25 I had to go back to version 24. The older version of CEDET shipping in Emacs 24 misses many useful mode-local-override functions which make type inference a much easier task, whilst keeping many of the internal Semantics untouched. To make things worse CEDET seems to be transitioning all of the typecache concepts into semanticdb as they renovate the code, so there's also that.

So for the reasons above, I had to backport my semantic-php to an earlier version of CEDET, which means that in all my last experiments I had to find alternative mechanisms for customising the behaviour of the metatype and alias dereferencer functions. Nonetheless I've had very good results which, if I were a better lisp programmer, would now only require breaking down and coding the name resolution algorithm.

I have found that with scoped types and a few dereferencer functions alone, the context analyser is fully capable of working with PHP. I'm following the C++ grammar: a PHP use statement produces an include tag to hint the analyser at the type (and, with Steven's code, the place) of the dependency, and a using tag which maps the local type name to its fully qualified variant. The using tag also carries information about the kind of thing being imported, whether it be a function, a constant, a class, or a namespace.

In fact I was lucky enough to reproduce autocompletion of some code like the following a few times using a clean Emacs instance without a semanticdb cache:

namespace Officine\Amaka;

use Officine\Amaka\AmakaScript\CycleDetector;
use Officine\Amaka\AmakaScript\SymbolTable;
use Officine\Amaka\Foo\Process\SymbolTable as St;

function test() 
{
    $amaka = new Amaka();
    $amaka->; // Officine\Amaka\TaskSelector

    $runner = new StandardRunner();
    $runner->;// semantic-analyze-debug-assist

    $cycles = new CycleDetector();
    $cycles->;// Officine\Amaka\AmakaScript\CycleDetector

    $st = new St();
    $st->; // Officine\Amaka\Foo\Process\SymbolTable

    $symbol = new SymbolTable();
    $symbol->; // Officine\Amaka\AmakaScript\SymbolTable
}

While in Emacs 24 it takes more code, when the semantic tags using, include, and type (of all types) are used correctly, there are only a handful of functions to override to provide the default analyser all the information it needs to:

find the type of a variable
find the definition of that type, whether that's the current buffer, a tag table, a cache file, or a file to parse
eliminate members that shouldn't complete (e.g. non-public members on objects)
autocomplete the member
determine the return type of a function, or fluent call
find all members of "this namespace"

That leaves the final stage of name resolution, determining if the name is a mis-cache or a non-existing symbol.

Roughly, I have set out to implement the following name resolution logic, which for obvious reasons should be configurable to allow resolution input either via overlay, or minibuffer, or a default function. Consider the following file:


namespace Hello;

use Carbon\Carbon as Date;
use DateTimeImmutable;

$d = new Date();
$d->

Semantic knows $d is of type \Carbon\Carbon and where to find the file in vendor using EDE.
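The alias lookup behind that statement can be sketched in Python as below. The regex only understands plain `use X;` and `use X as Y;` class imports (no `use function`, `use const` or group imports), and both function names are invented for this sketch; it follows PHP's documented rule that the first segment of a non-fully-qualified name is checked against the imports before the current namespace is prepended.

```python
import re

USE_RE = re.compile(r"^[ \t]*use\s+([\w\\]+)(?:\s+as\s+(\w+))?\s*;", re.MULTILINE)


def import_table(source):
    """Map each local alias to the fully qualified name it imports."""
    table = {}
    for fqn, alias in USE_RE.findall(source):
        fqn = fqn.lstrip("\\")
        # Without "as", the alias is the last segment of the imported name.
        table[alias or fqn.rsplit("\\", 1)[-1]] = fqn
    return table


def resolve(name, namespace, imports):
    """Resolve a class name as written in the source to its FQN."""
    if name.startswith("\\"):
        return name[1:]
    head, sep, rest = name.partition("\\")
    if head in imports:
        return imports[head] + sep + rest
    return namespace + "\\" + name if namespace else name


source = "namespace Hello;\n\nuse Carbon\\Carbon as Date;\nuse DateTimeImmutable;\n"
imports = import_table(source)
print(resolve("Date", "Hello", imports))  # Carbon\Carbon
```

Finding the file that defines the resolved name is a separate step, left here to EDE as described above.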

$i = new DateTimeImmutable();
$i->

Semantic knows $i is of type \DateTimeImmutable. The problem is that this isn't in EDE. A solution to this could be to either i) provide a set of PHP files that define all PHP built-in symbols, or ii) write a program to compile tag tables for all PHP built-in symbols. Either way, Semantic will know where to load the prototype and documentation from, so it will be able to know it's not a missing type at all.

$a = new DateTime();
$a->

In this case there are four possible outcomes, based on the intended type of $a:

1. DateTime should be Hello\DateTime, Semantic caters for this scenario,
2. DateTime is a symbol which can be found in the system library, if so
3. DateTime is perhaps something the programmer wants to use at the top, or worse
4. DateTime is some gibberish the programmer misspelt and should be changed

In all but the last case, I see user intervention (or desired default) as the only simple way to handle the conflict. So there's still the option of leaving it unhandled, but I like the idea of using overlays (perhaps company, or helm, or even just the minibuffer) to instruct the analyser what to think of DateTime from then on.

That would also make it harder to make a mistake, or to forget to import the right class.

In all scenarios that don't involve namespace at the top (or anywhere else) in the file, the same rules apply assuming the current \ namespace.
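The outcomes above could be told apart by a default function along these lines. This is a Python sketch under assumptions: the two symbol sets stand in for whatever semanticdb, EDE or a built-in tag table would provide, the labels are invented, and the order of the checks is just one possible default rather than something settled in this discussion.

```python
def classify(name, namespace, known_symbols, builtin_symbols):
    """Classify an unqualified class name that no use statement imports.

    Returns (kind, candidates). An overlay or minibuffer prompt could offer
    `candidates` to the user instead of picking one silently.
    """
    local = namespace + "\\" + name if namespace else name
    if local in known_symbols:
        return ("namespace-local", [local])
    if name in builtin_symbols:
        # Probably a missing leading backslash or a missing use statement.
        return ("builtin", [name])
    candidates = sorted(
        symbol for symbol in known_symbols
        if symbol.rsplit("\\", 1)[-1] == name
    )
    if candidates:
        return ("missing-import", candidates)
    return ("unknown", [])


project = {"Carbon\\Carbon", "Hello\\Greeter"}
builtins = {"DateTime", "DateTimeImmutable"}
print(classify("DateTime", "Hello", project, builtins))
```

Passing an empty string as the namespace covers files without a namespace declaration, where the same checks run against the global namespace.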

I'd love to have your input, I'm still terribly slow at going through the CEDET source code, mailing list, and fiddling about to find the time to produce code.

You think there's something worth discussing in more detail out of all this?

Feb 02 '16 21:02 metaturso

Quick thought regarding the last example: For the $a->, if 1 or 2 haven't caught it, it goes to 4. For $a = new DateTime(); it should flag the DateTime as unknown, visually. In the long run, hitting some key combo over the highlighted DateTime should allow for selecting between known DateTimes and adding the right use statement. Which calls for ede-php-autoload. But for starters, it's sufficient to let the user know that it's unrecognized in the current namespace (should catch some missing backslashes on new Exception(..)).

Feb 03 '16 13:02 xendk