php-mode

Consider using tree-sitter for syntax highlighting?

Open · Gleek opened this issue 3 years ago · 9 comments

emacs-tree-sitter uses tree-sitter grammar files to parse the buffer incrementally and highlight it from the resulting syntax tree. It also supports PHP through php-grammar. Because a grammar can describe the language far more precisely than regular expressions can, it should give more accurate syntax highlighting than the regex-based approach.

In my testing it also gives a much smoother experience for large files than the default php-mode highlighting.

On my machine, with large files, the call chain php-syntax-propertize-hash-line-comment > (move-beginning-of-line 2) > (line-move) takes a lot of time during normal typing, and php-syntax-propertize-extend-region sometimes takes minutes when I add a stray quote.

These problems don't occur with tree-sitter. After disabling these problematic functions and relying only on tree-sitter, typing latency is about 50 ms regardless of file size. That still isn't instantaneous, but these aren't clean benchmarks: I had other applications running in the background at the time. For reference, typing latency in fundamental-mode was about 25 ms on the same file. With the default php-mode, latency is low in small files, but large files showed 600 ms to 2.5 s. That excludes typing a quote ('), which freezes Emacs completely for several seconds, sometimes a minute or two.

These are the changes I made to php-mode's functions, after enabling tree-sitter, to get the results above:

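;; Stub that accepts any arguments and always returns nil
;; (the built-in `ignore' behaves the same way).
(defun return-false (&rest _)
  "Return nil regardless of the arguments.
Useful as :override advice to turn a function into a no-op."
  nil)

;; Stop php-mode from running its syntax-propertize helper functions.
(setq php-syntax-propertize-functions nil)
;; Make region extension a no-op, then unhook it as well.
(advice-add 'php-syntax-propertize-extend-region :override #'return-false)
(remove-hook 'syntax-propertize-extend-region-functions #'php-syntax-propertize-extend-region)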
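The snippet above assumes tree-sitter highlighting is already turned on. For reference, a minimal sketch of enabling it for php-mode, assuming the `tree-sitter` and `tree-sitter-langs` packages (emacs-tree-sitter and its bundled grammars) are installed:

```elisp
;; Assumes the `tree-sitter' and `tree-sitter-langs' packages are installed.
(require 'tree-sitter)
(require 'tree-sitter-langs) ; provides prebuilt grammars, including PHP

;; Turn on tree-sitter-based highlighting in every php-mode buffer.
;; `tree-sitter-hl-mode' also enables `tree-sitter-mode' in the buffer.
(add-hook 'php-mode-hook #'tree-sitter-hl-mode)
```

The emacs-tree-sitter documentation also shows a global variant: `(global-tree-sitter-mode)` plus `(add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode)`.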

I don't know how feasible or complex it would be to integrate these two packages, but given the benefits it might yield, I thought we could start a discussion around this.

Debug info

--- PHP-MODE DEBUG BEGIN ---
versions: GNU Emacs 28.0.50 (build 2, x86_64-apple-darwin19.6.0, NS appkit-1894.60 Version 10.15.7 (Build 19H114))
 of 2021-01-31; PHP Mode 1.24.0; Cc Mode 5.35.1)
package-version: 20210310.1724
major-mode: php-mode
minor-modes: (shell-dirtrack-mode lsp-diagnostics-mode lsp-modeline-workspace-status-mode lsp-modeline-diagnostics-mode lsp-modeline-code-actions-mode lsp-ui-mode lsp-ui-doc-mode lsp-completion-mode dap-tooltip-mode dap-ui-many-windows-mode dap-ui-controls-mode dap-ui-mode treemacs-filewatch-mode treemacs-follow-mode treemacs-git-mode treemacs-fringe-indicator-mode dap-auto-configure-mode dap-mode lsp-managed-mode lsp-mode ws-butler-mode yas-minor-mode auto-insert-mode org-wild-notifier-mode ivy-rich-mode tree-sitter-hl-mode tree-sitter-mode ivy-mode smooth-scroll-mode show-paren-mode which-key-mode smartparens-mode undo-tree-mode persistent-scratch-autosave-mode save-place-mode git-gutter-mode eros-mode highlight-numbers-mode company-box-mode company-mode flycheck-posframe-mode origami-mode hl-line-mode display-line-numbers-mode whitespace-mode projectile-mode flycheck-mode subword-mode selected-minor-mode +popup-mode recentf-mode doom-modeline-mode solaire-mode key-chord-mode tooltip-mode eldoc-mode electric-indent-mode mouse-wheel-mode tab-bar-mode file-name-shadow-mode font-lock-mode auto-composition-mode auto-encryption-mode auto-compression-mode size-indication-mode column-number-mode line-number-mode transient-mark-mode abbrev-mode)
variables: ((indent-tabs-mode nil) (tab-width 4))
custom variables: ((php-executable /usr/local/bin/php) (php-site-url https://php.net/) (php-manual-url en) (php-search-url nil) (php-class-suffix-when-insert ::) (php-namespace-suffix-when-insert \) (php-default-major-mode php-mode) (php-html-template-major-mode web-mode) (php-blade-template-major-mode web-mode) (php-template-mode-alist ((\.blade . web-mode) (\.phpt\' . php-mode) (\.phtml\' . web-mode))) (php-mode-maybe-hook nil) (php-default-builtin-web-server-port 3939) (php-re-detect-html-tag php-re-detect-html-tag-default) (php-search-documentation-browser-function nil))
c-indentation-style: symfony2
c-style-variables: ((c-basic-offset 4) (c-comment-only-line-offset 0) (c-indent-comment-alist ((anchored-comment column . 0) (end-block space . 1) (cpp-end-block space . 2))) (c-indent-comments-syntactically-p t) (c-block-comment-prefix * ) (c-comment-prefix-regexp ((pike-mode . //+!?\|\**) (awk-mode . #+) (other . //+\|\**))) (c-cleanup-list (scope-operator)) (c-hanging-braces-alist ((brace-list-open) (brace-entry-open) (statement-cont) (substatement-open after) (block-close . c-snug-do-while) (extern-lang-open after) (namespace-open after) (module-open after) (composition-open after) (inexpr-class-open after) (inexpr-class-close before) (arglist-cont-nonempty))) (c-hanging-colons-alist nil) (c-hanging-semi&comma-criteria (c-semi&comma-inside-parenlist)) (c-backslash-column 48) (c-backslash-max-column 72) (c-special-indent-hook nil) (c-label-minimum-indentation 1))
c-doc-comment-style: ((java-mode . javadoc) (pike-mode . autodoc) (c-mode . gtkdoc) (c++-mode . gtkdoc))
c-offsets-alist: ((inexpr-class . 0) (inexpr-statement . +) (lambda-intro-cont . +) (inlambda . 0) (template-args-cont c-lineup-template-args +) (incomposition . +) (inmodule . +) (innamespace . +) (inextern-lang . +) (composition-close . 0) (module-close . 0) (namespace-close . 0) (extern-lang-close . 0) (composition-open . 0) (module-open . 0) (namespace-open . 0) (extern-lang-open . 0) (objc-method-call-cont c-lineup-ObjC-method-call-colons c-lineup-ObjC-method-call +) (objc-method-args-cont . c-lineup-ObjC-method-args) (objc-method-intro . [0]) (friend . 0) (cpp-define-intro c-lineup-cpp-define +) (cpp-macro-cont . +) (cpp-macro . [0]) (inclass . +) (stream-op . c-lineup-streamop) (arglist-cont-nonempty first php-lineup-cascaded-calls php-c-lineup-arglist) (arglist-cont first php-lineup-cascaded-calls 0) (comment-intro . 0) (catch-clause . 0) (else-clause . 0) (do-while-closure . 0) (access-label . -) (case-label . +) (substatement . +) (statement-case-intro . +) (statement . 0) (brace-entry-open . 0) (brace-list-entry . 0) (brace-list-close . 0) (block-close . 0) (block-open . 0) (inher-cont . c-lineup-multi-inher) (inher-intro . +) (member-init-cont . c-lineup-multi-inher) (member-init-intro . +) (annotation-var-cont . +) (annotation-top-cont . 0) (topmost-intro . 0) (knr-argdecl . 0) (func-decl-cont . +) (inline-close . 0) (class-close . 0) (class-open . 0) (defun-block-intro . +) (defun-close . 0) (defun-open . 0) (c . c-lineup-C-comments) (string . c-lineup-dont-change) (topmost-intro-cont first php-lineup-cascaded-calls +) (brace-list-intro . +) (brace-list-open . 0) (inline-open . 0) (arglist-close . php-lineup-arglist-close) (arglist-intro . php-lineup-arglist-intro) (statement-cont . php-lineup-hanging-semicolon) (statement-case-open . 0) (label . +) (substatement-label . 2) (substatement-open . 0) (knr-argdecl-intro . +) (statement-block-intro . +))
buffer: (:length 11655)

Gleek, Apr 07 '21

If you use this package, https://github.com/cjohansson/emacs-phps-mode, syntax highlighting is done asynchronously, follows the PHP 8.0 lexical analyzer, and is still implemented in pure Elisp.

cjohansson, Apr 09 '21

@Gleek Thank you for the suggestion.

Next week is a Japanese holiday, so I'll look into this along with the other syntax highlighting issues.

zonuexe, Apr 23 '21

It's highly likely that the tree-sitter integration described at https://archive.casouri.cat/note/2021/emacs-tree-sitter/ will be part of Emacs 29.

I suggest testing with that, and offering feedback if you're able.

See also https://www.reddit.com/r/emacs/comments/pxpq8d/rfc_emacs_treesitter_integration/

phil-s, Nov 12 '21

I'm a bit new to the ecosystem, but my understanding is that tree-sitter will indeed be part of Emacs 29. In fact, emacs-devel is pushing to update the built-in major modes for Emacs 29: https://lists.gnu.org/archive/html/emacs-devel/2022-10/msg00707.html

What does php-mode need to do to prepare for compatibility with Emacs 29? If I understand correctly, php-mode will continue to work as is, but it won't take advantage of what tree-sitter offers.
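As a rough illustration of what "taking advantage" could mean, here is a minimal sketch of a tree-sitter major mode built on Emacs 29's built-in `treesit` library. The mode name `my-php-ts-mode` and the helper `my-php-ts--font-lock-settings` are hypothetical, the node names (`comment`, `function_definition`, `name`, `variable_name`) are assumed from the tree-sitter-php grammar, and the PHP grammar is assumed to be installed (for example with `treesit-install-language-grammar`):

```elisp
;;; -*- lexical-binding: t; -*-
(require 'treesit)

(defun my-php-ts--font-lock-settings ()
  "Return hypothetical font-lock rules for PHP.
Node names are assumed from the tree-sitter-php grammar."
  (treesit-font-lock-rules
   :language 'php
   :feature 'comment
   '((comment) @font-lock-comment-face)
   :language 'php
   :feature 'definition
   '((function_definition name: (name) @font-lock-function-name-face)
     (variable_name) @font-lock-variable-name-face)))

(define-derived-mode my-php-ts-mode prog-mode "PHP[ts]"
  "Hypothetical minimal PHP major mode on top of Emacs 29's treesit."
  ;; Proceed only if Emacs has tree-sitter support and a PHP grammar is available.
  (when (treesit-ready-p 'php)
    (treesit-parser-create 'php)
    (setq-local treesit-font-lock-settings (my-php-ts--font-lock-settings))
    ;; A single feature level containing both features defined above.
    (setq-local treesit-font-lock-feature-list '((comment definition)))
    ;; Activates the font-lock settings (and indentation/navigation if configured).
    (treesit-major-mode-setup)))
```

Highlighting here comes from queries over the parse tree rather than regexps, so there is no `syntax-propertize` pass of the kind that was slow above. Handling embedded HTML or phpDoc would additionally need multiple parsers in one buffer, which this sketch does not attempt.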

I'm a bit familiar with tree-sitter from other editors, and I'd be delighted to help out on this, if so desired.

claytonrcarter, Oct 10 '22

BTW, one thing I recall from my past work with tree-sitter-php is that it literally only supports PHP, so any support for HTML included in a PHP file would be lost (probably not a huge deal), as would highlighting of phpDoc comments (which probably is a big deal) and anything else that's not strictly PHP.

tree-sitter handles such things by handing them off to other tree-sitter parsers via what it calls "injections". As I recall, these are pretty easy to work with, but they require that a buffer can be highlighted by multiple modes. (Again, I'm new around here, so maybe this won't be an issue, but I see several other open issues about mmm/poly-mode, etc., so maybe it will be?) Thanks again!

claytonrcarter, Oct 10 '22

Any updates or plans on this? Tree-sitter-based parsing also plays well with packages like Combobulate.

KaranAhlawat, May 06 '23

@KaranAhlawat the initial work on tree-sitter support started here: https://github.com/emacs-php/php-ts-mode, but it's going to take some time.

piotrkwiecinski, May 21 '23

This is great news! I also understand that it takes time and effort, neither of which is free. I'm developing a TS mode myself, for Scala.

KaranAhlawat, May 21 '23