
Improved parser and other stuff

Open alehed opened this issue 8 years ago • 33 comments

This should fix #5 and make the framework usable for production. Uses or closes #5, #10, #11, #15 and #16.

Major changes

  • It works, parsing actually produces results that are close to what TextMate does.
  • Supports incremental parsing with an NSOperation subclass.
  • The public interface to get languages and themes has changed.
  • Updated for Swift 3, Xcode 8 and the Swift API Design Guidelines
  • It copies the Color code from the X dependency (see #11 from alimoeeny).
  • Oh, and I changed tabs to spaces (soft tabs).

The last two changes might be a bit polarizing. The rationale behind them is to make it easier for new people to contribute. Setting up the environment for a new project is always a big hurdle for adoption. If you want to invite people to play around with your framework, make it as simple as possible to go from git clone to a project that builds (and where tests pass). This way it builds out of the box with no dependencies. Also, the current default in Xcode is soft tabs (4 spaces), so new users will not mess up the spacing if they use a clean install of Xcode.

Caveats

There are however a few things that still don't work as expected. Feel free to improve upon this.

Due to differences between Oniguruma and NSRegularExpression:

  • \G is not supported (for performance reasons the parser does not guarantee to match the end of a pattern right after begin, and even if it did, \G behaves weirdly with NSRegularExpression). This might be fixable.
  • NSRegularExpression seems to require a few more escape characters in certain places.
  • Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).

Not implemented:

  • The contentName property is ignored.
  • Attributes other than foreground color are ignored.
  • Cannot recursively include itself (use $self or $base instead, would require extra checks in the BundleManager). Fixable with extra logic.

Notes

Since I don't use Carthage or CocoaPods, some work probably still has to be done so that it plays well with them. Any help would be appreciated.

alehed avatar Aug 07 '16 16:08 alehed

Could you show us how to do the incremental parsing with AttributedParsingOperation?

jasonnam avatar Sep 01 '16 06:09 jasonnam

Sure, if you are familiar with NSOperation and NSOperationQueue it should be pretty straightforward.

Just use the first constructor for the first operation on a file, and for subsequent changes instantiate the operation with init(string:previousOperation:changeIsInsertion:changedRange:).

Something like this.

let input = "title: \"Hello World\"\n"
let firstOperation = AttributedParsingOperation(string: input,
                                              language: yaml,
                                                 theme: tomorrow,
                                              callback: updateTextView)

myOperationQueue.addOperation(firstOperation)

let newInput = "author: \"Me\"\ntitle: \"Hello World\"\n"
let secondOperation = AttributedParsingOperation(string: newInput,
                                      previousOperation: firstOperation,
                                      changeIsInsertion: true,
                                           changedRange: NSRange(location: 0, length: 13))

myOperationQueue.addOperation(secondOperation)

alehed avatar Sep 01 '16 08:09 alehed

Ya, I am trying to apply them into the NSTextStorage. I try to put the code in the following method (NSTextStorageDelegate) but the text view is stuck when I type into it.

  func textStorage(textStorage: NSTextStorage, didProcessEditing editedMask: NSTextStorageEditActions, range editedRange: NSRange, changeInLength delta: Int) {

I am very sorry, but could you give me a clue about where I have to put those methods?

jasonnam avatar Sep 01 '16 09:09 jasonnam

Did you do the updates on the main thread?
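
For context, a minimal sketch of what "doing the updates on the main thread" could look like. The helper name runOnMain is made up for illustration and is not part of SyntaxKit; it only assumes that the parsing callback may fire on a background queue while the text storage must be touched on the main thread.

```swift
import Foundation

/// Hypothetical helper (not part of SyntaxKit): runs `work` on the main thread.
/// It executes immediately when already on the main thread and hops to the
/// main queue asynchronously otherwise.
func runOnMain(_ work: @escaping () -> Void) {
    if Thread.isMainThread {
        work()
    } else {
        DispatchQueue.main.async(execute: work)
    }
}

// Assumed usage inside a parsing callback (names are placeholders):
// runOnMain { textStorage.setAttributes(attributes, range: range) }
```

Applying attributes from a background queue while the user is typing is a common cause of a text view that appears stuck, so wrapping the UI work like this is usually the first thing to try.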

alehed avatar Sep 01 '16 09:09 alehed

@alehed would you mind explaining how to load the .tmLanguage and .tmTheme files to the BundleManager? Unfortunately, the README.md doesn't make this very obvious. Do I have to add the info.plist for the tm files as well? Thanks!

vhart avatar Nov 09 '16 15:11 vhart

Let's say the files are in /Users/demo/tmfiles/ and they all have a .xml extension (instead of .tmLanguage and .tmTheme).

As explained in the Readme, to get the Language and Theme classes you first have to get a BundleManager class. You can either create your own BundleManager object and pass it around or you can just use the default manager which is accessed using BundleManager.defaultManager.

Now to initialize a manager (either your own or the default one), you have to give the class a callback that tells it where the files live. In our case you could initialize the default manager using:

BundleManager.initializeDefaultManager(with: { identifier, isLanguage in
    let base = URL(fileURLWithPath: "/Users/demo/tmfiles/")
    return base.appendingPathComponent("\(identifier).xml")
})

Now of course in the real world you would use NSBundle to compute the location of the files, but I hope you get the big picture. Basically, the callback you use to initialize the BundleManager takes the name and type of the file and returns a URL pointing to the requested file, or nil if some error occurred.
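
As a rough sketch of the "return nil if some error occurred" part: the function below builds a lookup closure with the same (identifier, isLanguage) shape as the callback above and returns nil when the file is missing. The name makeLocator and its parameters are assumptions for illustration, not SyntaxKit API.

```swift
import Foundation

/// Hypothetical helper: builds a lookup closure that maps an identifier to
/// "<directory>/<identifier>.<fileExtension>" and returns nil when no such
/// file exists on disk. The Bool parameter is ignored here because both
/// languages and themes are assumed to live in the same directory.
func makeLocator(directory: URL, fileExtension: String) -> (String, Bool) -> URL? {
    return { identifier, _ in
        let candidate = directory
            .appendingPathComponent(identifier)
            .appendingPathExtension(fileExtension)
        return FileManager.default.fileExists(atPath: candidate.path) ? candidate : nil
    }
}

// Assumed usage with the directory from the example above:
// let locate = makeLocator(directory: URL(fileURLWithPath: "/Users/demo/tmfiles/"),
//                          fileExtension: "xml")
```

Checking for existence up front means a missing grammar shows up as a nil URL in the callback rather than as a read failure further down.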

alehed avatar Nov 09 '16 15:11 alehed

Also, you only need the actual files containing the themes and languages, no additional plists.

alehed avatar Nov 09 '16 15:11 alehed

@alehed I appreciate the breakdown, I'll give it a go and see if I can get it working. Thanks again for the time!

vhart avatar Nov 09 '16 16:11 vhart

An example project for iOS would be awesome. I can't seem to get it working no matter what I try, and this is a must have for an app I'm working on.

eskimo avatar Nov 11 '16 04:11 eskimo

Did you take a look at TRex? It's written in swift 2 and uses an old SyntaxKit version, but apart from the BundleManager described above these changes here should be backwards compatible.

You can then still go from there and use the more advanced features like incremental parsing.

alehed avatar Nov 11 '16 05:11 alehed

@jq1106, using the details provided by @alehed, here's a code snippet that worked for me (granted, this is just test code with no abstractions or extensibility in mind):


import UIKit
import SyntaxKit

class ViewController: UIViewController {

    enum TmType: String {
        case Swift
        case Tomorrow = "Tomorrow-Night-Bright"

        var extensionType: String {
            switch self {
            case .Swift:
                return "tmLanguage"
            case .Tomorrow:
                return "tmTheme"
            }
        }
    }

    @IBOutlet weak var textView: UITextView!
    override func viewDidLoad() {
        super.viewDidLoad()
        let manager = BundleManager { (identifier, isLanguage) -> (URL?) in
            guard let type = TmType(rawValue: identifier) else { return nil }
            return Bundle.main.url(forResource: type.rawValue,
                                   withExtension: type.extensionType)
        }
        let swiftLanguage = manager.language(withIdentifier: "Swift")!
        let tomorrow = manager.theme(withIdentifier: "Tomorrow-Night-Bright")!
        let attributedParser = AttributedParser(language: swiftLanguage, theme: tomorrow)

        let input = "func do() -> String {\n    return \"EUREKA\"\n}"
        let attText = attributedParser.attributedString(for: input)
        textView.attributedText = attText
    }
}

vhart avatar Nov 11 '16 13:11 vhart

@vhart Thank you so much, finally it works and looks great :)

eskimo avatar Nov 11 '16 18:11 eskimo

@vhart Just one more question.

Some of the language packs from TextMate have a .plist extension instead of .tmLanguage. Should I just rename them for them to work?

eskimo avatar Nov 11 '16 19:11 eskimo

It doesn't matter which extension the files have, as long as you can calculate their URL.

I did one more small change. I figured it would be more flexible to have the file type be specified in an enum, not a Bool.
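
To illustrate the idea, here is a sketch of what an enum-based file type could look like. The type name GrammarFileKind and its cases are made up; the actual name and cases used in the fork are not shown in this thread.

```swift
/// Hypothetical stand-in for the kind of file being requested.
enum GrammarFileKind {
    case language
    case theme

    /// Conventional TextMate file extension for each kind.
    var defaultExtension: String {
        switch self {
        case .language: return "tmLanguage"
        case .theme: return "tmTheme"
        }
    }
}

/// Builds a file name such as "Swift.tmLanguage" from an identifier and a kind.
func fileName(for identifier: String, kind: GrammarFileKind) -> String {
    return "\(identifier).\(kind.defaultExtension)"
}
```

Compared to a Bool, an enum leaves room for further file kinds later and makes call sites self-describing.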

alehed avatar Nov 11 '16 20:11 alehed

Hey, so I'm seeing some problems with the highlighting of Objective-C using the TextMate .tmLanguage file; however, it highlights just fine in TextMate. So I'm a little confused.

Screenshots: iOS - https://lfil.es/i/6561cb4b TextMate - https://lfil.es/i/c11e7639

Any idea why this could be happening?

eskimo avatar Nov 19 '16 00:11 eskimo

Does it print any warnings? Sometimes you have to adjust the regexes a bit for it to work with NSRegularExpression. Especially '{' is sometimes not quoted in .tmLanguage files.
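
A quick way to find the offending patterns is to compile each regex from the grammar on its own. The sketch below uses only Foundation; the helper name compileError is made up for illustration.

```swift
import Foundation

/// Hypothetical helper: returns nil if NSRegularExpression accepts the
/// pattern, otherwise a description of the compile error.
func compileError(for pattern: String) -> String? {
    do {
        _ = try NSRegularExpression(pattern: pattern, options: [])
        return nil
    } catch {
        return error.localizedDescription
    }
}

// An escaped brace is a pattern NSRegularExpression accepts.
let escapedBrace = "\\{"
print(compileError(for: escapedBrace) ?? "accepted")
```

Running this over every match, begin and end pattern in a .tmLanguage file should point to the ones that need an extra escape.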

alehed avatar Nov 19 '16 07:11 alehed

And one more thing: the Objective-C grammar directly depends on the C grammar, so you need to have that installed too.

alehed avatar Nov 19 '16 08:11 alehed

But how do I have it use more than one tmLanguage file at the same time o.O

This is the most confusing thing I've ever used.

eskimo avatar Nov 19 '16 17:11 eskimo

It's included automatically, just make sure the callback finds it.

alehed avatar Nov 19 '16 17:11 alehed

These are the warnings it prints: https://lfil.es/p/b08e87ad

eskimo avatar Nov 19 '16 17:11 eskimo

You can try to fix the problems with the regexes if you want to (see the caveats above), but mostly it should highlight fairly well even with warnings.

alehed avatar Nov 19 '16 17:11 alehed

Yeah, everything else highlights well enough besides the Objective-C one. I wouldn't know where to even start with the complex regexes that are in the tmLanguage files, so I suppose I'll just leave it for now or try to look for different versions of the Objective-C ones.

eskimo avatar Nov 19 '16 17:11 eskimo

It would be awesome if this could be merged.

Eitot avatar May 25 '17 23:05 Eitot

In the meantime you can just use the fork.

alehed avatar May 26 '17 04:05 alehed

@soffes any plans on merging this in?

DivineDominion avatar Aug 07 '17 08:08 DivineDominion

Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).

Any progress on this?

Lessica avatar Aug 14 '17 17:08 Lessica

Nope, I decided the current state was good enough (backreferences are usually used to recognize languages embedded within other languages, which usually slows parsing down quite a bit). Also for this you would have to match the begin and end part at the same time which is kind of tricky to implement.

If you want to have a go at this, PRs are always welcome.

alehed avatar Aug 14 '17 17:08 alehed

Yes, of course. Lua uses this syntax:

(Multiline Comment) begin: --\[(=*)\[ end: \]\1\]

(Multiline String) begin: \[(=*)\[ end: \]\1\]

I am working on an iOS Lua Editor, your work really helped me a lot!

Lessica avatar Aug 14 '17 18:08 Lessica

You are welcome!

Yes, this is another use of back-references. I see how this can be useful.

If you want to implement this, your best bet is probably to combine the begin and end expressions with a .* and match that instead of just the begin expression. For this you would have to take a look at Pattern.swift and Parser.swift.
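
A minimal sketch of that idea, outside of SyntaxKit: the begin and end expressions are joined with a lazy .*? so that \1 in the end part refers to the capture group in the begin part. The function names are made up, and this does not touch Pattern.swift or Parser.swift; it only shows that NSRegularExpression can match the combined expression.

```swift
import Foundation

/// Hypothetical helper: joins a rule's begin and end patterns so that a
/// backreference in `end` (such as \1) refers to a capture group in `begin`.
/// The non-capturing wrappers keep the numbering of begin's groups intact.
func combinedPattern(begin: String, end: String) -> String {
    return "(?:\(begin)).*?(?:\(end))"
}

/// Returns the first substring of `text` matched by `pattern`, if any.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return nil
    }
    let nsText = text as NSString
    let whole = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: whole) else {
        return nil
    }
    return nsText.substring(with: match.range)
}

// Lua long-comment rule, with the brackets escaped for NSRegularExpression.
let luaComment = combinedPattern(begin: "--\\[(=*)\\[", end: "\\]\\1\\]")
```

One thing to keep in mind: any capture groups in the end expression are renumbered after those of the begin expression, so end captures would need an offset.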

alehed avatar Aug 14 '17 18:08 alehed

@alehed I've got a question (or five), partner!

When using AttributedParsingOperation to perform incremental parsing, it seems as though there are only two options: insertion or deletion. How might one go about handling the “replacement” of text? There are three replacement situations that I continually run into where I'm not sure how best to handle parsing and/or re-parsing.

For instance, in all of the examples below, assume the following is true:

let source = "Hello, World!"
let replacementRange = NSRange(location: 7, length: 6) // "World!"

Given source is the target string and replacementRange is the range to be replaced within the target string, the possible replacement situations are:

  • Exact replacement: The replacement string is exactly the length of the range being replaced. Thus, the change in length between the strings is 0 (i.e. no characters have been “deleted” or “inserted” — in the traditional sense of the words).

    let replacement = "There!"
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, There! (delta: 0)
    
  • Additive replacement: The replacement string is longer than the range being replaced. Thus, the change in length between the strings is a positive value (i.e. characters have been “inserted”).

    let replacement = "My Dudes and Dudettes!"
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, My Dudes and Dudettes! (delta: 16)
    
  • Subtractive replacement: The replacement string is shorter than the range being replaced. Thus, the change in length between the strings is a negative value (i.e. characters have been “deleted”).

    let replacement = "Pal!"
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, Pal! (delta: -2)
    

In these situations, does one "break down" the replacement operation into its sub-operations? What about the first case, where the string has had no change in length? Does one re-parse the string in its entirety?
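
To make the "break down" option concrete, here is a sketch of one possible decomposition: a replacement expressed as a deletion of the old range followed by an insertion at the same location. SplitEdit, splitEdit and apply are hypothetical names, and whether two chained AttributedParsingOperation instances handle this correctly is not confirmed anywhere in this thread. The code goes through NSString because NSRange counts UTF-16 units.

```swift
import Foundation

/// Hypothetical model: one replacement expressed as a deletion followed by
/// an insertion at the same location.
struct SplitEdit {
    let deletedRange: NSRange   // range removed from the old text
    let insertedRange: NSRange  // range occupied by the new text afterwards
    var delta: Int { return insertedRange.length - deletedRange.length }
}

/// Splits "replace `range` with `replacement`" into its two sub-operations.
func splitEdit(replacing range: NSRange, with replacement: String) -> SplitEdit {
    let insertedLength = (replacement as NSString).length
    return SplitEdit(deletedRange: range,
                     insertedRange: NSRange(location: range.location,
                                            length: insertedLength))
}

/// Applies the deletion first and the insertion second, to check that the
/// two steps together reproduce the direct replacement.
func apply(_ edit: SplitEdit, replacement: String, to source: String) -> String {
    let afterDeletion = (source as NSString)
        .replacingCharacters(in: edit.deletedRange, with: "")
    let insertionPoint = NSRange(location: edit.insertedRange.location, length: 0)
    return (afterDeletion as NSString)
        .replacingCharacters(in: insertionPoint, with: replacement)
}
```

Under this reading, the exact-replacement case (delta of 0) is simply a deletion and an insertion of equal length rather than a special case.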

As far as background information goes, I simply have a custom NSTextView subclass that processes the changes as they happen. In shouldChangeText(in:replacementString:), I record the change (if the change should be allowed), then on didChangeText(), I enqueue the parsing operation in a custom queue like you suggested (e.g. chaining the previous operation to the next, or if no operations have been performed, creating the initial operation with the theme, language, etc.). Everything works great until I want to edit the text storage in some other method (like to implement convertSpacesToTabs() or something similar). I never know which range(s) should be considered “dirty.”

If none of this makes sense, please let me know, and I'll try my best to expound on or simplify my question(s). Basically, I just want to know how best to handle the “replacement” aspect of the incremental parsing operation, rather than raw insertion/deletion scenarios.

Thanks!

P.S. Your changes to the base SyntaxKit are badass, and they've been a great help to me!

benstockdesign avatar Sep 26 '17 12:09 benstockdesign