SyntaxKit
Improved parser and other stuff
This should fix #5 and make the framework usable for production. Uses or closes #5, #10, #11, #15 and #16.
Major changes
- It works, parsing actually produces results that are close to what TextMate does.
- Supports incremental parsing with an NSOperation subclass.
- The public interface to get languages and themes has changed.
- Updated for Swift 3, Xcode 8 and the Swift API Design Guidelines
- It copies the Color code from the X dependency (see #11 from alimoeeny).
- Oh, and I changed tabs to spaces (soft tabs).
The last two changes might be a bit polarizing. The rationale behind them is to make it easier for new people to contribute. Setting up the environment for a new project is always a big hurdle for adoption. If you want to invite people to play around with your framework, make it as simple as possible to go from git clone to a project that builds (and where tests pass). With these changes it builds out of the box with no dependencies. Also, the current default in Xcode is soft tabs (4 spaces), so new users will not mess up the spacing if they use a clean install of Xcode.
Caveats
There are however a few things that still don't work as expected. Feel free to improve upon this.
Due to differences between Oniguruma and NSRegularExpression:
- \G is not supported (for performance reasons the parser does not guarantee to match the end of a pattern right after begin, and even if it did, \G behaves weirdly with NSRegularExpression). This might be fixable.
- NSRegularExpression seems to require a few more escape characters in certain places.
- Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).
Not implemented:
- The contentName property is ignored.
- Attributes other than foreground color are ignored.
- A grammar cannot recursively include itself by name (use $self or $base instead; supporting this would require extra checks in the BundleManager). Fixable with extra logic.
Notes
Since I don't use Carthage or CocoaPods, some work probably still has to be done so that it plays well with them. Any help would be appreciated.
Could you show us how to do the incremental parsing with AttributedParsingOperation?
Sure, if you are familiar with NSOperation and NSOperationQueue it should be pretty straightforward.
Just use the first constructor for the first operation on a file, and for subsequent changes instantiate the operation with init(string: , previousOperation: , changeIsInsertion: , changedRange:).
Something like this:
let input = "title: \"Hello World\"\n"
let firstOperation = AttributedParsingOperation(string: input,
                                                language: yaml,
                                                theme: tomorrow,
                                                callback: updateTextView)
myOperationQueue.addOperation(firstOperation)

let newInput = "author: \"Me\"\ntitle: \"Hello World\"\n"
let secondOperation = AttributedParsingOperation(string: newInput,
                                                 previousOperation: firstOperation,
                                                 changeIsInsertion: true,
                                                 changedRange: NSRange(location: 0, length: 13))
myOperationQueue.addOperation(secondOperation)
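The snippet above leaves `myOperationQueue` undefined. A minimal sketch of how such a queue could be set up, assuming the chained operations are meant to run one at a time and in the order they were added (the thread does not confirm this). The queue name and quality of service are arbitrary choices, and plain block operations stand in for the parsing operations so the sketch runs without SyntaxKit:

```swift
import Foundation

// Hypothetical stand-in for `myOperationQueue` from the example above.
let myOperationQueue = OperationQueue()
myOperationQueue.name = "SyntaxHighlighting"        // arbitrary label, useful when debugging
myOperationQueue.maxConcurrentOperationCount = 1    // assumption: run operations one at a time
myOperationQueue.qualityOfService = .userInitiated  // assumption: highlighting is user-facing work

// Plain block operations stand in for AttributedParsingOperation here.
let finished = NSMutableArray()
myOperationQueue.addOperation { finished.add("first") }
myOperationQueue.addOperation { finished.add("second") }

// Blocking like this is only for the demonstration; an app would not wait on the main thread.
myOperationQueue.waitUntilAllOperationsAreFinished()
print(finished)
```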
Yes, I am trying to apply them to the NSTextStorage. I tried putting the code in the following method (NSTextStorageDelegate), but the text view gets stuck when I type into it.
func textStorage(textStorage: NSTextStorage, didProcessEditing editedMask: NSTextStorageEditActions, range editedRange: NSRange, changeInLength delta: Int) {
I am very sorry, but could you give me a clue as to where I should put those methods?
Did you do the updates on the main thread?
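For reference, a small sketch of what "doing the updates on the main thread" could look like. It assumes the parsing callback may be invoked on a background thread, which this thread does not state explicitly. `runOnMain` is a hypothetical helper, not part of SyntaxKit:

```swift
import Foundation

/// Hypothetical helper: runs `work` immediately when already on the main
/// thread, otherwise schedules it on the main queue.
func runOnMain(_ work: @escaping () -> Void) {
    if Thread.isMainThread {
        work()
    } else {
        DispatchQueue.main.async(execute: work)
    }
}

// Usage inside a parsing callback (the body is a placeholder):
// runOnMain {
//     // apply the new attributes to the text view / text storage here
// }
```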
@alehed would you mind explaining how to load the .tmLanguage and .tmTheme files to the BundleManager? Unfortunately, the README.md doesn't make this very obvious. Do I have to add the info.plist for the tm files as well? Thanks!
Let's say the files are in /Users/demo/tmfiles/ and they all have a .xml extension (instead of .tmLanguage and .tmTheme).
As explained in the Readme, to get the Language and Theme classes you first have to get a BundleManager object. You can either create your own BundleManager object and pass it around or you can just use the default manager, which is accessed using BundleManager.defaultManager.
Now to initialize a manager (either your own or the default one), you have to give the class a callback that tells it where the files live. In our case you could initialize the default manager using:
BundleManager.initializeDefaultManager(with: { identifier, isLanguage in
    let base = URL(fileURLWithPath: "/Users/demo/tmfiles/")
    return base.appendingPathComponent("\(identifier).xml")
})
Now of course in the real world you would use NSBundle to compute the location of the files, but I hope you get the big picture. Basically, the callback you use to initialize the BundleManager takes the name and type of the file and returns a URL pointing to the requested file, or nil if some error occurred.
Also, you only need the actual files containing the themes and languages, no additional plists.
@alehed I appreciate the breakdown, I'll give it a go and see if I can get it working. Thanks again for the time!
An example project for iOS would be awesome. I can't seem to get it working no matter what I try, and this is a must have for an app I'm working on.
Did you take a look at TRex? It's written in Swift 2 and uses an old SyntaxKit version, but apart from the BundleManager setup described above, the changes here should be backwards compatible.
You can then still go from there and use the more advanced features like incremental parsing.
@jq1106, using the details provided by @alehed, here's a code snippet that worked for me (granted, this is just test code with no abstractions or extensibility in mind):
import UIKit
import SyntaxKit

class ViewController: UIViewController {

    enum TmType: String {
        case Swift
        case Tomorrow = "Tomorrow-Night-Bright"

        var extensionType: String {
            switch self {
            case .Swift:
                return "tmLanguage"
            case .Tomorrow:
                return "tmTheme"
            }
        }
    }

    @IBOutlet weak var textView: UITextView!

    override func viewDidLoad() {
        super.viewDidLoad()
        let manager = BundleManager { (identifier, isLanguage) -> (URL?) in
            guard let type = TmType(rawValue: identifier) else { return nil }
            return Bundle.main.url(forResource: type.rawValue,
                                   withExtension: type.extensionType)
        }
        let yaml = manager.language(withIdentifier: "Swift")!
        let tomorrow = manager.theme(withIdentifier: "Tomorrow-Night-Bright")!
        let attributedParser = AttributedParser(language: yaml, theme: tomorrow)
        let input = "func do() -> String {\n return \"EUREKA\"\n}"
        let attText = attributedParser.attributedString(for: input)
        textView.attributedText = attText
    }
}
@vhart Thank you so much, finally it works and looks great :)
@vhart Just one more question.
Some of the language packs from TextMate have .plist instead of .tmLanguage. Should I just rename them for them to work?
It doesn't matter which extension the files have, as long as you can calculate their URL.
I did one more small change. I figured it would be more flexible to have the file type be specified in an enum, not a Bool.
Hey, so I'm seeing some problems with the highlighting of Objective-C using the TextMate .tmLanguage file; however, it highlights just fine in TextMate. So I'm a little confused.
Screenshots: iOS - https://lfil.es/i/6561cb4b TextMate - https://lfil.es/i/c11e7639
Any idea why this could be happening?
Does it print any warnings? Sometimes you have to adjust the regexes a bit for it to work with NSRegularExpression. Especially '{' is sometimes not quoted in .tmLanguage files.
And one more thing: the Objective-C grammar directly depends on the C grammar, so you need to have that installed too.
But how do I have it use more than one tmLanguage file at the same time o.O
This is the most confusing thing I've ever used.
It's included automatically, just make sure the callback finds it.
These are the warnings it prints: https://lfil.es/p/b08e87ad
You can try to fix the problems with the regexes if you want to (see the caveats above), but mostly it should highlight fairly well even with warnings.
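To find out which patterns from a grammar need adjusting, one option is to try compiling each of them with NSRegularExpression and look at the error. A minimal sketch using only Foundation. `compileError(for:)` is a hypothetical helper and the sample patterns are made up for illustration, not taken from any actual .tmLanguage file:

```swift
import Foundation

/// Hypothetical helper: returns nil when NSRegularExpression accepts the
/// pattern, otherwise a description of the compile error.
func compileError(for pattern: String) -> String? {
    do {
        _ = try NSRegularExpression(pattern: pattern, options: [])
        return nil
    } catch {
        return error.localizedDescription
    }
}

// Illustrative patterns: an escaped brace, and an unbalanced parenthesis
// that NSRegularExpression rejects.
let samplePatterns = ["\\{", "(unbalanced"]
for pattern in samplePatterns {
    if let problem = compileError(for: pattern) {
        print("needs fixing: \(pattern) (\(problem))")
    } else {
        print("ok: \(pattern)")
    }
}
```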
Yeah, everything else highlights well enough besides the Objective-C one. I wouldn't know where to even start with the complex regexes that are in the tmLanguage files, so I suppose I'll just leave it for now or try to look for different versions of the Objective-C grammar.
It would be awesome if this could be merged.
In the meantime you can just use the fork.
@soffes any plans on merging this in?
Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).
Any progress on this?
Nope, I decided the current state was good enough (backreferences are usually used to recognize languages embedded within other languages, which usually slows parsing down quite a bit). Also for this you would have to match the begin and end part at the same time which is kind of tricky to implement.
If you want to have a go at this, PRs are always welcome.
Yes, of course. Lua uses this syntax:
(Multiline Comment) begin: --[(=*)[ end: ]\1]
(Multiline String) begin: [(=*)[ end: ]\1]
I am working on an iOS Lua Editor, your work really helped me a lot!
You are welcome!
Yes, this is another use of back-references. I see how this can be useful.
If you want to implement this, your best bet is probably to combine the begin and end expressions with a .* and match that instead of just the begin expression. For this you would have to take a look at Pattern.swift and Parser.swift.
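As a rough illustration of that idea, outside of SyntaxKit: joining the two expressions lets NSRegularExpression resolve \1 itself. This sketch assumes a lazy `.*?` with `.dotMatchesLineSeparators` (instead of a plain `.*`) so that the match stops at the nearest matching end and can span lines. `firstBlockMatch` is a hypothetical helper, and the Lua patterns are written with the brackets escaped:

```swift
import Foundation

/// Hypothetical helper: matches begin and end as one expression so that a
/// backreference in `end` (like \1) refers to a capture group in `begin`.
/// Note: capture groups inside `end` are renumbered after those of `begin`.
func firstBlockMatch(begin: String, end: String, in text: String) -> String? {
    let combined = begin + ".*?" + end
    guard let regex = try? NSRegularExpression(pattern: combined,
                                               options: [.dotMatchesLineSeparators]) else {
        return nil
    }
    let nsText = text as NSString
    let fullRange = NSRange(location: 0, length: nsText.length)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange) else {
        return nil
    }
    return nsText.substring(with: match.range)
}

// Lua multiline comment: begin --[(=*)[ and end ]\1], with brackets escaped.
let luaBegin = "--\\[(=*)\\["
let luaEnd = "\\]\\1\\]"
let lua = "x = 1 --[==[ a ]] b ]==] y"
print(firstBlockMatch(begin: luaBegin, end: luaEnd, in: lua) ?? "no match")
```

The downside mentioned earlier still applies: begin and end are matched at the same time, so the parser could no longer treat them as two independent steps.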
@alehed I've got a question (or five), partner!
When using AttributedParsingOperation to perform incremental parsing, it seems as though there are only two options: insertion or deletion. How might one go about handling the “replacement” of text? There are three replacement situations that I continually run into where I'm not sure how best to handle parsing and/or re-parsing.
For instance, in all of the examples below, assume the following is true:
let source = "Hello, World!"
let replacementRange = NSRange(location: 7, length: 6) // "World!"
Given source is the target string and replacementRange is the range to be replaced within the target string, the possible replacement situations are:
- Exact replacement: The replacement string is exactly the length of the range being replaced. Thus, the change in length between the strings is 0 (i.e. no characters have been “deleted” or “inserted” in the traditional sense of the words).
let replacement = "There!"
let result = source.replacingCharacters(in: replacementRange, with: replacement)
print("\(source) => \(result) (delta: \(result.length - source.length))")
// ~> Hello, World! => Hello, There! (delta: 0)
- Additive replacement: The replacement string is longer than the range being replaced. Thus, the change in length between the strings is a positive value (i.e. characters have been “inserted”).
let replacement = "My Dudes and Dudettes!"
let result = source.replacingCharacters(in: replacementRange, with: replacement)
print("\(source) => \(result) (delta: \(result.length - source.length))")
// ~> Hello, World! => Hello, My Dudes and Dudettes! (delta: 16)
- Subtractive replacement: The replacement string is shorter than the range being replaced. Thus, the change in length between the strings is a negative value (i.e. characters have been “deleted”).
let replacement = "Pal!"
let result = source.replacingCharacters(in: replacementRange, with: replacement)
print("\(source) => \(result) (delta: \(result.length - source.length))")
// ~> Hello, World! => Hello, Pal! (delta: -2)
In these situations, does one "break down" the replacement operation into its sub-operations? What about the first case, where the string has had no change in length? Does one re-parse the string in its entirety?
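To make the "break down" idea concrete, here is a sketch that splits one replacement into a deletion followed by an insertion, using plain Foundation types. `EditStep` and `decompose` are hypothetical names. Whether AttributedParsingOperation can be chained this way, and how it interprets changedRange for a deletion, is exactly what I'm unsure about:

```swift
import Foundation

/// Hypothetical description of one raw edit.
enum EditStep: Equatable {
    case deletion(NSRange)   // range in the string before this step
    case insertion(NSRange)  // range the new text occupies after this step
}

/// Splits "replace `range` with `replacement`" into at most two raw edits:
/// first delete the old characters, then insert the new ones at the same location.
func decompose(replacing range: NSRange, with replacement: String) -> [EditStep] {
    var steps: [EditStep] = []
    if range.length > 0 {
        steps.append(.deletion(range))
    }
    let insertedLength = (replacement as NSString).length  // UTF-16 length, like NSRange
    if insertedLength > 0 {
        steps.append(.insertion(NSRange(location: range.location, length: insertedLength)))
    }
    return steps
}

let steps = decompose(replacing: NSRange(location: 7, length: 6), with: "Pal!")
print(steps)
```

With this view, even the exact replacement (delta 0) is a deletion plus an insertion of equal length, so it would not need a full re-parse if chaining two operations is allowed.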
As far as background information goes, I simply have a custom NSTextView subclass that processes the changes as they happen. In shouldChangeText(in:replacementString:), I record the change (if the change should be allowed), then on didChangeText(), I enqueue the parsing operation in a custom queue like you suggested (e.g. chaining the previous operation to the next, or if no operations have been performed, creating the initial operation with the theme, language, etc.). Everything works great until I want to edit the text storage in some other method (like to implement convertSpacesToTabs() or something similar). I never know which range(s) should be considered “dirty.”
If none of this makes sense, please let me know, and I'll try my best to expound on or simplify my question(s). Basically, I just want to know how best to handle the “replacement” aspect of the incremental parsing operation, rather than raw insertion/deletion scenarios.
Thanks!
P.S. Your changes to the base SyntaxKit are badass, and they've been a great help to me!