SegmentedLine
I had a need for a data structure recently and couldn't find any good package implementations. I think it might be a nice addition to this package.
Essentially, the problem was that I needed to generate some Unicode data tables. This required a structure that allowed me to model the entire 21-bit space of Unicode Code Points, and essentially "paint" regions with data I parsed from the Unicode files. The second part was that I had to optimise the data - for example, of the dozen or so Bidi_Class values, I only needed data to group scalars in to 4 categories. Certain code points would be disallowed by earlier stages of the process and could be skipped, and I wanted to combine data sets from different files in order to reduce lookups. At the same time, Unicode scalars are sparse; most of that 21-bit space is unassigned.
Anyway, I had a brief look at the various tree candidates and nothing really suited me. I don't really care so much about lookup time, since the emphasis was on creating something that could support flexible editing operations. I eventually made a simple sort of number-line, which, well, paints regions of a comparable space with a value:
var line = SegmentedLine<Int, String?>(bounds: 0..<100, value: nil)
// After setting values <5 to "small" and values >10 to "large",
// the gap is left with its previous value, "medium".
line.set(0..<20, to: "medium")
line.set(0..<5, to: "small")
line.set(10..<60, to: "large")
print(line)
// | [0..<5]: "small" | [5..<10]: "medium" | [10..<60]: "large" | [60..<100]: nil |
let string = "Bob is feeling great"
// Create a SegmentedLine for the collection's contents.
// Start by setting a font attribute over the entire string.
var tags = SegmentedLine(
bounds: string.startIndex..<string.endIndex,
value: [Font.custom("Comic Sans")] as [Any]
)
// Set each word to a different color.
// Use 'modify' to append the attribute, but only for the region
// we're modifying.
for word: Substring in string.split(separator: " ") {
tags.modify(word.startIndex..<word.endIndex) { attributes in
attributes.append(Color.random())
}
}
// Check the result.
// - ✅ Every segment still contains the font attribute.
// - ✅ Each word also contains its individual color attribute.
for (range, attributes) in tags.segments {
print(#""\#(string[range])""#, "-", attributes)
}
// "Bob" [Font.custom("Comic Sans"), Color.orange]
// " " [Font.custom("Comic Sans")]
// "is" [Font.custom("Comic Sans"), Color.green]
// " " [Font.custom("Comic Sans")]
// "feeling" [Font.custom("Comic Sans"), Color.pink]
// " " [Font.custom("Comic Sans")]
// "great" [Font.custom("Comic Sans"), Color.yellow]
// ℹ️ Imagine we have a complex SegmentedLine with lots of small segments
// capturing granular details, and we'd like to simplify it.
enum ComplexData {
case categoryA, categoryB, categoryC // ...
}
let complexLine: SegmentedLine<Int, ComplexData> = // ...
print(complexLine)
// | [0..<2]: categoryA | [2..<4]: categoryB | [4..<12]: categoryC | ...
// 1️⃣ Perhaps we can map these to a smaller number of states.
enum SimplifiedData {
case valid, invalid
}
var simplifiedLine = complexLine.mapValues { complex in
SimplifiedData(validating: complex)
}
print(simplifiedLine)
// | [0..<2]: valid | [2..<4]: valid | [4..<12]: valid | ...
// 2️⃣ Notice that we have lots of segments for boundaries which
// which are no longer important. 'combineSegments' can clean them up.
simplifiedLine.combineSegments()
print(simplifiedLine)
// | [0..<2000]: valid | [2000..<2024]: invalid | [2024..<2056]: valid | ...
Implementation: https://github.com/karwa/swift-url/blob/df08ccec114350c4bb845d28c5d5850c20521cca/Sources/UnicodeDataStructures/Shared/SegmentedLine.swift
It has been super helpful to have this thing. In particular, to do something like reduce the Bidi_Class values, there are 2 dead-simple operations to map the elements, and then to perform a kind of restartable left-fold to gather elements in to larger regions. Like this. I actually generate an indexed static data table straight from the SegmentedLine. It's pretty nifty, I think.
It's nothing particularly groundbreaking, but it has been surprisingly effective and, as I said, I couldn't find any good package solution to solve this. I think it's an interesting design space to think of operations that this kind of structure could make easier.
Cc @Azoy
BTW, I've updated the implementation as I continue to iterate on it (there were some overly-zealous precondition checks, etc). I do also have a full test suite for the version I use, which I can contribute if it's considered a fit for this package.
Also added another example - a DIY AttributedString:
/// A DIY AttributedString from a `{ Collection, RangeTable<Collection.Index, Tag?> }` pair.
///
struct MyAttributedString<Tag> {
var str: String {
// Reset all tags any time the string changes (because our indexes become invalid).
// Obviously not ideal and in a real AttributedString we wouldn't provide direct, mutable access to this.
didSet { tags = .init(bounds: str.startIndex..<str.endIndex, initialValue: nil) }
}
var tags: RangeTable<String.Index, Tag?>
init() {
str = ""
tags = .init(bounds: str.startIndex..<str.endIndex, initialValue: nil)
}
var regions: [(Substring, Tag?)] {
tags.spans.map { range, value in (str[range], value) }
}
}
var s = MyAttributedString<Int>()
s.str = "hello, world!"
s.tags.set(s.str.startIndex..<s.str.dropFirst(5).startIndex, to: 42)
s.tags.set(s.str.dropLast(5).endIndex..<s.str.endIndex, to: -1)
let expected: [(Substring, Int?)] = [
("hello", 42),
(", w", nil),
("orld!", -1),
]
assert(s.regions.elementsEqual(expected, by: { $0.0 == $1.0 && $0.1 == $1.1 }))
s.regions.forEach { print($0) }
Dealing with mutations in such a type relies a lot on how the collection's indexes behave, but it's still useful as a generic utility even if the collection is immutable (you can still tag its data!).
Other interesting uses are timelines - tagging regions of time with "appointments" or other time-centric data, and traversing and manipulating that timeline using its natural coordinates.
@lorentey I added a modify operation and some better examples to the original post, along with a link to the finished implementation (now called SegmentedLine). There are tests in the package, too, of course.