hwrt Segmentation

Segmentation

Open MartinThoma opened this issue 9 years ago • 22 comments

This is an example for segmentation:

Each symbol is in one color, different colors mean different symbols.

Features

A classifier which has to decide for two given strokes if they are in one symbol could use the following features:

[x] Features of the strokes (20 point coordinates per stroke, re-curvature, ink)
[x] Stroke distance (0 means they intersect)
[x] Time distance
[x] strokes in between (by time)
[ ] strokes in between (by geometry)
[ ] at most 4 strokes in a symbol?
[ ] symbols in between - only works if we are testing for a given segmentation

Problems

Delayed strokes (for correction / enhancement)
cursive handwriting

Apr 06 '15 13:04 MartinThoma

Segmentation by strokes seems to be possible for mathematical context.

Be aware that the order of strokes is not necessarily ordered (e.g. enlarging fraction strokes)

Apr 06 '15 15:04 MartinThoma

For http://www.martin-thoma.de/write-math/view/?raw_data_id=151305 the classifier predicted the segmentation [[0,4],[1,2,5,6],[3]]:

The yellow and black segmentation does not make any sense, as it would mean there is a symbol between strokes of one symbol. This should be part of the score of a segmentation.

Apr 12 '15 13:04 MartinThoma

153892

Real segmentation: [[0, 2, 16], [1], [3, 4], [5, 6], [7, 8], [9, 10, 11], [12, 13], [14, 15]] (got at place -1) Predict segmentation: [[0, 4, 6], [1, 2, 3, 5], [7], [8, 10, 11], [9, 15], [12, 13, 14], [16]] (0.00000000)

Apr 12 '15 13:04 MartinThoma

[157, 2, 498, 59, 14, 33, 4, 4, 3, 1, 4, 14, 13, 0, 0, 2, 11, 277, 1, 1, 1, 1, 5, 4, 74, 1, 1, 1, 4, 2, 14, 18, 97, 1, 12, 2, 0, 1, 11, 2, 49, 1, 197, 12, 49, 2, 13, 360, 6, 5, 14, 21, 5, 15, 3, 7, 0, 9, 326, 123, 39, 7, 1, 3, 23, 4, 1, 12, 264, 63, 14, 4, 1, 2, 3, 6, 146, 8, 0, 13, 406, 3, 0, 1, 1, 3, 157, 0, 0, 158, 46, 6, 1, 33, 2, 1, 1, 0, 0, 4, 1, 4, 1, 0, 1, 5, 3, 4, 3, 0, 10, 1, 73, 11, 0, 1, 26, 9, 3, 0, 240, 1, 2, 27, 7, 1, 4, 8, 105, 0, 306, 98, 0, 1, 10, 57, 122, 0, 12, 2, 0, 485, 0, 29, 2, 17, 1, 2, 1, 312, 1, 44, 10, 13, 0, 23, 1, 37, 406, 4, 0, 2, 2, 4, 3, 0, 4, 3, 9, 0, 0, 2, 10, 20, 4, 21, 105, 0, 3, 0, 19, 0, 88, 3, 1, 3, 0, 1, 1, 13, 1, 20, 0, 1, 4, 5, 22, 1, 1, 36, 7, 0, 1, 2, 85, 38, 1, 9, 62, 2, 3, 1, 25, 6, 0, 1, 1, 0, 1, 0, 26, 9, 0, 127, 0, 0, 0, 2, 1, 10, 0, 9, 0, 7, 1, 0, 12, 2, 0, 60, 15, 0, 155, 0, 0, 14, 7, 2, 1, 0, 1, 0, 1, 15, 0, 0, 3, 12, 11, 23, 34, 0, 1, 1, 0, 2, 1, 189, 84, 0, 10, 0, 4, 3, 0, 0, 450, 0, 12, 163, 10, 3, 7, 1, 0, 4, 0, 113, 3, 49, 0, 1, 26, 1, 0, 4, 3, 49, 11, 1, 0, 0, 1, 95, 1, 76, 76, 1, 1, 1, 4, 12, 0, 0, 0, 0, 30, 17, 2, 0, 16, 15, 497, 13, 360, 181, 0, 1, 49, 54, 13, 1, 4, 2, 45, 1, 7, 1, 4, 0, 0, 2, 2, 4, 114, 8, 1, 3, 199, 1, 1, 0, 7, 0, 2, 212, 9, 10, 178, 1, 31, 1, 1, 41, 0, 0, 207, 1, 1, 173, 4, 6, 4, 1, 4, 1, 65, 0, 0, 3, 1, 184, 1, 8, 1, 5, 4, 0, 13, 10, 1, 1, 148, 2, 42, 2, 7, 1, 16, 0, 49, 1, 38, 1, 289, 3, 14, 37, 0, 0, 24, 133, 14, 4, 0, 8, 0, 0, 47, 1, 4, 1, 1, 3, 43, 104, 2, 0, 1, 28, 2, 379, 12, 57, 1, 79, 3, 7, 14, 499, 4, 2, 1, 241, 0, 0, 2, 0, 1, 0, 0, 3, 178, 52, 2, 0, 156, 269, 0, 0, 2, 3, 0, 4, 0, 51, 60, 166, 0, 0, 1, 11, 6, 3, 5, 77, 0, 0, 7, 0, 10, 4, 25, 0, 5, 96, 0, 317, 85, 7, 101, 0, 19, 332, 23, 13, 116, 37, 13, 1, 13, 0, 1, 26, 192, 7, 4, 50, 1, 2, 0, 0, 1, 86, 2, 127, 347, 0, 16, 0, 3, 7, 223, 309, 13, 36, 0, 1, 1, 1, 1, 1, 3, 4, 258, 0, 384, 0, 11, 1, 495, 317, 4, 2, 9, 202, 9, 87, 3, 265, 6, 146, 4, 173, 12, 1, 4, 103, 21, 33, 3, 13, 0, 352, 330, 51, 334, 3, 6, 0, 34, 1, 0, 0, 6, 1, 8, 8, 9, 9, 9, 0, 1, 1, 191, 1, 1, 0, 10, 22, 8, 175, 0, 1, 0, 14, 376, 1, 24, 17, 1, 26, 164, 1, 0, 5, 27, 1, 196, 0, 0, 31, 11, 494, 0, 0, 4, 3, 0, 13, 10, 47, 3, 0, 386, 11, 3, 0, 0, 0, 3, 3, 8, 0, 1, 2, 0, 30, 4, 10, 3, 10, 304, 33, 0, 1, 1, 1, 12, 0, 39, 1, 0, 0, 0, 0, 0, 1, 0, 3, 36, 0, 3, 20, 11, 4, 0, 0, 194, 347, 98, 0, 140, 445, 0, 3, 0, 3, 2, 182, 0, 1, 2, 4, 20, 1, 167, 13, 1, 0, 1, 0, 200, 0, 4, 12, 0, 68, 14, 3, 0, 0, 0, 50, 0, 4, 10, 35, 49]
2015-04-12 19:49:54,171 INFO mean: 37.82
2015-04-12 19:49:54,172 INFO median: 3.00
2015-04-12 19:49:54,172 INFO TOP-1: 0.16
2015-04-12 19:49:54,172 INFO TOP-3: 0.33
2015-04-12 19:49:54,173 INFO TOP-10: 0.48
2015-04-12 19:49:54,173 INFO TOP-20: 0.56
2015-04-12 19:49:54,173 INFO TOP-50: 0.63
2015-04-12 19:49:54,173 INFO Out of order: 72
2015-04-12 19:49:54,173 INFO Total: 943

Apr 12 '15 19:04 MartinThoma

Idea:

Analyze data

Search single-stroke symbols which are not "prefixes" of other symbols. For example. o is a prefix of \oplus or l is a prefix of t; 1 and 3 are prefixes of B. Those symbols should also be "stable" in the sense that their variance in stroke count is low. Let B be the group of those basic symbols. Examples of this group might be \sum, \int, (, ), \infty, 2, 6, 8, 9 0, a, c, e, ...

Results

91 of 374 symbols have a mean stroke count in [1,1.1] with a variance < 0.3. The variance is very likely caused by wild points.

Segment

Let L be the list of strokes which should be partitioned into a list of list of strokes, where each sublist is one symbol.

In order to segment a recordings, analyze each stroke as a single-stroke symbol. Try to find if a stoke is a member of B. Those members of B are called L_B ⊆ L. Let L' = L \ L_B.
Hard chunking ("hard" because there is nothing probabilistic):
1. Find the biggest symbol (by bounding box, separately for width and height)
2. Strokes whichs bounding boxes are more than α× max_width in width or more than α× max_height away belong to separate chunks. - α should be about 2 from my gut-feeling.
3. Does any chunk have more than 8 strokes? → chunk it again (how? with smaller α?)
Now each chunk is guaranteed to have at most 8 strokes. For each chunk:
1. The 2-stroke classifier (belongs to the same symbol or not) can use new features
  - "symbols in between"
  - mean / median / min / max dimensions (with / height) of symbols in L_B
  - width / height of the bounding box of the symbol if the two strokes belonged to to the same symbol to find a segmentation of L'.
2. Segment chunks as before

Apr 14 '15 05:04 MartinThoma

Idea: Use MST as described in http://www.ai.mit.edu/projects/natural-log/papers/matsakis-MEng-99.pdf#page=37

Each node is the center of the bounding box of a stroke.
A segmentation of the strokes equals a segmentation of the minimum spanning tree of the fully connected graph given by the node centers → less possibilities to segment and segmentations make more sense

Apr 14 '15 06:04 MartinThoma

Ideas:

Go through each stroke, classify it. If the net is sure enough (gives a probability over a threshold, e.g. > 95%), the recording will be segmented there
Problem: ħ is essentially a h, but the bar is added later
Fix: check if the stroke crosses other strokes
Problem: : and . (j, i are not so much a problem, because there is no symbol with with an j without dot can get confused)

May 10 '15 08:05 MartinThoma

Single Stroke combinational recognizer

Classify symbols stroke-by-stroke. Lets say there are n symbols in total.

Classify each stroke. Remember the scaling, x- and y- translation factors (3 factors)
Train a two-stroke combination recognizer which takes 2*(n+3) input features and gives the probabilities for n classes. Problem: No way to remember which strokes were already drawn. Good: Stroke enhancements are probably not that bad.

Jun 06 '15 06:06 MartinThoma

Stream segmentation

Start classifying when the first stroke arrives
Situation: You have the complete list of all entered strokes complete_data, the new entered stroke new_data, the old list of strokes old_data (old_data+new_data = complete_data) as well as a list of classification hypotheses old_segmentation_hyp (which is a list of dictionaries with keys probability and segmentation. segmentation is a list of dictionaries. Those dictionaries contain the keys symbol, strokes and probability)

Now one of two things could have happened:

The stroke belongs to a new symbol
The stroke belongs to a symbol which is already there

For new symbols, it is more likely that it is at the very right side of the bounding box of complete_data. It is also very likely that the new stroke did not cross any old stroke.

If the new stroke belongs to an existing symbol, then then it has to be "next" to the symbol. "next" in this context means there is not another symbol in between (though it might be around, e.g. sqrt).

Jun 06 '15 06:06 MartinThoma

## 325119
  Real segmentation:    [[0, 1], [2, 3]] (got at place -1)
  Predict segmentation: [[0], [1], [2, 3]] (1.00000000)
  Segmentation took 0.0561 seconds.
  over-segmented
## 325266
  Real segmentation:    [[0], [1, 2], [3], [4, 5]] (got at place -1)
  Predict segmentation: [[0], [1, 2], [3], [4], [5]] (1.00000000)
  Segmentation took 0.0890 seconds.
  over-segmented
## 325283
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0, 2], [1]] (1.00000000)
  Segmentation took 0.0451 seconds.
  over-segmented
  under-segmented
## 325540
  Real segmentation:    [[0], [1, 2], [3, 4], [5], [6]] (got at place -1)
  Predict segmentation: [[0, 6], [1, 2, 3], [4], [5]] (1.00000000)
  Segmentation took 0.0978 seconds.
  over-segmented
  under-segmented
## 325545
  Real segmentation:    [[0], [1], [2, 3], [4], [5], [6, 7], [8]] (got at place -1)
  Predict segmentation: [[0], [1, 2], [3], [4], [5], [6], [7, 8]] (1.00000000)
  Segmentation took 0.1346 seconds.
  over-segmented
  under-segmented
## 2800 ################################################################################
## 325682
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0], [1], [2]] (1.00000000)
  Segmentation took 0.0481 seconds.
  over-segmented
## 325689
  Real segmentation:    [[0, 1], [2, 3], [4], [5, 6]] (got at place -1)
  Predict segmentation: [[0, 1, 2, 3], [4], [5], [6]] (1.00000000)
  Segmentation took 0.1113 seconds.
  over-segmented
  under-segmented
## 325792
  Real segmentation:    [[0], [1, 2], [3], [4, 5], [6], [7], [8, 9], [10]] (got at place -1)
  Predict segmentation: [[0], [1], [2], [3], [4, 5], [6], [7, 10], [8, 9]] (1.00000000)
  Segmentation took 0.1735 seconds.
  over-segmented
  under-segmented
## 325915
  Real segmentation:    [[0, 1], [2]] (got at place -1)
  Predict segmentation: [[0], [1], [2]] (1.00000000)
  Segmentation took 0.0672 seconds.
  over-segmented
2015-06-07 19:39:39,710 INFO mean: 0.00
2015-06-07 19:39:39,711 INFO median: 0.00
2015-06-07 19:39:39,711 INFO TOP-1: 0.20
2015-06-07 19:39:39,711 INFO TOP-3: 0.20
2015-06-07 19:39:39,712 INFO TOP-10: 0.20
2015-06-07 19:39:39,712 INFO TOP-20: 0.20
2015-06-07 19:39:39,712 INFO TOP-50: 0.20
2015-06-07 19:39:39,712 INFO Over-segmented: 0.12
2015-06-07 19:39:39,712 INFO Under-segmented: 0.76
2015-06-07 19:39:39,712 INFO overUnder-segmented: 0.08
2015-06-07 19:39:39,712 INFO Out of order: 172
2015-06-07 19:39:39,713 INFO Total: 2838

Jun 07 '15 19:06 MartinThoma

Top 2 > 0.95 instead of top 3:

2015-06-08 20:55:56,451 INFO mean: 0.00
2015-06-08 20:55:56,465 INFO median: 0.00
2015-06-08 20:55:56,466 INFO TOP-1: 0.18
2015-06-08 20:55:56,466 INFO TOP-3: 0.18
2015-06-08 20:55:56,466 INFO TOP-10: 0.18
2015-06-08 20:55:56,466 INFO TOP-20: 0.18
2015-06-08 20:55:56,467 INFO TOP-50: 0.18
2015-06-08 20:55:56,467 INFO Over-segmented: 0.09
2015-06-08 20:55:56,467 INFO Under-segmented: 0.79
2015-06-08 20:55:56,467 INFO overUnder-segmented: 0.07
2015-06-08 20:55:56,467 INFO Out of order: 172
2015-06-08 20:55:56,468 INFO Total: 2838

Jun 08 '15 18:06 MartinThoma

Top 4 > 0.95 instead of top 3:

2015-06-08 21:01:28,281 INFO mean: 0.00
2015-06-08 21:01:28,282 INFO median: 0.00
2015-06-08 21:01:28,282 INFO TOP-1: 0.22
2015-06-08 21:01:28,282 INFO TOP-3: 0.22
2015-06-08 21:01:28,282 INFO TOP-10: 0.22
2015-06-08 21:01:28,282 INFO TOP-20: 0.22
2015-06-08 21:01:28,283 INFO TOP-50: 0.22
2015-06-08 21:01:28,283 INFO Over-segmented: 0.14
2015-06-08 21:01:28,283 INFO Under-segmented: 0.73
2015-06-08 21:01:28,283 INFO overUnder-segmented: 0.09
2015-06-08 21:01:28,283 INFO Out of order: 172
2015-06-08 21:01:28,283 INFO Total: 2838

Jun 08 '15 19:06 MartinThoma

Top 5 > 0.95 instead of top 3:

2015-06-08 21:07:36,789 INFO mean: 0.00
2015-06-08 21:07:36,790 INFO median: 0.00
2015-06-08 21:07:36,790 INFO TOP-1: 0.23
2015-06-08 21:07:36,791 INFO TOP-3: 0.23
2015-06-08 21:07:36,791 INFO TOP-10: 0.23
2015-06-08 21:07:36,791 INFO TOP-20: 0.23
2015-06-08 21:07:36,791 INFO TOP-50: 0.23
2015-06-08 21:07:36,791 INFO Over-segmented: 0.16
2015-06-08 21:07:36,791 INFO Under-segmented: 0.71
2015-06-08 21:07:36,791 INFO overUnder-segmented: 0.09
2015-06-08 21:07:36,792 INFO Out of order: 172
2015-06-08 21:07:36,792 INFO Total: 2838

Jun 08 '15 19:06 MartinThoma

Top 6 > 0.95 instead of top 3:

2015-06-08 21:14:14,411 INFO mean: 0.00
2015-06-08 21:14:14,412 INFO median: 0.00
2015-06-08 21:14:14,412 INFO TOP-1: 0.24
2015-06-08 21:14:14,412 INFO TOP-3: 0.24
2015-06-08 21:14:14,412 INFO TOP-10: 0.24
2015-06-08 21:14:14,413 INFO TOP-20: 0.24
2015-06-08 21:14:14,413 INFO TOP-50: 0.24
2015-06-08 21:14:14,413 INFO Over-segmented: 0.17
2015-06-08 21:14:14,413 INFO Under-segmented: 0.69
2015-06-08 21:14:14,413 INFO overUnder-segmented: 0.10
2015-06-08 21:14:14,413 INFO Out of order: 172
2015-06-08 21:14:14,413 INFO Total: 2838

Jun 08 '15 19:06 MartinThoma

Top 10 > 0.95 instead of top 3:

2015-06-08 21:19:49,930 INFO mean: 0.00
2015-06-08 21:19:49,931 INFO median: 0.00
2015-06-08 21:19:49,931 INFO TOP-1: 0.25
2015-06-08 21:19:49,931 INFO TOP-3: 0.25
2015-06-08 21:19:49,932 INFO TOP-10: 0.25
2015-06-08 21:19:49,932 INFO TOP-20: 0.25
2015-06-08 21:19:49,932 INFO TOP-50: 0.25
2015-06-08 21:19:49,932 INFO Over-segmented: 0.19
2015-06-08 21:19:49,932 INFO Under-segmented: 0.67
2015-06-08 21:19:49,932 INFO overUnder-segmented: 0.10
2015-06-08 21:19:49,932 INFO Out of order: 172
2015-06-08 21:19:49,933 INFO Total: 2838

Jun 08 '15 20:06 MartinThoma

Top 20 > 0.95 instead of top 3:

2015-06-08 22:33:33,301 INFO mean: 0.00
2015-06-08 22:33:33,302 INFO median: 0.00
2015-06-08 22:33:33,302 INFO TOP-1: 0.26
2015-06-08 22:33:33,302 INFO TOP-3: 0.26
2015-06-08 22:33:33,303 INFO TOP-10: 0.26
2015-06-08 22:33:33,303 INFO TOP-20: 0.26
2015-06-08 22:33:33,303 INFO TOP-50: 0.26
2015-06-08 22:33:33,303 INFO Over-segmented: 0.20
2015-06-08 22:33:33,303 INFO Under-segmented: 0.66
2015-06-08 22:33:33,303 INFO overUnder-segmented: 0.11
2015-06-08 22:33:33,303 INFO Out of order: 172
2015-06-08 22:33:33,304 INFO Total: 2838
./segmentation.py  284,03s user 3,10s system 96% cpu 4:56,26 total

Jun 08 '15 20:06 MartinThoma

All instead of cropping at top 3:

2015-06-08 22:46:18,547 INFO mean: 0.00
2015-06-08 22:46:18,548 INFO median: 0.00
2015-06-08 22:46:18,549 INFO TOP-1: 0.27
2015-06-08 22:46:18,549 INFO TOP-3: 0.27
2015-06-08 22:46:18,549 INFO TOP-10: 0.27
2015-06-08 22:46:18,549 INFO TOP-20: 0.27
2015-06-08 22:46:18,550 INFO TOP-50: 0.27
2015-06-08 22:46:18,550 INFO Over-segmented: 0.20
2015-06-08 22:46:18,550 INFO Under-segmented: 0.65
2015-06-08 22:46:18,550 INFO overUnder-segmented: 0.11
2015-06-08 22:46:18,550 INFO Out of order: 172
2015-06-08 22:46:18,552 INFO Total: 2838
./segmentation.py  299,81s user 3,90s system 95% cpu 5:18,39 total

Jun 08 '15 20:06 MartinThoma

top 1 > 0.95 instead of top 3:

2015-06-08 23:04:08,941 INFO mean: 879844.96
2015-06-08 23:04:08,942 INFO median: 1000000.00
2015-06-08 23:04:08,942 INFO TOP-1: 0.12
2015-06-08 23:04:08,943 INFO TOP-3: 0.12
2015-06-08 23:04:08,943 INFO TOP-10: 0.12
2015-06-08 23:04:08,943 INFO TOP-20: 0.12
2015-06-08 23:04:08,943 INFO TOP-50: 0.12
2015-06-08 23:04:08,944 INFO Over-segmented: 0.01
2015-06-08 23:04:08,944 INFO Under-segmented: 0.88
2015-06-08 23:04:08,944 INFO overUnder-segmented: 0.01
2015-06-08 23:04:08,944 INFO Out of order: 172
2015-06-08 23:04:08,944 INFO Total: 2838

Jun 08 '15 21:06 MartinThoma

All

2015-06-08 23:12:56,249 INFO mean: 734672.30
2015-06-08 23:12:56,250 INFO median: 1000000.00
2015-06-08 23:12:56,250 INFO TOP-1: 0.27
2015-06-08 23:12:56,250 INFO TOP-3: 0.27
2015-06-08 23:12:56,251 INFO TOP-10: 0.27
2015-06-08 23:12:56,251 INFO TOP-20: 0.27
2015-06-08 23:12:56,251 INFO TOP-50: 0.27
2015-06-08 23:12:56,251 INFO Over-segmented: 0.20
2015-06-08 23:12:56,251 INFO Under-segmented: 0.65
2015-06-08 23:12:56,251 INFO overUnder-segmented: 0.11
2015-06-08 23:12:56,252 INFO Out of order: 172
2015-06-08 23:12:56,252 INFO Total: 2838

Jun 08 '15 21:06 MartinThoma

What about using a varying size bounding box ? Physically & historically, each symbol are separated because else it would become unreadable. I think it's a good idea to compute a bounding box while a symbol is being drawn whose size is adjusted by the current stroke length. When the pen is lifted, one has to assume to either "start a new symbol" or "concatenate existing symbol". This decision can be based on a bounding box whose shape can be trained, but with a size that's dynamic. For example, when drawing the "integral" symbol, the bounding box is an horizontally thin and vertically tall rectangle that would be recognized by the network, but the size itself is extracted from the current symbol's size. When a new stoke is produced, and it intersects a previous stroke's bounding box, you can likely concatenate it with the previous symbol.

Jan 30 '18 20:01 X-Ryl669

hwrt hwrt copied to clipboard

Segmentation

Features

Problems

153892

Analyze data

Results

Segment

Single Stroke combinational recognizer

Stream segmentation

hwrt
hwrt copied to clipboard