Define which symbols get added to the classifier

Open MartinThoma opened this issue 9 years ago • 0 comments

Currently, write-math.com is used as a single-symbol classifier. As of 2015-06-05, the back end is able to classify 377 symbols.

It would be beneficial for the project to define which symbols get added. The main issue is the number of training examples. Overly complex symbols (e.g. http://write-math.com/tags/image) and variants (e.g. http://write-math.com/tags/big-variant or http://write-math.com/tags/upgreek) should probably not be added either. I would say each symbol which has at least 100 examples is a candidate to be added to the good MLP classifier.
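To make the proposed rule concrete, here is a minimal sketch of how candidates could be picked. It assumes the training data is available as a flat list of symbol labels, one per recording. The function name `select_candidates` and the `excluded` parameter are made up for illustration and are not part of the existing code base.

```python
from collections import Counter

MIN_EXAMPLES = 100  # threshold proposed in this issue


def select_candidates(labels, excluded=frozenset(), min_examples=MIN_EXAMPLES):
    """Return the symbols that qualify for the main classifier.

    labels:   iterable with one symbol label per recording (assumed format)
    excluded: symbols that are skipped regardless of their count,
              e.g. overly complex symbols or variants
    """
    counts = Counter(labels)
    return sorted(
        symbol
        for symbol, n in counts.items()
        if n >= min_examples and symbol not in excluded
    )


if __name__ == "__main__":
    # Toy data: the labels and counts are invented for this example.
    labels = ["\\alpha"] * 120 + ["\\beta"] * 99 + ["\\gamma"] * 100
    print(select_candidates(labels))
```

Keeping the exclusion list as an explicit argument means the decision about complex symbols and variants stays a manual one, separate from the count threshold.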

The training set of new symbols needs to be checked manually, but the human doing the check can be assisted by software. There should be a clusterer which is able to find similar symbols for arbitrary, new symbols and detect outliers.
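A very rough sketch of the outlier part could look like the following. Note that this is not a clusterer: it is a simpler stand-in that flags recordings whose feature vector lies unusually far from the centroid of their symbol, using the median distance plus a multiple of the median absolute deviation as the cut-off. The fixed-length feature vectors, the function name `flag_outliers` and the factor `k` are assumptions for illustration.

```python
import math
from statistics import median


def flag_outliers(vectors, k=3.0):
    """Return the indices of vectors that look like outliers.

    vectors: list of equal-length numeric tuples, one per recording of the
             same symbol (assumed feature representation)
    k:       how many median absolute deviations above the median distance
             a recording may be before it is flagged (assumed default)
    """
    dim = len(vectors[0])
    # Centroid of all recordings of this symbol.
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    dists = [math.dist(v, centroid) for v in vectors]
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    # Guard against a zero deviation when many distances are identical.
    threshold = med + k * max(mad, 1e-12)
    return [i for i, d in enumerate(dists) if d > threshold]


if __name__ == "__main__":
    # Toy 2D features: five similar recordings and one far away.
    recordings = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (-0.1, 0), (10, 10)]
    print(flag_outliers(recordings))
```

The flagged indices would only be suggestions for the human reviewer, who still makes the final decision.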

It would also be nice to have a second classifier which is able to work with only 10 to 100 examples per symbol. The good MLP classifier could get a rejection class: if it rejects a symbol, the slower, less accurate classifier gets used instead.
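The fallback idea could be wired up roughly like this. It is only a sketch under assumptions: both classifiers are modelled as plain callables that take a recording and return a label, and the rejection class is represented by the made-up label `"<reject>"`. None of these names exist in the current back end.

```python
REJECT = "<reject>"  # hypothetical label for the rejection class


def classify_with_fallback(recording, primary, fallback, reject_label=REJECT):
    """Classify with the primary model; use the fallback on rejection.

    primary:  callable returning a label, possibly reject_label
              (stands in for the good MLP classifier)
    fallback: callable returning a label (stands in for the slower,
              less accurate classifier trained on rare symbols)
    Returns (label, source), where source names the classifier that answered.
    """
    label = primary(recording)
    if label == reject_label:
        return fallback(recording), "fallback"
    return label, "primary"


if __name__ == "__main__":
    # Stub classifiers for demonstration only.
    known = {"rec1": "\\alpha"}
    rare = {"rec2": "\\aleph"}

    def primary(rec):
        return known.get(rec, REJECT)

    def fallback(rec):
        return rare.get(rec, "unknown")

    print(classify_with_fallback("rec1", primary, fallback))
    print(classify_with_fallback("rec2", primary, fallback))
```

Returning the source along with the label would make it possible to show the user that a result came from the less accurate classifier.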

MartinThoma, Jun 05 '15 17:06