How to treat reserved symbols in Training and Querying files

Open lschmelzeisen opened this issue 11 years ago • 1 comments

Currently, the reserved symbols are _ (absolute skip), % (continuation skip), and / (token-POS separator).

IIRC the program fails if any of these are contained in training or querying files.

How do we cope with this issue?

lschmelzeisen avatar Jan 06 '15 18:01 lschmelzeisen

Commit 9e4c6a7e740eaa55183431a5748fe31e445054b4 scans the corpus for reserved symbols and refuses to run if it contains any.
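
For readers who have not looked at the commit, a check of this kind could look roughly like the sketch below. This is not the code from the commit: the class name ReservedSymbolScanner and the method findViolations are hypothetical, and the corpus is assumed to be already read into a list of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the toolkit's actual class.
final class ReservedSymbolScanner {
    // Reserved symbols named in this issue:
    // '_' (absolute skip), '%' (continuation skip), '/' (token-POS separator).
    static final String RESERVED = "_%/";

    // Returns one message per reserved symbol found, with 1-based line and column.
    static List<String> findViolations(List<String> lines) {
        List<String> violations = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (RESERVED.indexOf(c) >= 0) {
                    violations.add("line " + (i + 1) + ", column " + (j + 1) + ": '" + c + "'");
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("the quick brown fox", "50% off");
        List<String> violations = findViolations(corpus);
        if (!violations.isEmpty()) {
            // Refuse execution, mirroring the behaviour described above.
            violations.forEach(System.err::println);
            System.exit(1);
        }
    }
}
```

Reporting every occurrence with its position, rather than stopping at the first one, lets the user clean the corpus in a single pass.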

However, in the long run I would like to have some form of input escaping so that this is transparent to the user.
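
One possible shape for such escaping, purely as a sketch: pick an escape character that is not itself reserved (a backslash is assumed here) and map each reserved symbol to a two-character code, so that escaped text never contains _, % or /. The class name ReservedSymbolEscaper and the codes a, c and s are made up for illustration and are not part of the toolkit.

```java
// Hypothetical helper, not the toolkit's actual class.
final class ReservedSymbolEscaper {
    // Assumed mapping (backslash is the escape character):
    //   backslash -> backslash backslash
    //   '_'       -> backslash 'a'   (absolute skip)
    //   '%'       -> backslash 'c'   (continuation skip)
    //   '/'       -> backslash 's'   (token-POS separator)

    // Rewrites the input so that it contains none of the reserved symbols.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '_':  sb.append("\\a");  break;
                case '%':  sb.append("\\c");  break;
                case '/':  sb.append("\\s");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Inverse of escape; rejects malformed input instead of guessing.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder(escaped.length());
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i + 1 >= escaped.length()) {
                throw new IllegalArgumentException("dangling escape at end of input");
            }
            char code = escaped.charAt(++i);
            switch (code) {
                case '\\': sb.append('\\'); break;
                case 'a':  sb.append('_');  break;
                case 'c':  sb.append('%');  break;
                case 's':  sb.append('/');  break;
                default:
                    throw new IllegalArgumentException("unknown escape code: " + code);
            }
        }
        return sb.toString();
    }
}
```

With something like this, the corpus and the queries would be escaped on input and any tokens shown to the user unescaped on output, so the reserved symbols would only ever appear internally with their special meaning.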

lschmelzeisen avatar Jan 12 '15 11:01 lschmelzeisen