
Tokenize SD input?

Open spyysalo opened this issue 10 years ago • 0 comments

The SD parser currently splits tokens only on whitespace, so that e.g. the last token of

~~~ sdparse
foo bar.
dep(foo, bar)
~~~

is `bar.`, so the example above breaks because the system can't find the token `bar` (without the terminal dot). This appears to be a common source of errors in manually entered SD analyses.

Doing e.g. PTB-like tokenization of the input should at least be considered.
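To illustrate the difference, here is a minimal Python sketch comparing whitespace splitting with a rough PTB-like split that separates punctuation from words. This is not annodoc's code: the function names and the regular expression are assumptions made up for this example, and real PTB tokenization has many more rules (contractions, quotes, brackets).

```python
import re

# Hypothetical pattern, not from annodoc: a token is either a run of word
# characters (allowing internal -, ' or . as in "3.14" or "e-mail"), or a
# single punctuation character.
_TOKEN_RE = re.compile(r"\w+(?:[-'.]\w+)*|[^\w\s]")


def whitespace_tokenize(text):
    """Split on whitespace only, as the issue describes the current behaviour."""
    return text.split()


def ptb_like_tokenize(text):
    """Rough PTB-like split: trailing punctuation becomes its own token."""
    return _TOKEN_RE.findall(text)


sentence = "foo bar."
print(whitespace_tokenize(sentence))  # ['foo', 'bar.']
print(ptb_like_tokenize(sentence))    # ['foo', 'bar', '.']
```

With the second tokenizer, `dep(foo, bar)` could resolve `bar` against the token list, at the cost of the terminal `.` becoming a separate token that counts toward token positions.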

spyysalo avatar Oct 02 '14 06:10 spyysalo