generalized-language-modeling-toolkit

indexing the ngrams

Open renepickhardt opened this issue 11 years ago • 3 comments

It might be worthwhile to index the n-grams within this toolkit already, using FSTs (finite-state transducers) or trie-based solutions. We should discuss this, since it is a rather big step, but it would improve performance, reduce storage needs, and make it easier to build applications out of the box, since fast querying would become possible.
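To make the trie idea concrete, here is a minimal sketch in Python. It is not the toolkit's implementation, and the names `TrieNode`, `NGramTrie`, `add`, `count`, and `continuations` are hypothetical, chosen only for illustration. It assumes n-grams are already tokenized into tuples of strings with known counts. N-grams that share a prefix share nodes, which is where the storage savings would come from, and a lookup walks at most n nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class TrieNode:
    # Count of the n-gram that ends exactly at this node (0 if never added).
    count: int = 0
    # Child nodes keyed by the next token.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)


class NGramTrie:
    """Hypothetical in-memory n-gram index; one node per distinct prefix."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, ngram: Sequence[str], count: int = 1) -> None:
        """Insert an n-gram, creating nodes as needed, and add to its count."""
        node = self.root
        for token in ngram:
            if token not in node.children:
                node.children[token] = TrieNode()
            node = node.children[token]
        node.count += count

    def _find(self, ngram: Sequence[str]) -> Optional[TrieNode]:
        """Walk the trie along the tokens; return None if the path is missing."""
        node = self.root
        for token in ngram:
            child = node.children.get(token)
            if child is None:
                return None
            node = child
        return node

    def count(self, ngram: Sequence[str]) -> int:
        """Return the stored count of an n-gram, or 0 if it was never added."""
        node = self._find(ngram)
        return node.count if node is not None else 0

    def continuations(self, prefix: Sequence[str]) -> List[Tuple[str, int]]:
        """Return (token, count) pairs for stored n-grams of the form prefix + token."""
        node = self._find(prefix)
        if node is None:
            return []
        return sorted(
            (token, child.count)
            for token, child in node.children.items()
            if child.count > 0
        )


if __name__ == "__main__":
    trie = NGramTrie()
    trie.add(("the", "cat"), 3)
    trie.add(("the", "dog"), 2)
    trie.add(("the", "cat", "sat"), 1)
    print(trie.count(("the", "cat")))
    print(trie.continuations(("the",)))
```

A query such as `continuations(("the",))` is the kind of fast lookup an application would need, for example to rank next-word candidates. An FST or a compressed on-disk trie would serve the same queries with a smaller footprint, at the cost of a more involved build step.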

renepickhardt • Mar 08 '14 09:03

If I understand this correctly, these are purely performance optimizations: we currently do neither, and if we want to optimize in the future we will have to choose a data format?

lschmelzeisen • Apr 26 '14 11:04

Right, we don't do that yet, but I want to leave the issue open since it is a valid enhancement.

renepickhardt • Apr 26 '14 11:04

Potential bachelor thesis of mine: how to index and compress (skipped) n-grams.

lschmelzeisen • Jun 11 '14 11:06