Decreasing disk I/O
Currently the buffered writers have a fixed buffer size. Using built-in methods of the Java runtime, we could decide adaptively how big the buffers should be. Also, if the input data is not too large, we might not need to hit the disk at all.
This is obviously connected to the question of whether we should replace the underlying storage with FSTs or a trie-based solution.
So please let us discuss this first.
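To make the adaptive idea concrete for the discussion, here is a minimal sketch, not code from the toolkit. It assumes we give a fixed fraction of the memory the JVM reports as still available to all writers together and clamp the per-writer result. The class name, the bounds, and the fraction are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class AdaptiveBufferSize {
    // Hypothetical bounds (in chars); not taken from the toolkit.
    static final int MIN_BUFFER_CHARS = 8 * 1024;
    static final int MAX_BUFFER_CHARS = 1024 * 1024;

    /** Rough estimate of heap still usable: max heap minus what is in use. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /**
     * Splits the given fraction of availableBytes evenly across numWriters
     * and clamps the result to [MIN_BUFFER_CHARS, MAX_BUFFER_CHARS].
     */
    static int bufferSizeFor(long availableBytes, int numWriters, double fraction) {
        if (numWriters <= 0) {
            throw new IllegalArgumentException("numWriters must be positive");
        }
        long budgetBytes = (long) (availableBytes * fraction);
        // BufferedWriter buffers chars, and a Java char takes 2 bytes.
        long chars = budgetBytes / numWriters / 2;
        return (int) Math.max(MIN_BUFFER_CHARS, Math.min(MAX_BUFFER_CHARS, chars));
    }

    public static void main(String[] args) throws IOException {
        int size = bufferSizeFor(availableBytes(), 4, 0.25);
        // StringWriter stands in for a real file writer here.
        try (Writer writer = new BufferedWriter(new StringWriter(), size)) {
            writer.write("example");
        }
        System.out.println("buffer size (chars): " + size);
    }
}
```

The clamping is there because the runtime's numbers are only estimates: the lower bound keeps writers usable when the heap is nearly full, and the upper bound avoids handing a single writer far more than it can use.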
As we saw yesterday, disk I/O is not the limiting factor.
We should still calculate the buffer sizes and the maxcount divider dynamically, based on the memory size set in config.txt.
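A rough sketch of how a buffer budget could be derived from the configured memory. It assumes config.txt uses `key = value` lines readable by `java.util.Properties`; the key `memory.max_mb`, the class name, and the percentage split are all made up for illustration and are not the toolkit's actual config format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class ConfiguredMemory {
    // Hypothetical config key; the real config.txt may name it differently.
    static final String KEY = "memory.max_mb";

    /** Returns the configured memory limit in bytes, or defaultMb if the key is missing. */
    static long memoryLimitBytes(Properties config, long defaultMb) {
        String raw = config.getProperty(KEY);
        long mb = (raw == null) ? defaultMb : Long.parseLong(raw.trim());
        if (mb <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive");
        }
        return mb * 1024L * 1024L;
    }

    /** Gives percentForBuffers percent of the limit to buffers, split evenly per writer. */
    static long perWriterBytes(long limitBytes, int numWriters, int percentForBuffers) {
        if (numWriters <= 0) {
            throw new IllegalArgumentException("numWriters must be positive");
        }
        return limitBytes * percentForBuffers / 100 / numWriters;
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // Stand-in for reading config.txt from disk.
        config.load(new StringReader("memory.max_mb = 1024\n"));

        long limit = memoryLimitBytes(config, 512);
        System.out.println("limit bytes: " + limit);
        System.out.println("per writer:  " + perWriterBytes(limit, 8, 50));
    }
}
```

Deriving everything from one configured number would keep the behavior predictable when a user changes the limit, which the current fixed sizes do not.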
I have had dynamically calculated buffer sizes in the program for some time now. They did not work very well with my fixed max memory, and I expect them not to work at all if the user changes the max memory. So for now, memory limits are specified in the config.
I don't think disk I/O is the limiting factor in most parts of the application. Because of that, I have rewritten some parts of it to use rather heavy disk I/O, and the resulting code was faster and cleaner, even for large corpora. I haven't done any profiling yet, though.
I think we might still have dynamic memory calculation in the future, just not for the stable release.
Fine with me for now.