mdakin

Results 13 comments of mdakin

Maybe for now it would be enough to provide only this? `List paragraphs = split document from line breaks;`

So the usage will be like:

```
extractor = ...;
for (String paragraph : extractor.splitFromLineBreaks(doc)) {
  extractor.extract(paragraph);
}
```

or directly

```
extractor.extract(extractor.splitFromLineBreaks(doc))
```

So I am not sure now, maybe...
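A minimal sketch of what a line-break splitter along these lines could look like. `ParagraphSplitter` is a hypothetical class name, and trimming lines and dropping blank ones are assumptions, not something decided in the discussion; only the method name `splitFromLineBreaks` comes from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of any existing library API.
class ParagraphSplitter {

  // Splits a document into paragraphs at line breaks.
  // Assumption: blank lines are dropped and surrounding whitespace is trimmed.
  static List<String> splitFromLineBreaks(String doc) {
    List<String> paragraphs = new ArrayList<>();
    // \R matches any line-break sequence (\n, \r\n, \r, ...).
    for (String line : doc.split("\\R")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        paragraphs.add(trimmed);
      }
    }
    return paragraphs;
  }
}
```

Returning a plain `List<String>` keeps both call styles above possible: the caller can loop over the paragraphs, or an overload of `extract` could accept the list directly.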

Using objects like `Document`, `Paragraph`, etc. is a possibility, but that is another issue. So I think your initial suggestion is fine, maybe having separate method names like...

0.12 still creates double results. Did not test with 0.13.

Input: yoksa
yoksa
[yoksa:Conj] yoksa:Conj
[yoksamak:Verb] yoksa:Verb+Imp+A2sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[Yok:Noun,Prop] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
[yok:Noun] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg

Disambiguation result: [yoksa:Conj] yoksa:Conj

Is it a reasonable approach to morphologically analyze the deasciifier output and discard it if it is invalid?

In general, I think it will give better results. If it has turned a misspelled word into another misspelled word, not much is lost anyway; the word simply reverts to its previous form.
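The fallback idea could be sketched roughly as follows. Everything here is hypothetical: `ValidatedDeasciifier` is an illustrative name, and the deasciifier and the morphological check are passed in as plain functions rather than tied to any real library API.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical wrapper; both collaborators are stand-ins supplied by the caller.
class ValidatedDeasciifier {
  // Stand-in for a real deasciifier (ASCII word in, accented candidate out).
  private final UnaryOperator<String> deasciifier;
  // Stand-in for morphological analysis: true if the word can be analyzed.
  private final Predicate<String> analyzable;

  ValidatedDeasciifier(UnaryOperator<String> deasciifier, Predicate<String> analyzable) {
    this.deasciifier = deasciifier;
    this.analyzable = analyzable;
  }

  // Returns the deasciified word if it passes analysis; otherwise the original word.
  String apply(String word) {
    String candidate = deasciifier.apply(word);
    return analyzable.test(candidate) ? candidate : word;
  }
}
```

With this shape, a bad conversion costs nothing beyond the extra analysis call: the input word is returned unchanged.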

@ilkerhk Afaik, normalization depends on a language model which could use quite a bit of memory. @ahmetaa can you confirm? Also, normalization and similar tasks may have higher initialization time...

@ahmetaa The language model memory usage is probably larger than 100MB (the model file is 80MB, and I am assuming it does not map 1:1 in memory). From your graphs just normalization...

Maybe we can move your work into a contrib directory, would that work for you? WDYT ahmetaa@ ?

@kaansonmezoz What exactly is the input that causes the error? If you share the code sample that triggers it, we can reproduce the error and it will be easier to find a fix.