mdakin comments

Results 13 comments of mdakin

Consider adding extractFromDocument() to TurkishSentenceExtractor

Maybe for now it would be enough to provide only this? `List<String> paragraphs = split document from line breaks;`

Consider adding extractFromDocument() to TurkishSentenceExtractor

So the usage will be like:

```
extractor = ...;
for (String paragraph : extractor.splitFromLineBreaks(doc)) {
  extractor.extract(paragraph);
}
```

or directly

```
extractor.extract(extractor.splitFromLineBreaks(doc))
```

So I am not sure now, maybe...

Consider adding extractFromDocument() to TurkishSentenceExtractor

There is a possibility of using objects like Document, Paragraph, etc., but that is another issue. So I think your initial suggestion is fine, maybe having separate method names like...
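
The paragraph-first usage discussed in these comments can be sketched roughly as follows. This is only an illustration under stated assumptions: `splitFromLineBreaks` and `extractFromDocument` are the names proposed in the thread, not a confirmed Zemberek API, and `naiveExtract` is a hypothetical stand-in for the real sentence extractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ParagraphSplitSketch {

  // Hypothetical helper: splits a document at line breaks and drops blank lines.
  static List<String> splitFromLineBreaks(String doc) {
    List<String> paragraphs = new ArrayList<>();
    for (String line : doc.split("\\R")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        paragraphs.add(trimmed);
      }
    }
    return paragraphs;
  }

  // Applies a per-paragraph sentence extractor to every paragraph of the document.
  static List<String> extractFromDocument(
      String doc, Function<String, List<String>> extractor) {
    List<String> sentences = new ArrayList<>();
    for (String paragraph : splitFromLineBreaks(doc)) {
      sentences.addAll(extractor.apply(paragraph));
    }
    return sentences;
  }

  // Naive stand-in (assumption): splits after '.', '!' or '?' followed by whitespace.
  static List<String> naiveExtract(String paragraph) {
    return Arrays.asList(paragraph.split("(?<=[.!?])\\s+"));
  }

  public static void main(String[] args) {
    String doc = "First sentence. Second one?\n\nA new paragraph.";
    extractFromDocument(doc, ParagraphSplitSketch::naiveExtract)
        .forEach(System.out::println);
  }
}
```

Passing the extractor in as a function keeps the splitting step independent of how sentences are actually detected, which matches the "split first, then extract per paragraph" idea above.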

Zemberek creates duplicate WordAnalysis results

0.12 still creates double results. Did not test with 0.13.

Input: yoksa

yoksa
[yoksa:Conj] yoksa:Conj
[yoksamak:Verb] yoksa:Verb+Imp+A2sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[yok:Adj] yok:Adj|Zero→Verb+sa:Cond+A3sg
[Yok:Noun,Prop] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg
[yok:Noun] yok:Noun+A3sg|Zero→Verb+sa:Cond+A3sg

Disambiguation result:
[yoksa:Conj] yoksa:Conj
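
As a caller-side workaround until the analyzer itself is fixed, exact duplicates like the repeated `[yok:Adj]` entry could be filtered out. The sketch below is an assumption, not Zemberek code: it treats each analysis as its formatted string and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class DedupAnalysesSketch {

  // Removes exact duplicate analysis strings while keeping first-seen order.
  static List<String> dedup(List<String> analyses) {
    return new ArrayList<>(new LinkedHashSet<>(analyses));
  }

  public static void main(String[] args) {
    // Analyses for "yoksa" as reported above; \u2192 is the arrow character.
    List<String> analyses = Arrays.asList(
        "[yoksa:Conj] yoksa:Conj",
        "[yoksamak:Verb] yoksa:Verb+Imp+A2sg",
        "[yok:Adj] yok:Adj|Zero\u2192Verb+sa:Cond+A3sg",
        "[yok:Adj] yok:Adj|Zero\u2192Verb+sa:Cond+A3sg",
        "[Yok:Noun,Prop] yok:Noun+A3sg|Zero\u2192Verb+sa:Cond+A3sg",
        "[yok:Noun] yok:Noun+A3sg|Zero\u2192Verb+sa:Cond+A3sg");
    // Six reported analyses collapse to five distinct ones.
    dedup(analyses).forEach(System.out::println);
  }
}
```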

Difference in the Normalizer (in migration tests from 0.16.0 to 0.17.1)

Would it be a reasonable approach to analyze the deasciifier output morphologically and discard it if it is invalid?

Difference in the Normalizer (in migration tests from 0.16.0 to 0.17.1)

I think it would give better results in general. If it has turned a misspelled word into another misspelled word, not much is lost anyway; the word simply reverts to its previous form.
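
The validate-then-fall-back idea in this exchange might look like the sketch below. It is a minimal illustration under assumptions: the deasciifier and the morphological check are passed in as plain functions, and the toy lexicon and character replacement in `main` are hypothetical stand-ins, not Zemberek's actual components.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class ValidatedDeasciifySketch {

  // Returns the deasciified candidate only if the analyzer accepts it;
  // otherwise the original word is kept unchanged.
  static String deasciifyIfValid(
      String word, UnaryOperator<String> deasciifier, Predicate<String> hasAnalysis) {
    String candidate = deasciifier.apply(word);
    return hasAnalysis.test(candidate) ? candidate : word;
  }

  public static void main(String[] args) {
    // Toy stand-ins (assumptions): replace 'c' with c-cedilla, one-word lexicon.
    UnaryOperator<String> toyDeasciifier = w -> w.replace('c', '\u00e7');
    Set<String> toyLexicon = new HashSet<>(Arrays.asList("\u00e7ok"));

    // Candidate is in the toy lexicon, so the deasciified form is used.
    System.out.println(deasciifyIfValid("cok", toyDeasciifier, toyLexicon::contains));
    // Candidate is not in the toy lexicon, so the original "cam" is kept.
    System.out.println(deasciifyIfValid("cam", toyDeasciifier, toyLexicon::contains));
  }
}
```

In the worst case the word is left exactly as it was typed, which is the "nothing much is lost" property described above.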

High memory usage question

@ilkerhk AFAIK, normalization depends on a language model, which could use quite a bit of memory. @ahmetaa can you confirm? Also, normalization and similar tasks may have higher initialization time...

High memory usage question

@ahmetaa The language model memory usage is probably larger than 100MB (the model file is 80MB; I am assuming it does not map 1:1 in memory). From your graphs just normalization...

graph.json as a pull request

Maybe we can move your work into a contrib directory; would that work for you? WDYT, @ahmetaa?

TurkishSentenceNormalizer combineNecessaryWords() java.lang.ArrayIndexOutOfBoundsException: -1

@kaansonmezoz What exactly is the input that causes the error? If you share a code sample that triggers it, we can reproduce the error and finding the fix will be easier.