Mišo Belica

74 comments by Mišo Belica

There is now a Windows build available on the [release page](https://github.com/AGWA/git-crypt/releases), starting from v0.7.0, so there is no need for custom builds or for downloading potentially infected EXE files from over...

It's really bad if it gets stuck, but the page https://omnieq.com/ does not seem very content-rich to me. There is almost no text. It seems to be only a table with some...

It seems it's not an infinite loop, just very suboptimal code. There are roughly 50k small paragraphs found in the page, and the function to revise them is basically...

I fixed some issues in the main branch, and now if I run `python -m justext -s Polish "https://wiadomosci.gazeta.pl/wiadomosci/7,114883,27025667,ziemniaki-na-szostej-surowka-na-dziesiatej-jak-pomoc-zeby.html"` I think it gives you what you expect. The title _"Ziemniaki...

You are right. Do you know any real website using XML serialization of the HTML?

OK, thank you. I guess it's not that hard to add. Maybe I am wrong, but I don't think there are many XHTML documents left out there. We will...

@polosatyi Hi, to be honest I don't know where the problem is. jusText has many heuristics, and it may be any one of them or a combination. I can see...

This is quite a common problem with the Windows console. Are you sure you have the OS, console, I/O, and everything else set to UTF-8 or another suitable encoding? (UTF-16, ...)...
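One quick way to check the Python side of that question is to print the encodings the interpreter is actually using. The snippet below is a minimal sketch using only the standard library; the helper names `describe_encodings` and `force_utf8_output` are made up for illustration, and it says nothing about the console's own code page or font support.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python currently uses for console and file I/O."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "stderr": getattr(sys.stderr, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
    }


def force_utf8_output():
    """Switch stdout and stderr to UTF-8 where the stream supports it (Python 3.7+)."""
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            # Unencodable characters are replaced instead of raising UnicodeEncodeError.
            reconfigure(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(f"{name}: {value}")
```

If `stdout` reports a legacy code page rather than UTF-8, setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is another option worth trying.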

Yes, the name **stop words** is not the best one. The words are actually the most frequent words. Maybe I'll rename the list to **frequent words** or something like that...
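To make the "most frequent words" idea concrete, here is a small sketch of how such a list could be derived from a text sample. It is an illustration only, not the procedure used to build the bundled lists; the function name `most_frequent_words` and the simple `\w+` tokenization are assumptions.

```python
import re
from collections import Counter


def most_frequent_words(text, limit=10):
    """Return the `limit` most common lowercase words in `text`."""
    # Naive tokenization: runs of Unicode word characters, case-folded.
    words = re.findall(r"\w+", text.lower())
    return [word for word, _count in Counter(words).most_common(limit)]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    print(most_frequent_words(sample, limit=2))
```

On a large enough corpus the top of such a ranking is dominated by function words (articles, prepositions, conjunctions), which is why a frequency list can double as a stop word list.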

Sorry, but I didn't create the word lists, so I can't tell you which corpus the German most-frequent-word list was built from, but I can imagine some...