html-differ
html-differ copied to clipboard
memory leaks
when i try it with > 100kb files, it almost always gets to this message:
- FATAL ERROR: JS Allocation failed - process out of memory
it also runs long time, before showing this error (few minutes). Would it be possible to process diff with multiple child processes. Every child would process part of html, use less memory and we would utilize multiple CPUs (since all servers nowadays have few processors).
Or some other memory handling, to prevent leaks? Otherwise, great code, very usefull!
Actually, I've tried it with > 100kb files and it works.
But I suppose that your problem concerns the recursive algorithm which html differ
uses.
The recursion depth is too big, so the tool fails.
In the version 1.0.0
I do not use any recursion, so I think that html-differ
will work much faster and do not break at your case.
It would be great if you give me an example of a file which now fails.
Or you can even help me and test the prototype of version 1.0.0
)
Checkout the branch support/1.0.0
(be careful! I've broken the API) and test it on your files.
I tried to install version 1.0.0 but it always says its running 0.5 when you check with -v (version)
Also it runs long and then dies. What is the procedure to install version 1.0.0. correctly? I downloaded branch support/1.0.0 and then went to that folder and executed "npm install ."
I can send you files, can you give me your email, or something to comunicate? Best,
Yep, version 1.0.0
is in developing mode.
You should clone the branch support/1.0.0
from github, go to the cloned folder and run
npm install
bin/html-differ --help
my email:
[email protected]
Hmm strange, it still doesnt work. And you email doesnt work either, it says no such user. best,
I sended. I think it gots problems with bigger htmls because like you said recursive algorithm. But with 1.0.0 it doesnt seem to be any diffrent. Maybe we could split html to few slices, and process them separatly.
Thank you! I'll figure it out as soon as possible!
OK, thanks.
If I can help in anyway, please tell me. Best regards,
I've understood why this happens.
It is so many diffs in the given HTML
s, so the work of the module jsdiff (I use it as dependence in html-differ
) becomes really slow! Yes! It works, but the process of the comparison is so long that the Operating System thinks that the program fails and stop it in order not to break the work of OS in general.
I've opened an issue in jsdiff == > issue34
P.S. If you run html-differ
on files > 100kb and they will have not such many diffs - the tool will not fail. (even if you use version 0.5.0
)
Hmmm,
The problem is that a lot of smaller files (>40+50 kb) also failes. If jsdiff adds maximum loop, we didnt solve the problem, because we dont show all diffs. The files I sended are real life webpages, so they should make problems.
Maybe the solution would be to split html to chunks and parallel process this chunks. Just an idea. Im more into php, so dont have very good knowledge about node.js.
Best regards,
The problem is not in sizes of HTML
s, but in the amount of diffs between them.
How can you split the HTML? What criteria to use? If you split it , you can break it and the comparison will be not fair.
The idea of the differ was to compare not absolutely different HTML
s.
Is it necessary to compare HTML
s if they are absolutely different and you can see the diffs without any tools?
As the variant. jsdiff
adds maximum loop.
You see not all diffs, fix them or do something else what you want and run html-differ
again.
I've found the solution! Seems, it is the best!
In current implementation of html-differ
the module jsdiff
compares texts by words (there are several mods: charDiff
, wordDiff
, lineDiff
...). If we call jsdiff
in mode lineDiff
, the amount of diffs becomes less, so jsDiff
does not fails.
Solution:
This mode of jsdiff
will be configurable in html-differ
through an option, so you can choose how to compare HTML
s (by chars, byWords, byLines).
Does this variant suit you?
Hey,
Great :+1: Is the result still the same if you compare line by line instead of word by word? Because of classes, and other things you meantioned in README?
How can I apply this patch, what file should I change to enable this, so I can test some more files? :)
I am going to do this in version 1.0.0
. There are some moments concerning this option which should be handled.
If I will compare texts by lines, the option ignoreWhitespaces
will work in another way - it will ignore all whitespaces except \n
s. If more than two \n
s go successive, they will turn into one \n
.
For example, if you set the option which compares files by lines, ignoreWhitespaces
will work in the following way:
<span>
</span>
turns to
<span>
</span>
Does this variant suit you?
Yep, it should do it, as long as both are changed same way. Commit changes asap, and I will test with few diffrent htmls.
I had another idea, what if we first check line by line, and then if lines are the same go on, if not we also check word by word that line. Maybe then you get best from both worlds - speed and great diff tool. Just an idea, dont know if its possible to make it.
Hmmm...
Not bad, not bad...
I'll think about it and fix it ASAP!
Great,
Thanks for fast replay. I will prepare diffrent (sizes, tags, etc...) html examples files and try and debug.
Best regards,
I will do the implementation of a new option (my solution), it should solve your problem too.
And after that in minor version I 'll think about the enhancement that you have proposed, because I also need ASAP something working ) and after release of 1.0.0
, I'will think about speed and quality enhancements.
Created an issue concerning your idea == > change the comparison algorithm
Yeah, I agree with you :+1: :+1:
I can not understand, why do you need differ in tasks when you can find diffs without any tools?
I tried to implement the idea of comparison by lines, but there are still so many diffs that jsdiff
fails.
Also see the comments about you ideas concerning the new algoritm of comparison ==> #95
Because I really like your tool, and it was working great. I dont understand, why you dont want it to work on bigger files too. Can you just add option to choose between lines, words as option? I tried to do it myself but could find what function is calling diff.js. Thanks,
It is not a problem to compare texts by lines!
I've implemented this option, but it will not help you.
Because as I've already said you that even in this mode! there are so many diffs that jsdiff
fails.
When we compare by words, we receive more than 50000 diffs - it is too big for jsdiff
.
And even when we compare be line, we recieve more than 6000 diffs - it is very big result for jsdiff too.
Hmm, I see. In my tests no problem to 2000 - 3000, then it stops. Could we optimise something else?
It seems No! It is absolutely the task of jsdiff
(