Possible Memory Leak
There is a possible memory leak with Readability. I have a 40-thread process that calls a method which runs Readability on HTML documents. The method is rather simple:
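    def stripTags(source)
      # Extract the main article content with Readability
      content = Readability::Document.new(source).content
      # strip_tags is the Rails helper method for stripping tags;
      # it is applied to the Readability output, not the raw source
      content = strip_tags(content).gsub("\n", " ").squeeze(" ").strip
      return content
    end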
I saw my memory usage increase gradually, from < 5% all the way to 80% after a day or so. To narrow down the cause, I commented out the Readability calls, and that resolved the issue: no memory leak. As soon as I put the Readability call back, the leak started again.
To temporarily fix this, I simply monitored my process with God, and had it restart if memory usage got too high, but I'm fairly certain there's a memory leak with the Ruby port of Readability.
Interesting. Thank you for reporting this. I wonder if the issue is in Ruby-Readability itself, or in Nokogiri.
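One way to tell the two apart might be to run each call in a loop and sample memory between rounds. The following is only a rough sketch, not something from either library: `measure_growth` and `rss_kb` are hypothetical helper names, and reading RSS through `ps` assumes a Unix-like system. Live heap slots cover Ruby objects only, so a leak inside a C extension would show up in RSS but not in the slot count.

```ruby
# Resident set size of this process in kilobytes, or nil if `ps` is unavailable.
# RSS also includes memory allocated by C extensions, which GC.stat cannot see.
def rss_kb
  out = `ps -o rss= -p #{Process.pid}`
  out.strip.empty? ? nil : out.to_i
rescue SystemCallError
  nil
end

# Hypothetical helper: runs the given block `per_round` times for each round,
# forces a full GC, and records one memory sample per round.
def measure_growth(rounds: 5, per_round: 1_000)
  Array.new(rounds) do
    per_round.times { yield }
    GC.start
    { live_slots: GC.stat(:heap_live_slots), rss_kb: rss_kb }
  end
end

# Stand-in workload so the sketch runs without any gems. Swap in the call
# under test (assuming `html` holds a sample document), for example:
#   measure_growth { Readability::Document.new(html).content }
#   measure_growth { Nokogiri::HTML(html) }
results = measure_growth(rounds: 3, per_round: 100) do
  "<p>hello</p>".gsub(/<[^>]+>/, "")
end
results.each_with_index { |sample, i| puts "round #{i + 1}: #{sample.inspect}" }
```

If the numbers keep climbing round after round for the Readability call but stay flat for the bare Nokogiri parse, that would point at Ruby-Readability; if both climb, Nokogiri (or the Ruby version) would be the more likely suspect.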
@HenleyChiu just curious, do you still see that issue today? Which Ruby version are you using? Thanks!