
Memory leak when using Task#with_timeout

jpaulgs opened this issue on Sep 01 '22 • 9 comments

#!/usr/bin/env ruby

require 'async'
require 'async/queue'

@input_channel = Async::LimitedQueue.new(100)

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        @input_channel.dequeue
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

Hey, I'm running Ruby 3.1.2 and version 2.0.3 of the Async rubygem. The code above is similar to what we are using, and it exhibits a memory leak. The screenshot below is from running it for about 15 minutes; it certainly never releases the memory.

image

I'll keep digging to see what I can find out.

jpaulgs · Sep 01 '22

Actually, the LimitedQueue is irrelevant here; the following code also shows the issue:

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do
        # no-op
      rescue Async::TimeoutError
        # no-op
      end
    end
  end
end

task = consumer

task.wait

image

jpaulgs · Sep 01 '22

Hmm, thanks for reporting this, I'll take a look.

ioquatix · Sep 01 '22

#!/usr/bin/env ruby

require 'async'
require 'memory'

def consumer
	Async do |task|
		report = Memory.report do
			1000000.times do
				task.with_timeout(0.001) do
					# no-op
				rescue Async::TimeoutError
					# no-op
				end
			end
		end
		report.print
		binding.irb
	end
end

task = consumer

task.wait

Memory Profile

  • Total Allocated: (288.00 MiB in 4000000 allocations)
  • Total Retained: (7.00 MiB in 1000000 allocations)

By Gem (288.00 MiB in 4000000 allocations)

  • (288.00 MiB in 4000000 allocations) timers-4.3.3

By File (288.00 MiB in 4000000 allocations)

  • (160.00 MiB in 2000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/timer.rb
  • (88.00 MiB in 1000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/group.rb
  • (40.00 MiB in 1000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/events.rb

By Location (288.00 MiB in 4000000 allocations)

  • (160.00 MiB in 2000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/timer.rb:38
  • (88.00 MiB in 1000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/group.rb:60
  • (40.00 MiB in 1000000 allocations) /home/samuel/.gem/ruby/3.1.2/gems/timers-4.3.3/lib/timers/events.rb:72

By Class (288.00 MiB in 4000000 allocations)

  • (160.00 MiB in 2000000 allocations) Proc
  • (88.00 MiB in 1000000 allocations) Timers::Timer
  • (40.00 MiB in 1000000 allocations) Timers::Events::Handle

ioquatix · Sep 02 '22

I see what is going on.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L71

To avoid eagerly scheduling timeouts, which is expensive, we put them into a queue. Most timeouts are short-lived and never fire, because the operation completes immediately.

e.g.

task.with_timeout(0.1) do
  io.read
end

Most of the time, io.read returns immediately, so we want the cost of the timeout to be minimal. We therefore defer as much work as possible, because eagerly scheduling the timer would also make cleaning it up expensive.

Basically, with_timeout is a no-op unless some operation actually blocks.

There is a @queue which stores these not-yet-processed timers, and they only get processed on the next call to wait.

https://github.com/socketry/timers/blob/64a745a8b2f28849786c8c683ed2a4ef2eb0ae47/lib/timers/events.rb#L107

However, if you have a tight loop, this queue can grow indefinitely.

I think the solution is just to flush the queue after it gets to a certain size, probably 100 or 1000. The goal is to avoid adding a huge overhead, while also minimising the cost of inserting timers.
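
For illustration only, here is a rough sketch of that idea. The class, constant, and method names below are made up for this comment; they are not the actual timers gem internals.

# Illustrative sketch only, not the real Timers::Events implementation.
# Newly scheduled handles are buffered in @queue and only merged into the
# sorted @sequence on the next call to #wait, or once the buffer gets big.
class DeferredTimerQueue
	FLUSH_LIMIT = 1000 # hypothetical threshold

	Handle = Struct.new(:time, :callback) do
		def cancel
			self.callback = nil
		end

		def cancelled?
			callback.nil?
		end
	end

	def initialize
		@queue = []    # unsorted, recently scheduled handles
		@sequence = [] # handles sorted by expiry time
	end

	def schedule(time, callback)
		handle = Handle.new(time, callback)
		@queue << handle
		# Without this check, a tight loop that never reaches #wait grows
		# @queue without bound, which is the leak reported above.
		flush! if @queue.size > FLUSH_LIMIT
		handle
	end

	def wait
		flush!
		# ... fire expired, non-cancelled handles from the front of @sequence ...
	end

	private

	def flush!
		@sequence.concat(@queue)
		@sequence.sort_by!(&:time)
		@queue.clear
	end
end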

ioquatix · Sep 02 '22

I was starting to focus on @timers = Set.new in timers/group.rb

#!/usr/bin/env ruby

require 'async'

def consumer
  Async do |task|
    loop do
      task.with_timeout(0.001) do |timer|
        timer.cancel
        print '.'
      rescue Async::TimeoutError

      end
    end
  end
end

task = consumer

task.wait

That does seem to stabilise the memory usage, but it may have more to do with the IO (which also seems to be required).

jpaulgs · Sep 02 '22

Any time you context switch e.g. on IO, the timers queue is flushed. We just need to flush it more often.
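
As a hedged illustration of that point (a workaround sketch, not the actual fix), explicitly yielding back to the reactor every so often in a tight loop gives the event loop a chance to run and flush the pending timers. This assumes sleep routes through the fiber scheduler; the interval of 1000 iterations is arbitrary.

#!/usr/bin/env ruby
# Workaround sketch only: periodically hand control back to the event loop
# so it can process (and discard) the queued timer handles.

require 'async'

Async do |task|
	counter = 0
	loop do
		task.with_timeout(0.001) do
			# no-op
		rescue Async::TimeoutError
			# no-op
		end

		counter += 1
		# sleep goes through the fiber scheduler, so the reactor runs and
		# flushes its timer queue before resuming this task.
		sleep(0) if (counter % 1000).zero?
	end
end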

ioquatix · Sep 02 '22

Thanks for the speedy assistance, @ioquatix! :heart:

Should we be adding the timer.cancel line to our code, or are you working on / thinking about doing that internally? I see you've just put up a branch of timers here, which looks to be addressing the flushing aspect (cheers!), but not the cancel (yet)?

(I've been debugging our memory leak with @jpaulgs and @oeoeaio)

ZimbiX · Sep 02 '22

I'm going to release an update to fix the issue in the timers gem.

ioquatix · Sep 02 '22

https://github.com/socketry/timers/commit/1ec78fbe59c934925e5f4a32fbd5a380c8120328 is the initial attempt at a fix, without introducing too much overhead.

To be honest, using a linked list here would make way more sense. My plan has always been to replace the timers gem with a C implementation in the io-event gem, I just haven't got around to it yet.

This should avoid the worst case, but it won't avoid the case where you create lots of timers and interleave cancelled and non-cancelled ones. We could do more work in flush!, but doing a lot of scans would erode much of the benefit.
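
For what it's worth, here is a sketch of what "more work in flush!" could look like, continuing the illustrative DeferredTimerQueue example from earlier rather than the gem's actual code: drop handles that were cancelled while still buffered, at the cost of scanning the buffer on every flush.

# Illustrative only: cancelled handles never reach @sequence, but every
# flush now scans the whole buffer.
def flush!
	@queue.each do |handle|
		@sequence << handle unless handle.cancelled?
	end
	@sequence.sort_by!(&:time)
	@queue.clear
end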

ioquatix · Sep 02 '22

We have tested the changes to the timers gem and, while they do appear to remove the accumulation of Timers::Timer, there is still a memory leak in our code (which uses a LimitedQueue as per the OP). Investigating further, we identified this line as the culprit. It appears that, when used in conjunction with Task#with_timeout, instances of Queue (and LimitedQueue) accumulate fibers in their @waiting list in the absence of any other fiber signalling the Queue, leading to a memory leak.
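
To make that failure mode concrete, here is a simplified, illustrative sketch of the pattern as we understand it; the class and names below are not the actual Async::Queue internals. The waiter is woken by the timeout rather than by signal, so nothing removes its entry from the waiting list.

# Illustrative sketch only, not the real Async::Queue implementation.
class NaiveWaitList
	def initialize
		@waiting = []
	end

	def wait
		fiber = Fiber.current
		@waiting << fiber
		# ... suspend here until #signal schedules us, or a timeout wakes us ...
		# PROBLEM: when the fiber is woken by a timeout instead of #signal,
		# nothing removes it from @waiting, so every timed-out dequeue leaks
		# one entry.
	end

	def signal
		fiber = @waiting.shift
		# ... schedule `fiber` to be resumed with the dequeued value ...
	end
end

# Any fix needs to ensure the waiter is removed on the way out even when the
# wait is interrupted, e.g. by wrapping the suspension in:
#   begin ... ensure @waiting.delete(fiber) end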

We've made an attempt to fix this issue in #186

oeoeaio · Oct 27 '22

Thanks for this update.

ioquatix · Oct 27 '22

The latest release should address the memory leaks.

ioquatix · Jan 06 '23

I am going to close this issue. Based on the reproduction scripts, the issue no longer exists.

ioquatix · Jun 07 '23