
Worker crashes when error.message is too long

Open jan opened this issue 11 years ago • 7 comments

I had an eager load statement that produced a very long SQL statement. It was longer than 16 MB, the default max_allowed_packet of MySQL, so the server rejected it with an error. When delayed_job then tried to save the error message

job.last_error = "#{error.message}\n#{error.backtrace.join("\n")}"

This error.message also contains the SQL statement, so the server rejects the save as well, and the worker shuts down. All other workers on all our servers then follow, since delayed_jobs.attempts and run_at are never updated.

As delayed_jobs.last_error - in the MySQL implementation - is a TEXT column and can only hold 64K anyway, I would suggest simply limiting the number of characters that are stored:

job.last_error = "#{error.message[0 ... 50 * 1024]}\n#{error.backtrace.join("\n")}"

With 50K there is still room for the stack trace and I doubt that after 50K of error message there is anything new that would be missed.

The above line would have to be patched in Delayed::Worker.run and handle_failed_job.
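The proposed truncation can be sketched as a standalone helper. This is a hedged illustration, not delayed_job's actual code; `MAX_ERROR_BYTES` and `format_last_error` are illustrative names, and the 50 KB cap is the value suggested above, leaving room for the backtrace in a 64 KB TEXT column:

```ruby
# Illustrative sketch of the suggested fix: cap the exception message
# before composing last_error, so the combined string stays well under
# the 64 KB limit of a MySQL TEXT column.
MAX_ERROR_BYTES = 50 * 1024

def format_last_error(error)
  "#{error.message[0...MAX_ERROR_BYTES]}\n#{error.backtrace.join("\n")}"
end
```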

jan avatar Jul 23 '13 08:07 jan

We ran into this issue as well. Noticed what the cause of the error was today.

The job fails and then DJ tries to save the entire error message, which includes a long SQL query. This results in a MySQL error: Mysql2::Error: Data too long for column 'last_error'. The worker then dies. Another worker comes along, picks up this job, and dies in the same way. This repeats until every worker is dead. :-(

Would be nice for DJ to truncate the last_error message to keep from failing in this way, as @jan suggested.

kyletolle avatar Sep 13 '16 21:09 kyletolle

Unfortunately there is no reasonable way for us to determine what the value needs to be truncated to. Different servers have different limits, and those limits are also configurable. The limits get even weirder in MySQL if the table is configured as utf8mb4. All that to say, we tried, and those weren't the only problems.

albus522 avatar Sep 13 '16 21:09 albus522

Thanks for the update @albus522. Glad to hear you've considered this already. Would you have a workaround to suggest?

Something like wrapping the code in our job with a rescue block and then re-raising the exception after truncating the error message to what we know is a reasonable value?
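That rescue-and-re-raise workaround could look roughly like this. This is a hedged sketch only; `MyJob`, `do_work`, and the 10,000-character `ERROR_LIMIT` are illustrative names and values, not anything delayed_job provides:

```ruby
# Sketch of the workaround discussed above: rescue inside the job itself,
# truncate the message, and re-raise so delayed_job only ever sees a
# bounded error string.
class MyJob
  ERROR_LIMIT = 10_000 # illustrative cap, well under a 64 KB TEXT column

  def perform
    do_work
  rescue StandardError => e
    # Re-raise the same exception class with a truncated message,
    # preserving the original backtrace.
    raise e.class, e.message[0, ERROR_LIMIT], e.backtrace
  end

  private

  def do_work
    # actual job logic goes here
  end
end
```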

kyletolle avatar Sep 13 '16 21:09 kyletolle

This is not elegant, but placing this monkey patch in a Rails initializer sidesteps the limitation of the last_error column:

module Delayed
  module Backend
    module Base
      def last_error=(err_message)
        col_limit = self.column_for_attribute(:last_error).limit || 65535
        write_attribute :last_error, err_message.truncate(col_limit, omission: " ... (error message too long)")
      end
    end
  end
end

Is column_for_attribute(:last_error).limit reliable for different character sets? Wonkiness in this arena must be what @albus522 was alluding to.

The other obvious workaround that comes to mind is to change the column type of the last_error column to LONGTEXT or similar to take more room. Admittedly, this potentially kicks the issue down the road.
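That column-widening workaround would be an ordinary Rails migration, roughly like the sketch below. The class name and migration version are illustrative; on the MySQL adapter, a `:text` column with a limit between 64 KB and 16 MB is created as MEDIUMTEXT:

```ruby
# Illustrative migration: widen delayed_jobs.last_error from TEXT (64 KB)
# to MEDIUMTEXT (up to 16 MB) on MySQL. Adjust the bracketed Rails version
# to match your application.
class WidenDelayedJobsLastError < ActiveRecord::Migration[6.1]
  def up
    change_column :delayed_jobs, :last_error, :text, limit: 16.megabytes - 1
  end

  def down
    change_column :delayed_jobs, :last_error, :text
  end
end
```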

mccolin avatar Oct 14 '16 03:10 mccolin

Thanks @mccolin for the reply. The approach we took was to change the column type to be able to store longer strings. I think we used MEDIUMTEXT since 16MB is still quite long. It's potentially kicking the issue down the road, but was a relatively easy way to alleviate the nasty kills-all-workers issue we were running into.

kyletolle avatar Feb 13 '17 23:02 kyletolle

I ran into this issue today on the (somewhat legacy) app I work on. The issue was that the MySQL database wasn't configured for utf8mb4 strings, and the exception that caused the job to fail was an invalid-encoding error: Mysql2::Error: Incorrect string value.... Unfortunately, the message that was then going to be written to the last_error column triggered the same Mysql2::Error encoding error, and the job went into an infinite loop. Fortunately it was caught in testing, but it would have been pretty bad if it had made it into production.

It may be a halting problem, but would it be possible to detect these failures and rescue them somehow (or at minimum escalate them)? The real issue for me was that the job was spinning and the exceptions weren't surfaced anywhere they could be seen.

I ended up fixing it by stripping out the 4-byte chars in a monkey patch like in https://github.com/collectiveidea/delayed_job/issues/550#issuecomment-253703623 , but yes, not the nicest.
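The stripping approach from that linked comment boils down to something like the helper below. This is a hedged sketch with an illustrative name, not the exact patch: it drops characters that need 4 bytes in UTF-8 (those outside the Basic Multilingual Plane, such as most emoji), so the message fits a column declared with MySQL's 3-byte utf8 charset:

```ruby
# Remove characters that encode to 4 bytes in UTF-8, which MySQL's legacy
# 3-byte "utf8" charset cannot store. Illustrative helper, not delayed_job API.
def strip_4byte_chars(str)
  str.each_char.reject { |c| c.bytesize > 3 }.join
end
```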

Ok, this is already captured in https://github.com/collectiveidea/delayed_job/issues/774

patrickdavey avatar Sep 24 '20 07:09 patrickdavey

Here's my solution using the ActiveRecord backend and a before_save callback:

  • config/initializers/monkey_patches/delayed_job.rb


# This is a monkey patch to limit the length of the last_error column in the delayed_jobs table
# to prevent this error: FATAL Mysql2::Error: Data too long for column 'last_error' at row 1
# https://github.com/collectiveidea/delayed_job/issues/550
module Delayed
  module Backend
    module ActiveRecord
      class Job
        before_save :truncate_last_error, if: :last_error?

        def truncate_last_error
          # The before_save guard already ensures last_error is present.
          col_limit = column_for_attribute(:last_error).limit || 65535
          self.last_error = last_error.truncate(col_limit, omission: ' ... (error message too long)')
        end
      end
    end
  end
end

edusantana avatar Aug 01 '24 19:08 edusantana