Bourreau Workers should prefetch file revision info

Open prioux opened this issue 2 years ago • 2 comments

The CbrainFileRevision class caches GIT file revision info for the CBRAIN codebase the first time it's accessed.

This works well as long as the requests for revision info happen, over time, within the same process.

On the Bourreau side, however, we spawn workers and subworkers, and when they fetch revision info, the info is cached only in their own process, not in the main Bourreau.

This means that whenever a subworker processes a task, say BoutiquesTask::SomeSuperTool, and logs the GIT information in the task's log, the cached information is not available the next time a subworker is spawned.

So I suggest that the BourreauWorker, during its initial setup phase, prefetch the GIT revision info for all the task classes that it is configured to handle.

Something like:

ToolConfig
  .where(:bourreau_id => current_bourreau_id)
  .map(&:tool)
  .compact   # defensive: skip any ToolConfig that has no associated tool
  .uniq
  .map(&:cbrain_task_class)
  .each { |klass| klass.revision_info.self_update }

That way, when the Worker spawns its subworkers, they will already have all the revision info in memory and won't have to issue any redundant git commands.
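
The idea relies on forked subworkers inheriting whatever the parent process already holds in memory. Here is a minimal sketch of that behaviour, assuming a Unix-like system where `fork` is available. `FakeRevisionInfo` and `lookups_done_in_child` are hypothetical names made up for this illustration; they are not part of CBRAIN and only stand in for the caching done by CbrainFileRevision.

```ruby
# Hypothetical stand-in for a class that memoizes an expensive lookup
# (in the real code, that lookup would be a git command).
class FakeRevisionInfo
  @cache   = {}
  @lookups = 0

  class << self
    attr_reader :lookups

    # Returns the cached value for +name+; the "expensive" part runs
    # at most once per process for a given name.
    def fetch(name)
      @cache[name] ||= begin
        @lookups += 1          # counts how often we had to do real work
        "rev-of-#{name}"
      end
    end
  end
end

# Forks a child that asks for +name+ and reports, through a pipe,
# how many lookups the child had to perform by itself.
def lookups_done_in_child(name)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    before = FakeRevisionInfo.lookups
    FakeRevisionInfo.fetch(name)
    writer.write(FakeRevisionInfo.lookups - before)
    writer.close
    exit!(0)                   # leave the child without running at_exit hooks
  end
  writer.close
  count = reader.read.to_i
  reader.close
  Process.wait(pid)
  count
end
```

If the parent has not fetched the info before forking, every child has to do the lookup again, and what a child caches is lost when it exits. If the parent calls `FakeRevisionInfo.fetch` first, the children find the value already cached.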

prioux, Jul 10 '22 21:07

@prioux hi, is there an easy way to test that the change corrects the issue or improves speed?

So far I have only noticed issues with old/deleted boutiques, but purging config/tools/task from the db helped.

MontrealSergiy, Aug 16 '22 19:08

My suggestion:

In file BrainPortal/lib/cbrain_file_revision.rb, in method get_git_rev_info_from_git, add this at the beginning:

# Append one line per actual git invocation: timestamp, file path, process name
File.open("/tmp/revlog","a") do |fh|
  fh.write(
    sprintf( "[%s] git %s by %s\n",
      Time.now.to_s,
      @fullpath,
      $0
    )
  )
end

Now every time a git command is actually run to get the revision info of ANY file, one line will be added to the logfile /tmp/revlog.

To test:

  1. make sure your bourreau is off
  2. make sure the logfile /tmp/revlog doesn't exist
  3. start the bourreau and workers with the code BEFORE the fix
  4. launch a test task (anything, it doesn't matter)
  5. wait until the task is finished
  6. launch a second test task of the same type
  7. wait until the task is finished again
  8. save a copy of /tmp/revlog somewhere
  9. repeat 1-8 but with your fixed code at step 3

Compare the two log files. You should clearly see that, if the fix is applied properly, the workers no longer fetch the revision info for the task's type.
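
To make the comparison less manual, something like the following rough sketch could tally the log lines per file path and list the paths that triggered fewer git commands after the fix. It assumes each line looks like `[timestamp] git PATH by PROGRAM`, as written by the logging snippet above. The helper names `tally_git_calls` and `reduced_paths` are made up for this example.

```ruby
# Matches one line of the logfile: "[timestamp] git PATH by PROGRAM"
LINE_FORMAT = /\A\[(?<time>[^\]]*)\] git (?<path>.+) by (?<program>.+)\z/

# Counts how many git invocations were logged for each file path.
# Lines that do not match the expected format are ignored.
def tally_git_calls(lines)
  tally = Hash.new(0)
  lines.each do |line|
    m = LINE_FORMAT.match(line.chomp)
    next unless m
    tally[m[:path]] += 1
  end
  tally
end

# Returns the sorted list of paths that were looked up less often
# in the second log than in the first.
def reduced_paths(before_lines, after_lines)
  before = tally_git_calls(before_lines)
  after  = tally_git_calls(after_lines)
  before.keys.select { |path| after[path] < before[path] }.sort
end
```

With the two saved copies of the log (the file names here are only placeholders), one would call `reduced_paths(File.readlines("revlog.before"), File.readlines("revlog.after"))` and expect the task class's file to appear in the result.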

prioux, Aug 16 '22 20:08