faker Performance: Cache translations to reduce file read and parse operations

Open alextaujenis opened this issue 10 months ago • 3 comments

Motivation / Background

This Pull Request has been created because the Faker library reads from disk and parses the YAML translation files on nearly every method call. These redundant "file read and YAML parse" operations can be cached in memory, making the entire library 10.6% faster, full modules up to 18% faster, and individual methods up to 30% faster.

Additional Information

The translate cache in this PR is stored in a class variable so it is shared for the lifetime of the program.
The lookup key combines the args string with a deterministically serialized (sorted) opts hash, so the cache generates the same lookup key regardless of the order of the provided opts keys.
The result of I18n.translate is then either retrieved from memory, or computed once, cached for future lookups, and returned.

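# Class-level cache shared by every caller for the lifetime of the process.
@@translate_cache = {}

def translate(*args, **opts)
  # Sort opts so the same options in a different order produce the same key.
  translate_key = args.to_s + opts.sort.join
  # Return the cached translation, or look it up once and store it.
  @@translate_cache[translate_key] ||= I18n.translate(*args, **opts)
end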
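
To illustrate the key generation outside of Faker, here is a minimal, self-contained sketch. FakeBackend and CachedTranslator are hypothetical names introduced only for this example; FakeBackend stands in for I18n.translate so the sketch runs without the i18n gem, and it counts calls so the cache hits are visible. It is not the code from this PR, only the same keying idea in isolation.

```ruby
# Hypothetical stand-in for I18n.translate; counts how often it is called.
class FakeBackend
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def translate(*args, **opts)
    @calls += 1
    "#{args.join('.')}|#{opts[:locale]}"
  end
end

# Hypothetical wrapper using the same key scheme as the snippet above.
class CachedTranslator
  def initialize(backend)
    @backend = backend
    @cache = {}
  end

  def translate(*args, **opts)
    # Sorting opts makes the key independent of keyword order.
    key = args.to_s + opts.sort.join
    @cache[key] ||= @backend.translate(*args, **opts)
  end
end

backend = FakeBackend.new
translator = CachedTranslator.new(backend)

first  = translator.translate('faker.name.first_name', locale: :en, default: 'x')
second = translator.translate('faker.name.first_name', default: 'x', locale: :en)
other  = translator.translate('faker.name.first_name', locale: :fr, default: 'x')

puts first == second  # same key, so the backend was hit only once for these two
puts backend.calls    # one call for :en, one for :fr
```

Two properties of this scheme are worth keeping in mind when reviewing: `||=` does not cache `nil` or `false` results, so those lookups hit the backend every time, and joining the sorted pairs without a separator means distinct opts could in principle serialize to the same string.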

This cache speeds up the entire Faker library by 10.6%, with the slight tradeoff of increased memory use as the cache warms (as you use Faker methods within your program). Fortunately, the entire Faker English translation directory is only 2.2MB, and all Faker translations combined are 7.1MB. Allocating roughly 2MB to 7MB of extra memory for Faker to run 10.6% faster is a great tradeoff today, with system memory typically measured in thousands of megabytes.
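
The time-for-memory tradeoff can be sketched with a toy measurement. Everything here is an assumption made for illustration: YAML_SOURCE is a made-up stand-in for a locale file, and parse_and_dig, cached_dig, and LOOKUP_CACHE are hypothetical names. The sketch does not reproduce the figures quoted above; it only shows that repeated lookups stop paying the parse cost while the cache grows by one entry per distinct key.

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical translation data standing in for a Faker locale file.
YAML_SOURCE = {
  'en' => { 'faker' => { 'lorem' => { 'words' => %w[alias consequatur aut perferendis] } } }
}.to_yaml

# Uncached path: parse the YAML on every lookup.
def parse_and_dig(*path)
  YAML.safe_load(YAML_SOURCE).dig(*path)
end

# Cached path: parse once per distinct path, then reuse the stored result.
LOOKUP_CACHE = {}
def cached_dig(*path)
  LOOKUP_CACHE[path] ||= parse_and_dig(*path)
end

path = %w[en faker lorem words]
n = 2_000
uncached_seconds = Benchmark.realtime { n.times { parse_and_dig(*path) } }
cached_seconds   = Benchmark.realtime { n.times { cached_dig(*path) } }

puts format('uncached: %.4fs, cached: %.4fs', uncached_seconds, cached_seconds)
puts "cache entries after #{n} lookups: #{LOOKUP_CACHE.size}"
```

Memory use in this sketch is bounded by the number of distinct lookups rather than the number of calls, which is the same reason the translation cache is bounded by the size of the translation data.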

Performance Benchmark

The benchmark below shows that, after caching the redundant "file read and YAML parse" operations, Faker modules perform up to 18% faster. Even the popular Faker::Lorem receives a 14.8% performance increase (averaged across all methods within that module). Combined, the entire Faker library benefits from a 10.6% speed improvement. Here is a list of the top 50 improved module times from caching the translations:

Checklist

Before submitting the PR make sure the following are checked:

[x] This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
[x] Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
[x] Tests are added or updated if you fix a bug, refactor something, or add a feature.
[x] Tests and Rubocop are passing before submitting your proposed changes.
[x] Double-check the existing generators documentation to make sure the new generator you want to add doesn't already exist.
[x] You've reviewed and followed the Contributing guidelines.

Apr 19 '24 07:04 alextaujenis
