Performance: Cache translations to reduce file read and parse operations
Motivation / Background
This Pull Request has been created because the `Faker` library reads from disk and parses the `yml` translation files on nearly every method call. These redundant "file read and YAML parse" operations can be cached in memory, which makes the entire library 10.6% faster, individual modules up to 18% faster, and individual methods up to 30% faster.
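The cost being avoided can be illustrated at small scale. The sketch below compares re-parsing a YAML document on every lookup (a rough stand-in for the uncached path) against memoizing the parsed result in a hash. It uses only Ruby's standard `yaml` and `benchmark` libraries; the `LOCALE_DATA` content, `ITERATIONS`, `CACHE`, `uncached_lookup`, and `cached_lookup` are hypothetical names for illustration, and the timings it prints are not the measurements reported in this PR.

```ruby
require 'yaml'
require 'benchmark'

# Hypothetical locale data standing in for a Faker translation file.
LOCALE_DATA = <<~LOCALE
  en:
    faker:
      lorem:
        words: [alias, consequatur, aut, perferendis, sit, voluptatem]
LOCALE

ITERATIONS = 2_000

# Uncached path: parse the YAML on every call, then dig out the key.
def uncached_lookup(*path)
  YAML.safe_load(LOCALE_DATA).dig(*path)
end

# Cached path: parse once per distinct key path and reuse the result.
CACHE = {}
def cached_lookup(*path)
  CACHE[path] ||= YAML.safe_load(LOCALE_DATA).dig(*path)
end

path = %w[en faker lorem words]
uncached = Benchmark.realtime { ITERATIONS.times { uncached_lookup(*path) } }
cached   = Benchmark.realtime { ITERATIONS.times { cached_lookup(*path) } }

puts format('uncached: %.4fs  cached: %.4fs', uncached, cached)
```

Only the first `cached_lookup` call pays the parse cost; every later call with the same path is a single hash lookup.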
Additional Information
- The translate cache in this PR is stored in a class variable so it is shared for the lifetime of the program.
- The lookup key combines the `args` string with a deterministically serialized (sorted) `opts` hash. (This lets the cache generate the same lookup key regardless of the order in which the `opts` keys are provided.)
- The result of the call to `I18n.translate` is then either retrieved from memory, or computed, cached for future lookups, and returned.
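```ruby
# Shared by all callers for the lifetime of the process.
@@translate_cache = {}

def translate(*args, **opts)
  # Sort the opts pairs so the key does not depend on keyword order.
  translate_key = args.to_s + opts.sort.join
  # Return the cached translation, or call I18n once and cache the result.
  @@translate_cache[translate_key] ||= I18n.translate(*args, **opts)
end
```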
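The cache-key behavior can be checked in isolation. The sketch below reuses the same key scheme but replaces `I18n.translate` with a hypothetical `FakeTranslator.lookup` stub, so it runs without the `i18n` gem; the stub counts how many real lookups are performed. It is an illustration under those assumptions, not the PR's actual `Faker` code.

```ruby
# Minimal sketch: same cache-key scheme as above, with I18n.translate
# replaced by a hypothetical stub so the example runs with no gems.
class FakeTranslator
  @@translate_cache = {}
  @@lookups = 0

  # Hypothetical stand-in for I18n.translate; records each real lookup.
  def self.lookup(*args, **opts)
    @@lookups += 1
    "#{args.join('.')} (#{opts[:locale]})"
  end

  def self.translate(*args, **opts)
    # Sorting the opts pairs makes the key independent of keyword order.
    translate_key = args.to_s + opts.sort.join
    @@translate_cache[translate_key] ||= lookup(*args, **opts)
  end

  def self.lookups
    @@lookups
  end
end

a = FakeTranslator.translate('faker.name.first_name', locale: :en, raise: true)
b = FakeTranslator.translate('faker.name.first_name', raise: true, locale: :en)
puts a == b                  # => true
puts FakeTranslator.lookups  # => 1
```

One design note: because `join` inserts no separators, different opts such as `{a: "bc"}` and `{ab: "c"}` both serialize to `"abc"`; using an array such as `[args, opts.sort]` directly as the hash key would keep them distinct.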
This cache speeds up the entire `Faker` library by 10.6%, with the slight tradeoff of increasing memory usage as the cache warms up (that is, as you call `Faker` methods within your program). Fortunately, the entire `Faker` English translation directory is only 2.2MB, and all `Faker` translations combined are 7.1MB. Allocating 2MB to 7MB of extra memory for the `Faker` library to run 10.6% faster is a great tradeoff today, when system memory is typically measured in gigabytes.
Performance Benchmark
The benchmark below shows that, after caching the redundant "file read and YAML parse" operations, `Faker` modules perform up to 18% faster. Even the popular `Faker::Lorem` module receives a 14.8% performance increase (averaged across all methods within that module). Combined, the entire `Faker` library benefits from a 10.6% speed improvement. Here is a list of the top 50 improved module times from caching the translations:
Checklist
Before submitting the PR make sure the following are checked:
- [x] This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
- [x] Commit message has a detailed description of what changed and why. If this PR fixes a related issue, include it in the commit message. Ex: `[Fix #issue-number]`
- [x] Tests are added or updated if you fix a bug, refactor something, or add a feature.
- [x] Tests and Rubocop are passing before submitting your proposed changes.
- [x] Double-check the existing generators documentation to make sure the new generator you want to add doesn't already exist.
- [x] You've reviewed and followed the Contributing guidelines.