rails-dev-boost
rails-dev-boost copied to clipboard
ModuleCache initializing takes a minute
I'm having problem where every reload is taking about a minute. With the debug mode on I was able to trace through the code to dependencies_patch Line: 115
Apparently iterating over ObjectSpace and mapping _mod_name takes a really long time.
For example, if I open up rails console and run
ObjectSpace.each_object(Module){|mod| mod._mod_name }
take about 100 seconds. The count returned is 26000+.
I've monkey patched Line 231 of same file module_cache = nil so that it doesn't reload the cache on every file reload. This improved the reload time significantly to around 1s, but I'm not sure what side effects I've introduced (I'm a ruby nub).
Any insights or point me to the right direction? Thanks!
This improved the reload time significantly to around 1s, but I'm not sure what side effects I've introduced (I'm a ruby nub).
This should have broke rails-dev-boost, ie. it will now fail to detect classes/modules' inter-dependencies.
The count returned is 26000+.
This indicates either a really huge app (with 26000+ classes/modules/.rb files, just booting a Rails console should take about 10 minutes) or the app is somehow generating a lot of classes/modules on the fly. Either way this shouldn't take 100 seconds to iterate over.
For example here the an output for an app I have (running ruby 2.0):
irb(main):001:0> ObjectSpace.each_object(Module) {}
=> 4235
irb(main):002:0> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m._mod_name}}; Time.now-t
=> 0.804799
As can be seen the app has 4235 modules/classes, and iterating over them 10 times (this should be equivalent to iterating over 42350 modules/classes) takes 0.8s.
Any insights or point me to the right direction?
Check how fast "pure" ObjectSpace.each_object(Module) runs, ie:
irb(main):003:0> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m}}; Time.now-t
=> 0.122112
This way we can rule out that native ObjectSpace.each_object(Module) is to blame.
Then, check whether ._mod_name method is the correct one (it should be an alias of a native Module#name method):
irb(main):004:0> ObjectSpace.each_object(Module).map {|mod| mod.method(:_mod_name).unbind}.uniq
=> [#<UnboundMethod: Module#name>, #<UnboundMethod: Class(Module)#name>]
Yeah the Rails app I'm working on is pretty ginormous. Here's the benchmarks:
[1] pry(main)> ObjectSpace.each_object(Module) {}
=> 26605
[2] pry(main)> t=Time.now; 10.times {ObjectSpace.each_object(Module) {|m| m }}; Time.now-t
=> 0.570916
and then calling _mod_name
[1] pry(main)> t=Time.now; ObjectSpace.each_object(Module) {|m| m._mod_name }; Time.now-t
=> 34.679376
but take note that I this is just ONE call. Iterating over 10x would take 5min+.
Lastly this is the output of the unbind:
[2] pry(main)> ObjectSpace.each_object(Module).map {|mod| mod.method(:_mod_name).unbind}.uniq
=> [#<UnboundMethod: Module#_mod_name(name)>,
#<UnboundMethod: Class(Module)#_mod_name(name)>,
#<UnboundMethod: StateMachine::HelperModule(Module)#_mod_name(name)>]
Mine current hunch is that you are running into an issue with Module#name (note that _mod_name is a pure method alias for native Method#name) being 2 to 3 orders of magnitudes slower for anonymous modules/classes (ie: when mod.name # => nil) than for "named" modules/classes (when mod.name returns a string, for example RailsDevelopmentBoost.name # => "RailsDevelopmentBoost").
This is because: if a module/class name is unknown (which it is for "anonymous" modules/classes) the whole Ruby constant namespace is traversed by MRI trying to find the name of the mod, if a name is found, it is cached and all subsequent mod.name calls return that cached result. If traversing constant namespace yields no matches, nothing is cached and next time mod._mod_name is called MRI has to re-traverse everything yet again.
You can check how many of the modules/classes of your app are "anonymous" like this:
irb(main):002:0> ObjectSpace.each_object(Module).select {|m| RailsDevelopmentBoost::DependenciesPatch::Util.anonymous_const?(m)}.size
=> 172
Does this yield thousands modules/classes for you (I'm expecting it to be something like 20000+)?
I think your hunch is right, here:
[1] pry(main)> ObjectSpace.each_object(Module).select {|m| RailsDevelopmentBoost::DependenciesPatch::Util.anonymous_const?(m)}.size
=> 19224
So this is interesting. I did read about anonymous module/classes and their side effects. I'll have to read up on it.
Is keeping a persistent set/hash of anon class to skip the _mod_name look up a good solution?
require 'set'
anon = Set.new([])
2.times do
t = Time.now
ObjectSpace.each_object(Module) do |mod|
next if anon.include? mod
m = mod._mod_name
if !m
anon.add mod
end
end
puts Time.now - t
puts anon.length
end
And here's the benchmark after:
33.183677
19226
0.102118
19226
Don't know what to do about that abysmal anon module/class mod.name performance on MRI. For example with my app I have 4009 non-anonymous modules/classes and calling .name on all of them takes 0.002s, while invoking .name on remaining 220 anonymous modules/classes takes 0.06s. What's even worse is - the bigger your app gets, the slower .name call for anon modules becomes (as the namespace structure grows, the longer it takes to traverse).
Not sure if I'll ever be able to fix this for you (short of fixing MRI).
You could alleviate the problem by giving your anonymous modules names, not sure have feasible that is given your app's constraints:
def deanonymize_module_or_class(mod)
Object.const_set(:XYZ, mod)
mod.name # ensure the name is computed and cached by MRI
Object.send(:remove_const, :XYZ) # unset the const, `mod` will now forever retain XYZ const name
end
Note that Ruby will not care that 19224 of your classes/modules now have the same mod.name.
Is keeping a persistent set/hash of anon class to skip the _mod_name look up a good solution?
I can't, they wouldn't get garbage collected by Ruby VM then...
I also can't set a an i-var on them like this:
if mod.instance_variable_get(:@anon_module)
# skip
elsif Util.anonymous_const_name?(mod._mod_name)
mod.instance_variable_set(:@anon_module, true)
else
# work with mod
end
Because next time around a previously anonymous module might no longer be anonymous and ModuleCache needs to keep track of it.
You could alleviate the problem by giving your anonymous modules names, not sure have feasible that is given your app's constraints:
This is what I was planning. In config/application.rb I've added the deanonymizing code in an after_initialize block. Obviously this adds about an extra minute to the application boot time, but much better than the 40s per reload time.
I guess unless we monkey patch MRI, this is good enough for me. What I'm wondering is how does languages like Python handles anon name look ups.
What I'm wondering is how does languages like Python handles anon name look ups.
This is purely an MRI implementation problem, other Ruby VMs (JRuby and Rubinius) don't suffer from this issue, they always keep track of whether a module/class has a name or not, if it doesn't nil is returned right away (the whole namespace/constants-world is never traversed). Also because returning a nil from mod.name call does not involve a memory allocation (whereas on the other hand for "named" module/class mod.name call always returns a new String instance), calling mod.name on anonymous modules/classes is actually faster :).
This is because MRI optimizes for boot-up performance (when a module acquires a name, MRI chooses not to bother storing it on a module's object, thinking "if somebody asks for a module name, we'll get around to it and figure it out") whereas JRuby and Rubinius always keep module object's name up to date (an anonymous module has a null pointer, if it ever acquires a name a pointer is updated with the proper name).
PS: personally I believe MRI behavior is a legacy artifact/quirk and switching it to JRuby/Rubinius "mode" won't result in any boot-up slow down.
@pxue not sure if rails-dev-boost is ever going to attempt to fix your problem, but there is hope on the MRI front! https://bugs.ruby-lang.org/issues/11119
@thedarkone Nice! we should back contribute to the discussion. we have some nice real world benches here of the problem.