sourcify
sourcify copied to clipboard
Workarounds before ruby-core officially supports Proc#to_source (& friends)
= Sourcify
== IMPORTANT#1: Sourcify was written in the days of ruby 1.9.x, it should be buggy for anything beyond that. == IMPORTANT#2: Sourcify is no longer maintained, use it at your own risk, & expect no bug fixes.
ParseTree[http://github.com/seattlerb/parsetree] is great, it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to ruby code & S-expression, BUT ParseTree doesn't work for 1.9.* & JRuby.
RubyParser[http://github.com/seattlerb/ruby_parser] is great, and it works for any rubies (of course, not 100% compatible for 1.8.7 & 1.9.* syntax yet), BUT it works only with static code.
I truely enjoy using the above tools, but with my other projects, the absence of ParseTree on the different rubies is forcing me to hand-baked my own solution each time to extract the proc code i need at runtime. This is frustrating, the solution for each of them is never perfect, and i'm reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).
Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper round it, otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. Further processing with RubyParser & Ruby2Ruby to ensure 100% with ParseTree (yup, there is no denying that i really like ParseTree).
== Installing It
The religiously standard way:
$ gem install ParseTree sourcify
Or on 1.9.* or JRuby:
$ gem install ruby_parser file-tail sourcify
== Sourcify adds 4 methods to Proc
=== 1. Proc#to_source
Returns the code representation of the proc:
require 'sourcify'
lambda { x + y }.to_source
>> "proc { (x + y) }"
proc { x + y }.to_source
>> "proc { (x + y) }"
Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to only extract the body of the proc by passing in {:strip_enclosure => true}:
lambda { x + y }.to_source(:strip_enclosure => true)
>> "(x + y)"
lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
>> "(i + 2)"
=== 2. Proc#to_sexp
Returns the S-expression of the proc:
require 'sourcify'
x = 1 lambda { x + y }.to_sexp
>> s(:iter,
>> s(:call, nil, :proc, s(:arglist)),
>> nil,
>> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))
To extract only the body of the proc:
lambda { x + y }.to_sexp(:strip_enclosure => true)
>> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))
=== 3. Proc#to_raw_source
Unlike Proc#to_source, which returns code that retains only functional aspects, fetching of raw source returns the raw code enclosed within the proc, including fluff like comments:
lambda do |i| i+1 # (blah) end.to_raw_source
>> "proc do |i|
>> i+1 # (blah)
>> end"
NOTE: This is extracting of raw code, it relies on static code scanning (even when running in ParseTree mode), the gotchas for static code scanning always apply.
=== 4. Proc#source_location
By default, this is only available on 1.9., it is added (as a bonus) to provide consistency under 1.8.:
/tmp/test.rb
require 'sourcify'
lambda { x + y }.source_location
>> ["/tmp/test.rb", 5]
== Sourcify adds 3 methods to Method
IMPORTANT: These only work for MRI-1.9.2, as currently, only it supports (1) discovering of the original source location with Method#source_location, and (2) reliably determinig a method's parameters with Method#parameters. Attempting to use these methods on other rubies will raise Sourcify::PlatformNotSupportedError.
NOTE: The following works for methods defined using both def .. end & Module#define_method. However, when a method is defined using the later approach, sourcify uses Proc#to_source to handle the processing, thus, the usual gotchas related to proc source extraction apply.
=== 1. Method#to_source
Returns the code representation of the method:
require 'sourcify'
class MyMath def self.sum(x, y) x + y # (blah) end end
MyMath.method(:sum).to_source
>> "def sum(x, y)
>> (x + y)
>> end"
Just like the Proc#to_source equivalent, u can set :strip_enclosure => true to extract only the body within.
=== 2. Method#to_sexp
Returns the S-expression of the method:
require 'sourcify'
class MyMath def self.sum(x, y) x + y # (blah) end end
MyMath.method(:sum).to_sexp
s(:defn, :sum, s(:args, :x, :y), s(:scope, s(:block, s(:call, s(:lvar, :x), :+, s(:arglist, s(:lvar, :y))))))
Just like the Proc#to_sexp equivalent, u can set :strip_enclosure => true to extract only the body within.
=== 3. Method#to_raw_source
Unlike Method#to_source, which returns code that retains only functional aspects, fetching of raw source returns the method's raw code, including fluff like comments:
require 'sourcify'
class MyMath def self.sum(x, y) x + y # (blah) end end
MyMath.method(:sum).to_raw_source
>> "def sum(x, y)
>> x + y # (blah)
>> end"
Just like the Proc#to_raw_source equivalent, u can set :strip_enclosure => true to extract only the body within.
== Performance
Performance is embarassing for now, benchmarking results for processing 500 procs (in the ObjectSpace of an average rails project) yiels the following:
ruby user system total real ruby-1.8.7-p299 (w ParseTree) 10.270000 0.010000 10.280000 ( 10.311430) ruby-1.8.7-p299 (static scanner) 14.120000 0.080000 14.200000 ( 14.283817) ruby-1.9.1-p376 (static scanner) 17.380000 0.050000 17.430000 ( 17.405966) jruby-1.5.2 (static scanner) 21.318000 0.000000 21.318000 ( 21.318000)
Since i'm still pretty new to ragel[http://www.complang.org/ragel], the code scanner will probably become better & faster as my knowlegde & skills with ragel improve. Also, instead of generating a pure ruby scanner, we can generate native code (eg. C or java, or whatever) instead. As i'm a C & java noob, this will probably take some time to realize.
== Gotchas
Nothing beats ParseTree's ability to access the runtime AST, it is a very powerful feature. The scanner-based (static) implementation suffer the following gotchas:
=== 1. The source code is everything
Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected [file, lineno], the following will not work:
def test eval('lambda { x + y }') end
test.source_location
>> ["(eval)", 1]
test.to_source
>> Sourcify::CannotParseEvalCodeError
The same applies to Blah#to_proc & &:blah:
klass = Class.new do def aa(&block); block ; end def bb; 1+2; end end
klass.new.method(:bb).to_proc.to_source
>> Sourcify::CannotHandleCreatedOnTheFlyProcError
klass.new.aa(&:bb).to_source
>> Sourcify::CannotHandleCreatedOnTheFlyProcError
=== 2. Multiple matching procs per line error
Sometimes, we may have multiple procs on a line, Sourcify can handle this as long as the subject proc has arity that is unique from others:
Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 } b2.to_source
>> proc { (1 + 2) }
Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 } b2.to_source
>> raises Sourcify::MultipleMatchingProcsPerLineError
As observed, the above does not work when there are multiple procs having the same arity, on the same line. Furthermore, this bug[http://redmine.ruby-lang.org/issues/show/574] under 1.8.* affects the accuracy of this approach.
To better narrow down the scanning, try:
-
passing in the {:attached_to => ...} option
x = lambda { proc { :blah } }
x.to_source
>> Sourcify::MultipleMatchingProcsPerLineError
x.to_source(:attached_to => :lambda)
>> "proc { proc { :blah } }"
-
passing in the {:ignore_nested => ...} option
x = lambda { lambda { :blah } }
x.to_source
>> Sourcify::MultipleMatchingProcsPerLineError
x.to_source(:ignore_nested => true)
>> "proc { lambda { :blah } }"
-
attaching a body matcher proc
x, y = lambda { def secret; 1; end }, lambda { :blah }
x.to_source
>> Sourcify::MultipleMatchingProcsPerLineError
x.to_source{|body| body =~ /^(.*\W|)def\W/ }
>> 'proc { def secret; 1; end }'
Pls refer to the rdoc for more details.
=== 3. Occasional Racc::ParseError
Under the hood, sourcify relies on RubyParser to yield s-expression, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError when you have any code that is not compatible with 1.8.6.
=== 4. Lambda operator doesn't work
When a lambda has been created using the lambda operator "->", sourcify can't handle it:
x = ->{ :blah }
x.to_source
# >> Sourcify::NoMatchingProcError
== Is it really working ??
Sourcify spec suite currently passes in the following rubies:
- MRI-1.8.*, REE-1.8.7 (both ParseTree & static scanner modes)
- JRuby-1.6., MRI-1.9. (static scanner ONLY)
Besides its own spec suite, sourcify has also been tested to handle:
ObjectSpace.each_object(Proc) {|o| puts o.to_source }
For projects:
- Spree[http://github.com/railsdog/spree]
- Redmine[http://github.com/edavis10/redmine]
(TODO: the more the merrier)
== Projects using it
Projects using sourcify include:
- wrong[http://github.com/sconover/wrong]
- ruote[http://ruote.rubyforge.org/]
- dm-ambition[https://github.com/dkubb/dm-ambition]
== Additional Resources
Sourcify is heavily inspired by many ideas gathered from the ruby community:
- http://www.justskins.com/forums/breaking-ruby-code-into-117453.html
- http://rubyquiz.com/quiz38.html (Florian Groß's solution)
- http://svenfuchs.com/2009/07/05/using-ruby-1-9-ripper.html
The sad fact that Proc#to_source wouldn't be available in the near future:
- http://redmine.ruby-lang.org/issues/show/2080
== Note on Patches/Pull Requests
- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don't break it in a future version unintentionally.
- Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
- Send me a pull request. Bonus points for topic branches.
== Copyright
Copyright (c) 2010 NgTzeYang. See LICENSE for details.