betty
Wrangle in regular expressions
To approach #57 in bite-sized chunks, a good first step would be pulling all the regular expression usage into the beginnings of a grammar.
This gem looks like it would be perfect for the task. Check it out: http://semr.rubyforge.org/
If you don't feel like clicking a link, here is one cool example:
require 'rubygems'
require 'semr'
language = Semr::Language.create do # also accepts a path to a file instead of a block
  concept :number, any_number, :normalize => as_fixnum
  concept :greeting, words('hi', 'goodbye', 'hello')
  phrase 'say :greeting :number times' do |greeting, number|
    number.to_i.times { puts greeting }
  end
end
language.parse('say hello 6 times')
# hello
# hello
# hello
# hello
# hello
# hello
language.parse('say goodbye 2 times')
# goodbye
# goodbye
Semr looks interesting. I am concerned that it hasn't been updated in 6 years - https://github.com/mdeiters/semr
From what I've observed, projects usually end up building their own tokenizer, but there's no reason for us to do that yet if a library does 80% or more of what we'd want.
I'd consider semr, but do you know of any other good grammar projects?
Another approach is building a simple tokenizer ourselves.
One idea is to pass around an array<string|classes> as the list of commands to interpret. Starting from "copy all files ending in rb to my projects directory", each pass would replace part of the remaining text with a class instance, something like:
- ["copy all files ending in rb to my projects directory"]
- ["copy all files ending in rb to ", Directory instance]
- ["copy", Files instance, "to", Directory instance]
- [Copy command, Files instance, "to", Directory instance]
- [Copy command, Files instance, Preposition instance, Directory instance]
- execute!
Where we create classes for Directory, File, Preposition, and Command.
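The steps above could be sketched roughly as follows. This is only an illustration of the idea, not betty code: the class names (CopyCommand, Files, Preposition, Directory), the PASSES table, and the patterns in it are all assumptions made up for this example.

```ruby
# Hypothetical token classes; none of these names come from betty itself.
Token = Struct.new(:text)
class CopyCommand < Token; end
class Files < Token; end
class Preposition < Token; end
class Directory < Token; end

# Each pass is [pattern, token class]. Order matters: longer, more
# specific phrases are recognized before single words.
PASSES = [
  [/my \w+ directory/, Directory],
  [/all files ending in \w+/, Files],
  [/\bcopy\b/, CopyCommand],
  [/\bto\b/, Preposition]
].freeze

# Replace the first match of `pattern` in each remaining string with a
# token, keeping the unmatched text on either side as plain strings.
def apply_pass(items, pattern, klass)
  items.flat_map do |item|
    next [item] unless item.is_a?(String)

    match = pattern.match(item)
    next [item] unless match

    [match.pre_match, klass.new(match[0]), match.post_match]
      .reject { |part| part.is_a?(String) && part.strip.empty? }
  end
end

# Run every pass in order over the array<string|classes>.
def tokenize(sentence)
  PASSES.reduce([sentence]) do |items, (pattern, klass)|
    apply_pass(items, pattern, klass)
  end
end

tokens = tokenize('copy all files ending in rb to my projects directory')
puts tokens.map { |token| token.class.name }.join(', ')
```

Once no strings are left in the array, the list is fully tokenized and could be handed to the command for execution. Any leftover string would mark text that no pass understood.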
That sounds good. I would definitely be up for building a tokenizer; as long as it scales well, I think it could be great.
I think being able to have broad tokens that could be made progressively more specific by drawing from context would be super cool.
class Copy < Token
  argument source: FileInstance
  argument destination: DirectoryInstance,
           question: 'Where would you like to copy it to?'
  statement 'copy', :source, 'to', :destination
  statement 'copy', :source,
            ask: :destination
  def call
    BashCommand.call('cp', source.call, destination.call)
  end
end
Does this seem like a reasonable sketch of what copy might look like?
Yes, that makes a lot of sense.
One thing is that 'copy' and 'to' should actually be more general, maybe a regex instead of a string, because I could imagine wanting to say "duplicate source to my home directory".
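As a rough illustration of the regex idea, here is a minimal sketch. The synonym lists and the parse_copy helper are assumptions for this example only, not part of betty or of the Token sketch above.

```ruby
# Hypothetical patterns; the synonym lists are illustrative only.
COPY_VERB = /\A(?:copy|duplicate|clone)\b/i
TO_PREPOSITION = /\b(?:to|into)\b/i

# Returns { source:, destination: } when the sentence looks like a copy
# request, or nil when it does not.
def parse_copy(sentence)
  verb = COPY_VERB.match(sentence)
  return nil unless verb

  # Split the rest of the sentence at the first preposition only.
  source, destination = verb.post_match.split(TO_PREPOSITION, 2).map(&:strip)
  return nil if source.nil? || source.empty?
  return nil if destination.nil? || destination.empty?

  { source: source, destination: destination }
end

p parse_copy('duplicate source to my home directory')
p parse_copy('copy notes.txt into my projects directory')
```

With patterns like these, 'copy' and 'duplicate' resolve to the same command without adding a second statement for each synonym.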
:+1: Okay, I will start implementing this refactor at some point this week and throw it on a branch.
How would you feel about doing something like this?:
argument copy: WordExpansion
statement :copy, :source, 'to', :destination
Meanwhile, in WordExpansion land...
class WordExpansion < Token
  statement do |options|
    search_text = options[:search_text]
    argument_name = options[:argument_name]
    # Try each synonym of the argument name until one matches.
    expansions = Thesaurus.expand(argument_name)
    expansions.each do |expansion|
      if result = pop_text(search_text, expansion)
        return result
      end
    end
    nil
  end

  private

  # Returns the text remaining after a leading keyword, or nil if the
  # text does not start with the keyword.
  def pop_text(search_text, keyword)
    split_text = search_text.split(keyword)
    if split_text.length > 1 && split_text[0] == ''
      split_text[1..-1].join(keyword)
    end
  end
end
Never mind on the above for now.
I started implementing it and realized that we probably want to start by just making all the arguments a SimpleMatcher type to get the API fleshed out. After that we can find common argument types and factor them out.
OK. I'm interested in what approach you are taking, so feel free to start a pull request early while you're in the process of developing it.
Right now I am moving the find command over to the style we talked about above. I will submit a pull request when I have it in some working order that doesn't look too horrendous.
I don't usually do much Ruby metaprogramming, so I've been playing with the Token class, trying to get it to work with the same API as above.
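For reference, one possible way a Token base class could support the `argument` and `statement` calls from the Copy sketch is shown below. It is only a sketch under assumptions: it records the declarations and defines accessors, and does no matching or prompting. FileInstance and DirectoryInstance are empty placeholders here, and the internal hash layout is invented for the example.

```ruby
# Hypothetical base class: records `argument` and `statement` declarations
# per subclass and defines an accessor for each argument.
class Token
  class << self
    def arguments
      @arguments ||= {}
    end

    def statements
      @statements ||= []
    end

    # argument source: FileInstance
    # argument destination: DirectoryInstance, question: 'Where ...?'
    # The first key/value pair is taken as the argument name and type.
    def argument(options)
      name, type = options.first
      arguments[name] = { type: type, question: options[:question] }
      attr_accessor name
    end

    # statement 'copy', :source, 'to', :destination
    # statement 'copy', :source, ask: :destination
    def statement(*parts)
      options = parts.last.is_a?(Hash) ? parts.pop : {}
      statements << { pattern: parts, ask: options[:ask] }
    end
  end
end

# Placeholder argument types, assumed for this example.
FileInstance = Class.new
DirectoryInstance = Class.new

class Copy < Token
  argument source: FileInstance
  argument destination: DirectoryInstance,
           question: 'Where would you like to copy it to?'

  statement 'copy', :source, 'to', :destination
  statement 'copy', :source,
            ask: :destination
end

p Copy.arguments.keys
p Copy.statements.map { |statement| statement[:pattern] }
```

Storing the declarations in class-level instance variables keeps each Token subclass's arguments and statements separate from those of its siblings.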
I didn't forget about this...
After my experiment with the pull request above I decided it might be easier to go with semr after all.
Check out mdeiters/semr#1