
Wrangle in regular expressions

Open brysgo opened this issue 10 years ago • 11 comments

To approach #57 in bite-sized chunks, a good first step would be pulling all the regular expression usage into the beginnings of a grammar.

This gem looks like it would be perfect for the task. Check it out: http://semr.rubyforge.org/

If you don't feel like clicking a link here is one cool example...

require 'rubygems'
require 'semr'

language = Semr::Language.create do #also accepts a path to a file instead of a block
  concept :number,    any_number, :normalize => as_fixnum
  concept :greeting,  words('hi', 'goodbye', 'hello')
  phrase 'say :greeting :number times' do |greeting, number|
    number.to_i.times { puts greeting }
  end
end

language.parse('say hello 6 times')
# hello
# hello
# hello
# hello
# hello
# hello

language.parse('say goodbye 2 times')
# goodbye
# goodbye

brysgo avatar May 15 '14 00:05 brysgo

Semr looks interesting. I am concerned that it hasn't been updated in 6 years - https://github.com/mdeiters/semr

From what I've observed, projects usually end up building their own tokenizer, but there's no reason for us to do that yet if a library does 80% or more of what we'd want.

pickhardt avatar May 15 '14 00:05 pickhardt

I'd consider semr but do you know of any other good grammar projects?

Another approach is building a simple tokenizer ourselves.

One idea is to pass around an array<string|classes> as the list of commands to interpret. Rather than treating "copy all files ending in rb to my projects directory" as one string, it would be reduced step by step, something like:

  1. ["copy all files ending in rb to my projects directory"]
  2. ["copy all files ending in rb to ", Directory instance]
  3. ["copy", Files instance, "to", Directory instance]
  4. [Copy command, Files instance, "to", Directory instance]
  5. [Copy command, Files instance, Preposition instance, Directory instance]
  6. execute!

Where we create classes for Directory, File, Preposition, and Command.
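
A minimal sketch of how that progression might work, assuming a small ordered list of regex rules. The class names, the rule list, and the "~/projects" path mapping are all hypothetical placeholders for illustration, not existing betty code:

```ruby
# Hypothetical placeholder classes; none of these exist in betty today.
Directory   = Struct.new(:path)
Files       = Struct.new(:pattern)
Preposition = Struct.new(:word)
CopyCommand = Struct.new(:verb)

# Each rule pairs a regex with a builder for the token that replaces the match.
# Rules run in order, so broad phrases are claimed before single words.
RULES = [
  [/my (\w+) directory/,        ->(m) { Directory.new("~/#{m[1]}") }],
  [/all files ending in (\w+)/, ->(m) { Files.new("*.#{m[1]}") }],
  [/\bcopy\b/,                  ->(m) { CopyCommand.new(m[0]) }],
  [/\bto\b/,                    ->(m) { Preposition.new(m[0]) }]
]

# Apply one rule to every remaining String, splitting it around the first match.
# Already-built token instances pass through untouched.
def apply_rule(tokens, pattern, builder)
  tokens.flat_map do |token|
    next [token] unless token.is_a?(String)
    match = pattern.match(token)
    next [token] unless match
    [match.pre_match, builder.call(match), match.post_match]
      .reject { |part| part.is_a?(String) && part.strip.empty? }
      .map { |part| part.is_a?(String) ? part.strip : part }
  end
end

# Start with the whole sentence as a single string and narrow it rule by rule.
def tokenize(text)
  RULES.reduce([text]) { |tokens, (pattern, builder)| apply_rule(tokens, pattern, builder) }
end

tokens = tokenize("copy all files ending in rb to my projects directory")
puts tokens.map { |t| t.class.name }.join(", ")
```

Each rule only replaces the first match inside a string, which is enough for this sentence; a real tokenizer would probably need to loop until no rule matches.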

pickhardt avatar May 15 '14 00:05 pickhardt

That sounds good. I would definitely be up for building a tokenizer; as long as it scales well, I think it could be great.

I think being able to have broad tokens that could be made progressively more specific by drawing from context would be super cool.

brysgo avatar May 15 '14 00:05 brysgo

class Copy < Token
  argument source: FileInstance
  argument destination: DirectoryInstance,
    question: 'Where would you like to copy it to?'
  statement 'copy', :source, 'to', :destination
  statement 'copy', :source,
    ask: :destination

  def call
    BashCommand.call('cp', source.call, destination.call)
  end
end

Does this seem like a reasonable sketch of what copy might look like?
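
For reference, here is one rough way a Token base class could support that DSL. This is only a sketch under assumptions: the parse method, the regex-building strategy, and the empty FileInstance and DirectoryInstance stand-ins are hypothetical, and the ask: option is recorded but never prompts.

```ruby
# Hypothetical base class: records `argument` and `statement` declarations per subclass.
class Token
  class << self
    def arguments
      @arguments ||= {}
    end

    def statements
      @statements ||= []
    end

    # `argument source: FileInstance, question: '...'`
    # The first pair is name => type; any remaining pairs are stored as options.
    def argument(options)
      (name, type), *rest = options.to_a
      arguments[name] = { type: type }.merge(rest.to_h)
      attr_accessor name
    end

    # Strings are literal words, symbols refer to declared arguments.
    def statement(*parts, **options)
      statements << { parts: parts, options: options }
    end

    # Try each statement in order; return a populated instance or nil.
    def parse(text)
      statements.each do |stmt|
        pattern = stmt[:parts].map { |part|
          part.is_a?(Symbol) ? "(?<#{part}>.+?)" : Regexp.escape(part)
        }.join('\s+')
        match = Regexp.new("\\A#{pattern}\\z").match(text)
        next unless match
        token = new
        match.names.each { |name| token.public_send("#{name}=", match[name]) }
        return token
      end
      nil
    end
  end
end

# Stand-in argument types so the example loads; real ones would resolve paths.
FileInstance = Class.new
DirectoryInstance = Class.new

class Copy < Token
  argument source: FileInstance
  argument destination: DirectoryInstance,
    question: 'Where would you like to copy it to?'
  statement 'copy', :source, 'to', :destination
  statement 'copy', :source,
    ask: :destination
end

copy = Copy.parse('copy notes.rb to projects')
puts "#{copy.source} -> #{copy.destination}"
```

In this sketch the captured arguments stay raw strings; converting them into FileInstance or DirectoryInstance objects, and asking the stored question when destination is missing, would be the next steps.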

brysgo avatar May 15 '14 00:05 brysgo

Yes, that makes a lot of sense.

One thing is that 'copy' and 'to' should actually be more general, maybe a regex instead of a string, because I could imagine wanting to say "duplicate source to my home directory".
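
As a rough illustration of the regex idea, assuming a small hand-picked set of synonyms (the verb and preposition lists and the parse_copy helper are made up for this example):

```ruby
# Hypothetical pattern: accepts a few synonyms for the verb and the preposition.
COPY_STATEMENT = /\A(?:copy|duplicate|cp)\s+(?<source>.+?)\s+(?:to|into)\s+(?<destination>.+)\z/i

# Returns the captured source and destination text, or nil when nothing matches.
def parse_copy(text)
  match = COPY_STATEMENT.match(text.strip)
  return nil unless match
  { source: match[:source], destination: match[:destination] }
end

p parse_copy("duplicate source to my home directory")
```

A hard-coded alternation like this would get unwieldy quickly, so the synonym list would probably need to live somewhere shared.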

pickhardt avatar May 21 '14 05:05 pickhardt

:+1: Okay, I will start implementing this refactor at some point this week and throw it on a branch.

brysgo avatar May 21 '14 13:05 brysgo

How would you feel about doing something like this?:

  argument copy: WordExpansion
  statement :copy, :source, 'to', :destination

Meanwhile, in WordExpansion land...

class WordExpansion < Token
  statement do |options|
    search_text = options[:search_text]
    argument_name = options[:argument_name]
    expansions = Thesaurus.expand(argument_name)
    # Return the remaining text for the first expansion that prefixes search_text.
    expansions.each do |expansion|
      if result = pop_text(search_text, expansion)
        return result
      end
    end
    nil
  end

  private

  # Strip a leading keyword; returns nil when search_text does not start with it.
  def pop_text(search_text, keyword)
    split_text = search_text.split(keyword)
    if split_text.length > 1 && split_text[0] == ''
      split_text[1..-1].join(keyword)
    end
  end
end

brysgo avatar May 24 '14 01:05 brysgo

Never mind on the above for now.

I started implementing it and realized that we probably want to start by just making all the arguments a SimpleMatcher type to get the API fleshed out. After that we can find common argument types and factor them out.

brysgo avatar May 24 '14 13:05 brysgo

OK. I'm interested in what approach you are taking, so feel free to start a pull request early while you're in the process of developing it.

pickhardt avatar May 24 '14 16:05 pickhardt

Right now I am moving the find command over to the style we talked about above. I will submit a pull request when I have it in some working order that doesn't look too horrendous.

I don't usually do much Ruby metaprogramming, so I've been playing with the Token class, trying to get it to work with the same API as above.

brysgo avatar May 24 '14 23:05 brysgo

I didn't forget about this...

After my experiment with the pull request above I decided it might be easier to go with semr after all.

Check out mdeiters/semr#1

brysgo avatar Jun 09 '14 21:06 brysgo