
Provide options to import mapping execution contexts.

Open andyt opened this issue 12 years ago • 11 comments

You might want a few more unit tests, and there's a possibility this doesn't cover all execution contexts - though it meets all mine.

I've added a new acceptance spec.

andyt avatar Oct 31 '13 13:10 andyt

This is not related to #38, is it? Couldn't your use case be solved with the helper modules discussed in #38?

stmichael avatar Oct 31 '13 14:10 stmichael

I don't think helper modules would cover my use case. I don't think this is related really - helper modules are static extensions to the execution context, whereas I needed to make them configurable. That said, I would find helper modules useful, hence my comments on the original ticket.

andyt avatar Oct 31 '13 14:10 andyt

@andyt why not use ENV variables for this configuration?

senny avatar Oct 31 '13 14:10 senny

Your solution is very specific to your use case. I don't like the idea of passing the options all around the code just to make them available in the execution context.

stmichael avatar Oct 31 '13 14:10 stmichael

Extending the context of a mapping would seem to be quite a common requirement. You might want to do this for configuration or scoping, perhaps based on some selections in an interface.

However, I agree that passing options everywhere isn't elegant, although that's a side effect of the multiple contexts used. If there were a single context (if such a thing were possible), the options wouldn't need to be passed around so pervasively.

In my use case, ENV variables won't work. I'm using your library within https://github.com/mperham/sidekiq, which is multi-threaded. I would prefer to avoid using Thread.current, which would be another option.
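For illustration, here is a hedged sketch of the Thread.current approach being weighed here. The names and the `run_import` stand-in are hypothetical; a real worker would call into data-import instead:

```ruby
# Hypothetical sketch (not the actual data-import API) of the Thread.current
# option: each Sidekiq worker thread keeps its own options in thread-local
# storage, so concurrent jobs don't overwrite each other.
class Worker
  def perform(options)
    Thread.current[:import_options] = options
    run_import                             # stand-in for DataImport.run_config!
  ensure
    Thread.current[:import_options] = nil  # don't leak into reused threads
  end

  private

  # The mapping code would read the options back via Thread.current.
  def run_import
    Thread.current[:import_options]
  end
end
```

Because `Thread.current[...]` is scoped per thread, two jobs running concurrently in the same process each see their own options.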

Anyway, I'm open to other suggestions. Thanks again for a great library.

andyt avatar Oct 31 '13 15:10 andyt

@andyt Did you find any solution for your problem?

I just had another idea. Suppose the helper modules from issue #38 already existed. How about taking the parameters you receive from Sidekiq and feeding them into a module which is then included in the import blocks? That's more structured than just defining global variables, and the DSL won't get that much more complicated.

stmichael avatar Nov 06 '13 07:11 stmichael

I think your suggestion suffers from the same problem - unless I can make the module name for inclusion dynamic, then surely I can only include the same module each time, and I have to resort to using Thread.current. Or am I misunderstanding your suggestion?

andyt avatar Nov 06 '13 10:11 andyt

I don't know your exact use case or the environment in which you'd like to use data-import. I assumed you wanted to pass some arguments received from Sidekiq's perform method on to data-import. In that case I think it would be possible to use the helpers:

module OptionsHelper
  # Module-level reader/writer so the options set by the worker
  # can be read back from within the mapping.
  class << self
    attr_accessor :options
  end
end

class Worker
  def perform(options)
    OptionsHelper.options = options
    DataImport.run_config! mapping_path
  end
end

And in the mapping you include OptionsHelper.

@andyt what do you think?

stmichael avatar Nov 06 '13 12:11 stmichael

Yes, that's the kind of thing I thought you were suggesting. Wouldn't all threads in a process access the same options?
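To make the concern concrete, here is a minimal, hypothetical demonstration: options stored at the module level are process-global, so one job's options can clobber another's. The queues only sequence the two threads so the race is reproducible:

```ruby
# Minimal demonstration of the concern: OptionsHelper's state lives on the
# module itself, which is shared by the whole process, so two concurrent
# jobs in one Sidekiq process overwrite each other's options.
module OptionsHelper
  class << self
    attr_accessor :options
  end
end

ready = Queue.new
done  = Queue.new

job_a = Thread.new do
  OptionsHelper.options = { tenant: 'a' }
  ready << :go                 # let job B run before A reads its options back
  done.pop
  OptionsHelper.options        # A now sees B's options, not its own
end

job_b = Thread.new do
  ready.pop
  OptionsHelper.options = { tenant: 'b' }
  done << :ok
end

job_b.join
puts job_a.value[:tenant]  # prints "b" -- job A's options were clobbered
```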

andyt avatar Nov 06 '13 13:11 andyt

Oh, now I see your point. Sorry it took so long :cry:

I don't know sidekiq but if the jobs all run in the same process, the options would indeed be the same.

I'm not a great fan of background job libraries that run all jobs in a single process, like delayed_job (although I'm not sure that's still the case) and apparently Sidekiq too. We once had a use case in which a worker had to process a large amount of data. Unfortunately our worker implementation was so lousy that it allocated a huge amount of memory without ever completely freeing it. After 3-4 finished jobs the worker process had allocated about 2GB of memory (MRI rarely returns memory to the operating system once it has allocated it). After another few jobs memory was exhausted; even swapping became impossible, and this led to a complete freeze of the machine, including the main website.

Of course it would have been wise to make the implementation more memory-aware. Still, I find it odd that such jobs can freeze the whole machine. After a job has run I would expect it to vanish completely, leaving no traces (except the ones it SHOULD leave). That's why I love Resque: every job runs in a child process of the worker process. After the job finishes, the child process is destroyed, freeing all the memory and resources it allocated.

Would that be an option for you?

stmichael avatar Nov 06 '13 14:11 stmichael

I like Resque too, but Sidekiq has a better memory footprint (given no leaks!). We could switch to Resque, but the commit I made in my fork is working for me.

andyt avatar Nov 06 '13 14:11 andyt