Is there a way to run a specific step?

Open towerhe opened this issue 12 years ago • 10 comments

I'm trying to use data_miner for my routine import jobs. In my case, I need to upload an xls file to my system and import the data from it.

I have a lot of xls files with different (headers - cols) mappings, and I defined an import step for each type of mapping. So I need to run a specific import step after I upload an xls file. Is there a way to do that?

towerhe avatar Feb 20 '13 03:02 towerhe

hi @towerhe you may be able to hack it with:

Car.data_miner_script.steps[9].start

it's a known problem with data_miner that this is hard to do - please let me know if you have suggestions!
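
Picking a step by position breaks as soon as steps are reordered, so one possible workaround is to look it up by a label instead. This is only a rough sketch: `FakeStep` is a stand-in struct and `run_step` is a hypothetical helper, and whether data_miner's real step objects expose a `description` to match on is an assumption to check against the gem.

```ruby
# Stand-in for a data_miner step; the real step classes live inside the gem.
FakeStep = Struct.new(:description, :log) do
  def start
    log << description # record which step ran
    nil
  end
end

# Hypothetical helper: run only the step whose description matches.
def run_step(steps, description)
  step = steps.find { |s| s.description == description }
  raise ArgumentError, "no step described as #{description.inspect}" unless step
  step.start
end

log = []
steps = [
  FakeStep.new("import sheet type A", log),
  FakeStep.new("import sheet type B", log)
]

run_step(steps, "import sheet type B")
puts log.inspect
```

With the real gem, the `steps` array would presumably be `Car.data_miner_script.steps`.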

seamusabshere avatar Feb 20 '13 16:02 seamusabshere

An import step needs a static URL that points to a resource. In my case, the URL is dynamic. So to solve my problem, I need to add new features to data_miner. But I am having problems running the specs.

I have downgraded earth to 0.11.7, minitest to 3.5.0, and minitest-reporters to 0.9.0, but the specs still fail.

Could you please help me get the specs passing?

towerhe avatar Feb 20 '13 16:02 towerhe

i hate to say it, but the tests have been neglected for years - they need to be cleaned up.

seamusabshere avatar Feb 20 '13 16:02 seamusabshere

Yeah, I got it. I will try to improve them, but I don't have any experience with minitest.

BTW, IMO the key should not be required for an import step. If a key is defined, only the records with the provided keys should be updated. When no key is defined, data_miner should create new records instead.

towerhe avatar Feb 20 '13 16:02 towerhe

If a key is defined, only the records with the provided keys should be updated. When no key is defined, data_miner should create new records instead.

that should happen already - data_miner uses upsert internally - is that what you needed?

seamusabshere avatar Feb 20 '13 18:02 seamusabshere

But I found the following code:

def start
  if not validate? and (storing_primary_key? or table_has_autoincrementing_primary_key?)
    c = ActiveRecord::Base.connection_pool.checkout
    Upsert.stream(c, model.table_name) do |upsert|
      table.each do |row|
        selector = { @key => attributes[@key].read(row) }
        document = attributes.except(@key).inject({}) do |memo, (_, attr)|
          memo.merge! attr.updates(row)
          memo
        end
        upsert.row selector, document
      end
    end
    ActiveRecord::Base.connection_pool.checkin c
  else
    table.each do |row|
      record = model.send "find_or_initialize_by_#{@key}", attributes[@key].read(row)
      attributes.each { |_, attr| attr.set_from_row record, row }
      record.save!
    end
  end
  refresh
  nil
end

Both the if branch and the else branch need @key, which means we have to define a key for our models.

towerhe avatar Feb 20 '13 18:02 towerhe

ok, i see what you mean - correct, data_miner assumes that it is always in upsert mode.

would your problem be solved if you could just leave out key and have it always insert?
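
roughly, the behaviour being discussed could look like the sketch below. it uses an in-memory `FakeTable` stand-in with a hypothetical `import` method, not data_miner's or upsert's real API, just to show "upsert when a key is given, plain insert when it isn't":

```ruby
# In-memory stand-in for a table; not data_miner's or upsert's real API.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # With a key: update the matching row, or insert if none matches (upsert).
  # Without a key: always insert a new row.
  def import(row, key = nil)
    existing = key && @rows.find { |r| r[key] == row[key] }
    if existing
      existing.merge!(row)
    else
      @rows << row.dup
    end
    nil
  end
end

table = FakeTable.new
table.import({ name: "civic", mpg: 30 }, :name)
table.import({ name: "civic", mpg: 32 }, :name) # key given: updates the existing row
table.import({ name: "civic", mpg: 35 })        # no key: inserts a second row
puts table.rows.length
```

in the quoted `start` method, the equivalent change would be a third branch that skips the selector and the `find_or_initialize_by_` lookup when no key is set.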

seamusabshere avatar Feb 20 '13 18:02 seamusabshere

I'm now working hard to fix the tests. Once all the tests pass, I will try to introduce a way to ignore the key.

towerhe avatar Feb 20 '13 18:02 towerhe

@towerhe do you need a gem release before you can close this?

seamusabshere avatar Feb 25 '13 22:02 seamusabshere

I haven't found the right way to implement this yet.

towerhe avatar Feb 26 '13 02:02 towerhe