Make bulk loader work with arrays instead of strings

Open MatMoore opened this issue 8 years ago • 0 comments

Moved from https://trello.com/c/8zhBPuQT/12-make-bulk-loader-work-with-arrays-instead-of-strings.

What

Every night a job runs to rebuild the search index with new popularity data: https://github.com/alphagov/search-analytics/blob/master/nightly-run.sh

The bulk load script accepts text from standard input, representing Elasticsearch documents. It then calls indexing code that is shared with the regular indexing functionality, even though it passes a string where regular indexing passes an array of hashes.

This makes the code really difficult to work on, because the documents being indexed can arrive as either a string or an array of hashes. This complexity affects all of the indexing code, e.g.:

    def bulk_payload(document_hashes_or_payload)
      if document_hashes_or_payload.is_a?(Array)
        index_items_from_document_hashes(document_hashes_or_payload)
      else
        index_items_from_raw_string(document_hashes_or_payload)
      end
    end

Why

There are two separate code paths that essentially do the same thing. If you make any change to this code, you have to be very careful to change both of them in the same way, and to test both of them.
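
One possible direction, as a rough sketch only: convert the standard input text into an array of hashes at the edge of the bulk load script, so the shared indexing code only ever sees arrays. This assumes the input is in Elasticsearch bulk format (alternating action and document lines of newline-delimited JSON), which may not match what the script actually receives. `document_hashes_from_bulk_text` is a hypothetical name, not an existing method in the codebase.

```ruby
require "json"

# Hypothetical helper (not part of the existing code).
# Assumes bulk-format input: an action line such as
#   {"index": {"_id": "/a", "_type": "edition"}}
# followed by a line holding the document source.
def document_hashes_from_bulk_text(text)
  # Ignore blank lines so trailing newlines do not break the pairing.
  lines = text.each_line.map(&:strip).reject(&:empty?)

  lines.each_slice(2).map do |action_line, source_line|
    raise ArgumentError, "action line without a document line" if source_line.nil?

    # Only "index" actions are handled in this sketch.
    action = JSON.parse(action_line).fetch("index")
    metadata = action.select { |key, _| %w[_id _type].include?(key) }

    # Merge the identifying metadata into the document hash.
    JSON.parse(source_line).merge(metadata)
  end
end
```

With something like this in the bulk loader, `bulk_payload` would only need the `Array` branch, and `index_items_from_raw_string` could probably be removed.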

MatMoore, Dec 18 '17 16:12