diff-lcs Return grouped objects based on similar actions

I'm liking the output of sdiff but each "segment" is a separate object. Would it be possible to merge adjacent objects with similar actions into 1 object?

In my case I'm feeding in arrays of sentences. If someone adds a paragraph, the difference is shown as a collection of new sentences. Instead, I would like one <ins> tag around the whole new paragraph.

Update: this is what I've done as a workaround for now:

    def consolidateDiff(sdiff)
      index = 1
      while index < sdiff.size
        previous = sdiff[index - 1]
        current = sdiff[index]
        if current.action == previous.action
          # Fold the current change into the previous one, then re-check the same index.
          previous.old_element << current.old_element unless previous.old_element.nil?
          previous.new_element << current.new_element unless previous.new_element.nil?
          sdiff.delete_at(index)
        else
          index += 1
        end
      end
      sdiff
    end

Mar 28 '13 17:03 archonic

I'm not quite sure what you're asking for. Can you provide me a test case that shows a failing condition? It sounds intriguing.

Mar 30 '13 18:03 halostatue

Should have mentioned - It's a feature request, not a bug. I found it convenient to group together similar adjacent actions. This way if I do an sdiff on 2 bodies of text that are separated into sentences, a new paragraph will appear as a single "+" and not 4 "+"s, one for each sentence. Here's the context (apologies for the length!)

    # Append similar adjacent actions into 1 "change". Iterates in place
    # instead of deleting from the array while each_with_index walks it.
    def consolidateDiff(sdiff)
      index = 1
      while index < sdiff.size
        previous = sdiff[index - 1]
        current = sdiff[index]
        if current.action == previous.action
          previous.old_element << current.old_element unless previous.old_element.nil?
          previous.new_element << current.new_element unless previous.new_element.nil?
          sdiff.delete_at(index)
        else
          index += 1
        end
      end
      sdiff
    end

    def insertClass(body, styles)
      elements = %w[p ol ul h6 h5 h4 h3 h2 h1]
      elements.each do |element|
        body = body.gsub("<#{element}", "<#{element} class=\"#{styles}\"")
      end
      return body
    end

    def compare
      #versionOne is 'new', versionTwo is 'old' (typically)
      @versionOne = @section.get_version params[:d1]
      @versionTwo = @section.get_version params[:d2]
      versions = @section.send :"#{@versionOne.type.underscore.pluralize}"
      @mostRecentVersionID = versions.last.id
      @vnum = versions.size
      @nicetype = @versionOne.type.underscore.split('_').first

      # Split after each whitespace character so a literal "|" in the body is not lost.
      seq1 = @versionOne.body.split(/(?<=\s)/)
      seq2 = @versionTwo.body.split(/(?<=\s)/)

      sdiff = Diff::LCS.sdiff(seq2, seq1)

      consolidateDiff(sdiff)

      # Output the compare all pretty-like
      diffHTML = ''
      sdiff.each do |diff|
        case diff.action
          when '='
            diffHTML << diff.new_element
          when '!'
            diffHTML << "<span class=\"diff-wrapper\">"
            diffHTML << insertClass(diff.old_element, "del") << insertClass(diff.new_element, "ins")
            diffHTML << "</span>"
          when '-'
            diffHTML << "<span class=\"diff-wrapper del\">"
            diffHTML << insertClass(diff.old_element, "del")
            diffHTML << "</span>"
          when '+'
            diffHTML << "<span class=\"diff-wrapper ins\">"
            diffHTML << insertClass(diff.new_element, "ins")
            diffHTML << "</span>"
        end
      end

      @compareBody = diffHTML.html_safe

      respond_to do |format|
        format.html
      end
    end

I figure consolidateDiff would be a useful addition to the gem. If there's a simpler way to do this, let me know. It's helpful to have adjacent similar actions as one action because I'm later going to implement an "approve changes" TinyMCE plugin where an editor can approve each change between 2 revisions.
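
For comparison, here is a rough sketch of a non-mutating way to do the same grouping with Enumerable#chunk_while (Ruby 2.3+). The `Change` struct and the `group_changes` name are made up for illustration; `Change` only stands in for the sdiff result objects (action, old_element, new_element) so the snippet runs without the gem. This is not something diff-lcs provides.

```ruby
# Stand-in for the objects returned by Diff::LCS.sdiff so the sketch runs
# without the gem; only the three readers used in this thread are modelled.
Change = Struct.new(:action, :old_element, :new_element)

# Group adjacent changes that share an action and join their elements.
# Returns new objects; the input array and its strings are left untouched.
def group_changes(changes)
  changes.chunk_while { |a, b| a.action == b.action }.map do |run|
    olds = run.map(&:old_element).compact
    news = run.map(&:new_element).compact
    Change.new(run.first.action,
               olds.empty? ? nil : olds.join,
               news.empty? ? nil : news.join)
  end
end

changes = [
  Change.new('=', 'Hello ', 'Hello '),
  Change.new('+', nil, 'brave '),
  Change.new('+', nil, 'new '),
  Change.new('=', 'world', 'world'),
  Change.new('+', nil, '!')
]
grouped = group_changes(changes)
```

Here the two adjacent "+" entries collapse into one, while the trailing "+" stays separate because an "=" sits between them.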

Apr 01 '13 03:04 archonic

Interesting. I'll have to play with this some to consider it. It sounds like a nice basis for a 1.3 release.

Apr 04 '13 03:04 halostatue

I updated consolidateDiff to actually work (derp). I also updated compare to split on words (space-delimited) instead of sentences. consolidateDiff now makes the space-delimited comparison output manageable. I could do a character comparison (forgoing splitting altogether), but in my wiki-like diff I wouldn't want confusing mid-word insertions, and it would complicate rendering HTML tags in the output.

It's getting some good output, but the format of the output gets messed up when there's a change involving a word adjacent to an HTML tag. The ideal split is like this:

    seq1 = "<p>Here is a paragraph. A sentence with <strong>bold text</strong>.</p><p>The second paragraph.</p>"
    seq1.magic
    => ["<p>", "Here ", "is ", "a ", "paragraph. ", "A ", "sentence ", "with ", "<strong>", "bold ", "text", "</strong>", ".", "</p>", "<p>", "The ", "second ", "paragraph.", "</p>"]

So I'm making use of Nokogiri, but it's not very pleasant. I have to blow up everything and then reconstruct an array with the HTML tags and their attributes intact. I'll post the module I have once I know it works.
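
As a point of reference, the split described above can also be approximated with a single regular expression instead of Nokogiri. This is only a sketch: `tokenize_html` and `TOKEN` are hypothetical names, and it assumes well-formed markup with no ">" inside attribute values, which a real parser would handle and this does not.

```ruby
# Hypothetical helper (not part of diff-lcs or Nokogiri): split an HTML
# string into whole tags and words that keep their trailing whitespace.
# Assumes well-formed markup with no ">" inside attribute values.
TOKEN = /<[^>]+>|[^<\s]+\s*|\s+/

def tokenize_html(html)
  # Alternatives, in order: a complete tag, a word plus any whitespace
  # after it, or a run of whitespace that follows a tag.
  html.scan(TOKEN)
end

html = "<p>Here is a paragraph. A sentence with <strong>bold text</strong>.</p>" \
       "<p>The second paragraph.</p>"
tokens = tokenize_html(html)
```

For the sample string this yields the same array as the ideal split shown in this post, and `tokens.join` gives back the original string, so tags and their attributes survive as single elements.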

Apr 29 '13 22:04 archonic

Looks interesting. I was hoping to be able to work on this for an April release, and now it looks more likely to be a June release at the earliest; my time is just not available for feature work and assessment.

I think you're taking the right approach, but it may be possible to just compare the Nokogiri nodes, since you can flatten them out. I'm not sure, as I haven't tried it.

Apr 30 '13 01:04 halostatue

I have a similar tool at https://gist.github.com/skandragon/92b1ad57e360d3948138

Jun 14 '13 18:06 skandragon

@skandragon That's pretty awesome!

I've written that into a module along with an HTML parsing class. Without running through the parser, the compare is correct, except it will split in the middle of HTML tags. The parser seems to work in small cases, but for some reason it removes tags completely after a certain point. Not sure what's up with that.

https://gist.github.com/archonic/8967057

Feb 13 '14 00:02 archonic
