diff-lcs
Return grouped objects based on similar actions
I'm liking the output of sdiff but each "segment" is a separate object. Would it be possible to merge adjacent objects with similar actions into 1 object?
In my case I'm feeding in arrays of sentences. If someone adds a paragraph, the difference is shown as a collection of new sentences. Instead, I would like one <ins> tag around the whole new paragraph.
Update: This is what I've done to accommodate this for now:
def consolidateDiff(sdiff)
  # Merge each change into its predecessor when both share an action.
  # Mutates sdiff (and the predecessor's strings) in place.
  index = 1
  while index < sdiff.size
    previous = sdiff[index - 1]
    current = sdiff[index]
    if current.action == previous.action
      previous.old_element << current.old_element unless previous.old_element.nil?
      previous.new_element << current.new_element unless previous.new_element.nil?
      sdiff.delete_at(index) # stay on this index; the next change has shifted into it
    else
      index += 1
    end
  end
  sdiff
end
I'm not quite sure what you're asking for. Can you provide me a test case that shows a failing condition? It sounds intriguing.
Should have mentioned - It's a feature request, not a bug. I found it convenient to group together similar adjacent actions. This way if I do an sdiff on 2 bodies of text that are separated into sentences, a new paragraph will appear as a single "+" and not 4 "+"s, one for each sentence. Here's the context (apologies for the length!)
# Append similar adjacent actions into 1 "change". Iterative; mutates sdiff in place.
def consolidateDiff(sdiff)
  index = 1
  while index < sdiff.size
    previous = sdiff[index - 1]
    current = sdiff[index]
    if current.action == previous.action
      previous.old_element << current.old_element unless previous.old_element.nil?
      previous.new_element << current.new_element unless previous.new_element.nil?
      sdiff.delete_at(index) # stay on this index; the next change has shifted into it
    else
      index += 1
    end
  end
  sdiff
end
def insertClass(body, styles)
  elements = %w[p ol ul h6 h5 h4 h3 h2 h1]
  elements.each do |element|
    body = body.gsub("<#{element}", "<#{element} class=\"#{styles}\"")
  end
  body
end

def compare
  # versionOne is 'new', versionTwo is 'old' (typically)
  @versionOne = @section.get_version params[:d1]
  @versionTwo = @section.get_version params[:d2]
  versions = @section.send :"#{@versionOne.type.underscore.pluralize}"
  @mostRecentVersionID = versions.last.id
  @vnum = versions.size
  @nicetype = @versionOne.type.underscore.split('_').first
  seq1 = @versionOne.body.gsub(/\s/, '\0|').split('|')
  seq2 = @versionTwo.body.gsub(/\s/, '\0|').split('|')
  sdiff = Diff::LCS.sdiff(seq2, seq1)
  consolidateDiff(sdiff)
  # Output the compare all pretty-like
  diffHTML = ''
  sdiff.each do |diff|
    case diff.action
    when '='
      diffHTML << diff.new_element
    when '!'
      diffHTML << "<span class=\"diff-wrapper\">"
      diffHTML << insertClass(diff.old_element, "del") << insertClass(diff.new_element, "ins")
      diffHTML << "</span>"
    when '-'
      diffHTML << "<span class=\"diff-wrapper del\">"
      diffHTML << insertClass(diff.old_element, "del")
      diffHTML << "</span>"
    when '+'
      diffHTML << "<span class=\"diff-wrapper ins\">"
      diffHTML << insertClass(diff.new_element, "ins")
      diffHTML << "</span>"
    end
  end
  @compareBody = diffHTML.html_safe
  respond_to do |format|
    format.html
  end
end
I figure consolidateDiff would be a useful addition to the gem. If there's a simpler way to do this, let me know. It's helpful to have adjacent similar actions as one action because I'm later going to implement an "approve changes" TinyMCE plugin where an editor can approve each change between 2 revisions.
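One possibly simpler, non-recursive way to do the grouping is Ruby's `Enumerable#chunk_while` (Ruby 2.3+), which splits a list into runs of adjacent items. The sketch below is only an illustration, not anything in the gem: `Change` is a hypothetical stand-in Struct for the objects `Diff::LCS.sdiff` returns (only `action`, `old_element` and `new_element` are modelled), and `consolidate` is a made-up name. It assumes the elements are strings and builds new objects instead of mutating the input.

```ruby
# Hypothetical stand-in for the change objects returned by Diff::LCS.sdiff;
# only the three fields used in this thread are modelled.
Change = Struct.new(:action, :old_element, :new_element)

# Group adjacent changes that share an action and join their elements.
# Returns a new array; the input changes are not modified.
def consolidate(changes)
  changes.chunk_while { |a, b| a.action == b.action }.map do |run|
    olds = run.map(&:old_element).compact
    news = run.map(&:new_element).compact
    Change.new(run.first.action,
               olds.empty? ? nil : olds.join,
               news.empty? ? nil : news.join)
  end
end

changes = [
  Change.new('=', 'Intro. ', 'Intro. '),
  Change.new('+', nil, 'New one. '),
  Change.new('+', nil, 'New two. '),
  Change.new('=', 'Outro.', 'Outro.')
]

consolidate(changes).each do |c|
  puts "#{c.action} #{c.new_element.inspect}"
end
```

With real sdiff output the same idea should apply to anything that responds to `action`, `old_element` and `new_element`, though that has not been checked against the gem here.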
Interesting. I'll have to play with this some to consider it. It sounds like a nice basis for a 1.3 release.
I updated consolidateDiff to actually work (derp). I also updated compare to split based on words (space delimited) instead of sentences. consolidateDiff now makes space-delimited comparison output manageable. I could do a character comparison (forget splitting altogether), but in my wiki-like diff I wouldn't want confusing mid-word insertions, and it would complicate having HTML tags render in the output.
It's getting some good output, but the format of the output gets messed up when there's a change involving a word adjacent to an HTML tag. The ideal split is like this:
seq1 = "<p>Here is a paragraph. A sentence with <strong>bold text</strong>.</p><p>The second paragraph.</p>"
seq1.magic
=> ["<p>", "Here ", "is ", "a ", "paragraph. ", "A ", "sentence ", "with ", "<strong>", "bold ", "text", "</strong>", ".", "</p>", "<p>", "The ", "second ", "paragraph.", "</p>"]
So I'm making use of nokogiri but it's not very pleasant. I have to blow up everything then reconstruct an array with the html tags and their attributes intact. I'll post the module I have once I know it works.
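For simple, well-formed markup, the ideal split shown above might be approximated without a parser by a single regular expression. This is a swapped-in technique, not the Nokogiri approach described in this post, and `split_html_words` is a hypothetical name. The sketch assumes that no attribute value contains `>` and that no stray `<` appears in the text; anything messier would still want a real parser.

```ruby
# Split HTML into tags and words, keeping trailing whitespace on each word.
# Alternatives, tried in order: a whole tag, a word plus its trailing
# whitespace, or a run of whitespace that does not follow a word
# (for example, whitespace between two tags).
def split_html_words(html)
  html.scan(/<[^>]+>|[^<\s]+\s*|\s+/)
end

html = "<p>Here is a paragraph. A sentence with <strong>bold text</strong>.</p>" \
       "<p>The second paragraph.</p>"
tokens = split_html_words(html)
p tokens
```

Because every character lands in exactly one token under those assumptions, `tokens.join` gives back the original string, and tags with attributes stay in one piece, so the arrays can be fed to sdiff without splitting inside a tag.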
Looks interesting. I was hoping to be able to work on this for an April release, and now it looks more likely to be a June release at the earliest; my time is just not available for feature work and assessment.
I think you're on the right approach, but it may be possible to just compare the Nokogiri nodes as you can flatten them out. I'm not sure as I haven't tried it.
I have a similar tool on https://gist.github.com/skandragon/92b1ad57e360d3948138
@skandragon That's pretty awesome!
I've written that into a module along with an HTML parsing class. Without running through the parser, the compare is correct, except that it will split in the middle of HTML tags. The parser seems to work in small cases, but for some reason it removes tags completely after a certain point. Not sure what's up with that.
https://gist.github.com/archonic/8967057