diff-match-patch
diff-match-patch copied to clipboard
How to get char positions of changed text in the original text
I want to get char positions of changed text along with change tuples. Please help if anyone implemented this?
Example in Python:
import diff_match_patch as dmp_module
dmp = dmp_module.diff_match_patch()
diff = dmp.diff_main("Hello World.", "Goodbye World.")
# Result: [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]
#Desired Result:[(-1,"Hell",0,3,x,x),(0,"o",4,4,1,1),(1,"odbye",x,x,2,6),(0," World",6,11,7,13)]
Here in Desired Result "x" means don't care in "delete" and "insertion" operations.
One way I thought about this is, get the changed text from tuple and find index from original text(this will be starting position") and add length of the changed text to index found, but i think this will fail in case of repeated substring. If implementation is available in other languages also will be helpful for me.
Thanks.
@sathk882 can you explain what each value in your example tuples mean? It's not clear what you are asking for. Are you asking for the ranges in the original text and also in the final text of where the characters in the current diff are to be found?
If that's the case then we really just need to think about what we do with each chunk and how it impacts a running pointer into the source strings.
d = [(-1, "Hell"), (1, "G"), (0, "o"), (1, "odbye"), (0, " World.")]
def step( last_a, last_b, next ):
op, s = next
l = len(s)
if -1 == op:
return ((op, s, (last_a, last_a + l), None), (last_a + l, last_b))
if 0 == op:
return ((op, s, (last_a, last_a + l), (last_b, last_b + l)), (last_a + l, last_b+l))
if 1 == op:
return ((op, s, None, (last_b, last_b + l)), (last_a, last_b+l))
o = []
last_a = 0
last_b = 0
for chunk in d:
next, (next_a, next_b) = step(last_a, last_b, chunk)
last_a = next_a
last_b = next_b
o.append(next)
print(o)
[(-1, 'Hell', (0, 4), None), (1, 'G', None, (0, 1)), (0, 'o', (4, 5), (1, 2)), (1, 'odbye', None, (2, 7)), (0, ' World.', (5, 12), (7, 14))]
i think this will fail in case of repeated substring
Can you explain this more?