python-unidiff
Support Dolt table diffs
Dolt is a versionable MySQL database that can commit, branch, push, and pull just like a git repository. The diff output is similar enough to git's that PatchSet is able to parse it. However, the number of lines (or in Dolt's case the number of table rows) added/removed does not seem to get tracked correctly.
For a basic Dolt setup, see this minimal example I made. Taking the diff string of the objects table from my example, I tried unsuccessfully to parse the additions/deletions with unidiff:
from io import StringIO
from textwrap import dedent
from unidiff import PatchSet
dolt_diff = dedent("""
diff --dolt a/objects b/objects
--- a/objects @ 73hiqmiduef0sqtecba4fav7vuuvdk2l
+++ b/objects @ 1hq161cev9kkt6eukvap0jmrfeedvt9j
+-----+----+---------+------------------+
|     | id | label   | bbox             |
+-----+----+---------+------------------+
|  <  | 1  | cat     | [1, 2, 3, 4]     |
|  >  | 1  | cat     | [3, 4, 5, 6]     |
|  <  | 2  | dog     | [10, 20, 30, 40] |
|  >  | 2  | poodle  | [10, 20, 30, 40] |
|  <  | 3  | dog     | [5, 6, 7, 8]     |
|  >  | 3  | bulldog | [5, 6, 7, 8]     |
+-----+----+---------+------------------+
""")
patch_set = PatchSet(StringIO(dolt_diff))
for t, table in enumerate(patch_set):
    table_name = table.path.split('@')[0].strip()
    print(f'Dolt table {t}={table_name}: {table.added} additions / {table.removed} deletions')
This outputs:
Dolt table 0=objects: 0 additions / 0 deletions
I expected 3 additions / 3 deletions, based on the < and > markers in the first column of the ASCII table. Is there a way to support Dolt's table diffs?
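To make the expected counts concrete, here is a minimal sketch that tallies the markers directly, without unidiff. It assumes that `>` marks the new version of a row (an addition) and `<` marks the old version (a deletion), which is how the sample above reads. The function name `count_dolt_row_changes` is hypothetical, and any other marker Dolt may emit is ignored.

```python
def count_dolt_row_changes(diff_text):
    """Return (added, removed) row counts for a Dolt table diff.

    Assumption: '>' in the first table column means an added row version
    and '<' means a removed row version.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the diff headers and the +---+ border lines
        cells = line.split("|")
        # cells[0] is the empty text before the first '|'; cells[1] is the marker column
        marker = cells[1].strip() if len(cells) > 1 else ""
        if marker == ">":
            added += 1
        elif marker == "<":
            removed += 1
    return added, removed
```

Called on the `dolt_diff` string from the example, `count_dolt_row_changes(dolt_diff)` returns `(3, 3)`.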
@matiasb Curious if you have any update on this?
Hi! I think this exceeds the original goal of the project, so I'm not really sure how it would work as part of unidiff, although I can see it would be useful for your use case. Having said that, it doesn't seem complex to implement using the existing code as a base. Maybe we can get a branch started and see how that looks? Alternatively, it could be a fork that becomes something independent.
Is there a standard way you could see external projects writing a plugin to support their diff format? That seems like it could be a good solution. If so, I can take a look or raise the issue with the Dolt maintainers to get that written.
Right now there isn't an easy, pluggable way to specify a custom diff format (it wasn't previously considered either), and adding one would require some work. Given that this is a specific scenario with a narrow scope, I think the simpler path to get something working would be to fork and adapt the existing code as a separate project. I can try to help and answer questions as time permits.
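As a rough illustration of what such a separate project might start from, the sketch below parses a Dolt diff into per-table objects that expose `added` and `removed` counts, loosely mirroring the attribute names used on unidiff's patched files. Everything here is an assumption rather than an existing API: `DoltTableDiff` and `parse_dolt_diff` are hypothetical names, each table is assumed to begin with a `diff --dolt a/<name> b/<name>` line, `>` and `<` are assumed to mark added and removed row versions, and cell values are assumed not to contain `|`.

```python
import re
from dataclasses import dataclass, field

# Assumed header format, taken from the single sample in this thread.
HEADER_RE = re.compile(r"^diff --dolt a/(?P<source>\S+) b/(?P<target>\S+)")


@dataclass
class DoltTableDiff:
    """Hypothetical container for the row changes of one table."""
    name: str
    columns: list = field(default_factory=list)
    added_rows: list = field(default_factory=list)
    removed_rows: list = field(default_factory=list)

    @property
    def added(self):
        return len(self.added_rows)

    @property
    def removed(self):
        return len(self.removed_rows)


def parse_dolt_diff(text):
    """Split a Dolt diff into DoltTableDiff objects, one per table header."""
    tables = []
    current = None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = DoltTableDiff(name=match.group("target"))
            tables.append(current)
            continue
        if current is None or not line.startswith("|"):
            continue  # ignore ---/+++ lines and +---+ borders
        # Drop the empty pieces before the first and after the last '|'.
        cells = [c.strip() for c in line.rstrip().split("|")[1:-1]]
        if not cells:
            continue
        marker, values = cells[0], cells[1:]
        if marker == "" and not current.columns:
            current.columns = values  # first unmarked row is the column header
        elif marker == ">":
            current.added_rows.append(dict(zip(current.columns, values)))
        elif marker == "<":
            current.removed_rows.append(dict(zip(current.columns, values)))
    return tables
```

With this shape, the loop from the original report could iterate over `parse_dolt_diff(dolt_diff)` and read `table.name`, `table.added`, and `table.removed`. Keeping the row values as dictionaries is a design choice that would also let callers see which columns changed, at the cost of treating every value as a string.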