diff-match-patch
diff-match-patch copied to clipboard
Convert Project to Maven
This PR converts the project to maven with support to be able to publish to Maven Central
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please visit https://cla.developers.google.com/ to sign.
Once you've signed, please reply here (e.g. I signed it!
) and we'll verify. Thanks.
- If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
- If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
- In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.
I signed it!
CLAs look good, thanks!
@NeilFraser I have taken care of comments.
About
Also, are all those sections of pom.xml needed? There seems to be a lot of boilerplate copied from some other project. The smaller the footprint the less likely it is to break.
You are right. I copied most of the template from my another project google-analytics-java. I copied with intention that sometime you would want to push this to library to maven central so that folks can easily consume it.
However, Google may have different way of publishing to central (guava team might know about it). if that is the case, I can remove all that additional sections.
@NeilFraser I have converted your tests to Junit based tests so that you can just execute them with command mvn test
, which will take care to setup classpath etc.
Converting the tests to JUnit (or any other test framework) will increase maintainability, not decrease. It is easier to spot issues given the fact that Most IDEs are optimized for it. Also the error message would be more readable; assertEqual
would print both the expected value and the actual value, useful if we have Continuous Integration enabled in future. It may need discussion about which one to use, but using a test framework is a good idea overall, for tests larger than 10 testcases.
However I myself am not familiar with plugins, and don't use that many plugins. For one, I doubt whether Nexus Staging Maven Plugin are necessary in this pull request.
@franklinyu Reason nexus related stuff is there with assumption that Neil would want to push this library to Maven Central so that is can be easily adopted by others.
However since this google, they might have another internal approach they are following for other google libs like guava. If Neil wants to follow internal approach, I will remove all Nexus related stuff.
Not familiar with the plugin, I assume that it automates pushing to Maven Central. However this repository does not seems to revolve that fast. I would suggest switching to Nexus only when manually pushing to Maven Central becomes a burden over @NeilFraser. Right now I think simply (manually) pushing the current state to Maven would be enough.
Was there any development on this issue? I would be extremely advantageous to have the possibility to use this project via Maven Central.
I would like to see an official release of this project on Maven too - I am currently importing an unofficial release (called "current") from a RedHat repository, but this updates dynamically:
https://maven.repository.redhat.com/ga/diff_match_patch/diff_match_patch/current/
I would much rather import an official version with a proper version number, so I was delighted to see this pull request - however, it seems to have stalled.
@brsanthu - I suggest you remove the Nexus stuff from pom.xml
as per the suggestion from @franklinyu. You will also need to resolve the conflicts, but then your pull request should be ready to accept. Thanks.
- Removed the nexus related config from pom.xml
- Removed the conflicts with latest
- Integrated latest diff_match_patch_test.java to run as Junit tests
@NeilFraser ready for another review.
Whats stopping this from getting merged ?
Please rebase your branch and also I would highly recommend upgrading the java version from 1.6 to atleast 1.8.
About 1.8: If code is not using any 1.8 features, why make it that version as minimum?
About rebasing: Unless Neil confirms that he will review and merge, there is no point in rebasing as he updates the code, it will get out of sync.
Its not just about if the code does not use any 1.8 features. java 6 has had pretty much end of life for public updates a long time ago. Open source code does not have to keep using a version which will not have public updates anymore.
Also,
diff_match_patch.java:[1813,28]
Use @Deprecated
with the javadoc @deprecated
The javadoc @deprecated
only marks it as deprecated in the generated javadoc where as the java annotation java.lang.Deprecated
lets the compiler know, so any consumer using that method gets to know if they are using a deprecated function/method.
@mithuns I'd understand the merits of 1.8. But it is upto Neil to make that call.
Ping? @NeilFraser
And? Does the library now show up in maven central which is the objective if I understand right? Is this https://search.maven.org/artifact/fun.mike/diff-match-patch/0.0.2/jar a compatible release?
@WolfgangFahl How is the group ID fun.mike
related to @NeilFraser?
@franklinyu
you asked: How is the group ID fun.mike
related to @NeilFraser?
Answer: I have no clue. My question was about the compatibility. I did not find the source repository for that release anywhere but the class file content looks somewhat o.k. I am using that release for a workaround of this issue not being closed yet and an "official" maven release being available.
Any update on this? I'd be happy to take a look if there is anything else that's needed. @NeilFraser @brsanthu
I'd love to see this merged. It would improve our adoption of this project.
Please let me know if you plan on publishing regular releases to Maven Central. If not, I will update https://bitbucket.org/cowwoc/google-diff-match-patch/ to do exactly that. I already published the google-code version a few years back but now there are requests to publish your newer releases. Obviously it's better for you to handle this in-house than have someone do this externally.
Let me know...
For those of you who are interested, I just pushed the latest version of this code to Maven Central under the name org.bitbucket.cowwoc:diff-match-patch:1.2
See https://bitbucket.org/cowwoc/google-diff-match-patch/ for the project website.
NOTE: I am not developing this library. I am just releasing the latest code to Maven Central.
Thanks @cowwoc for your efforts! It's great to have this library available as a regular gradle dependency - hopefully this PR moves forward but until then I've reviewed the cowwoc re-package and it appears to be a clean release of this code. Cheers
@NeilFraser we're waiting on your response
The author doesn’t seem interested any longer. I’m using a fork: https://github.com/java-diff-utils/java-diff-utils
@NeilFraser - A bit of a tragedy is happening here! If this Maven PR could only merge, it would mean your labor of love would become instantly more appealing to a bigger swath of the Internet and your code would be more widely used and appreciated. But instead it's about to undergo a fork, splintering the user base and reducing the impact of your mainline efforts here. You've already done some work to review this PR-- can't we bring it back to life?
What's the point of maintaining a java library if you're not going to make it accessible to the majority of java developers?
Sad this is never taken care of ...
@cowwoc Thanks for submitting the code to Maven.
Would you (or anyone else) in this thread interested in the following modification: change the input type from java.lang.String
to java.lang.CharSequence
. It might be helpful when you want to compare two really large strings (say 2GB each) and you want to implement some logic hidden by CharSequence
interface that allows you to offload such a string to a file for example and then use in-memory-mapped file to access it. Support of multi-byte characters in this case would be tricky as it would need an intermediate translation layer between file bytes and characters sequence (I don't have such implementation), but for one-byte charsets like ISO-8859-1 the implementation is rather simple (my case).
If String
is passed instead of CharSequence
there will be some minor performance drop, but I don't expect much.
Let me know if somebody is interested in the patch / code.
For above reason I cannot use java-diff-utils project, as in line 150 provided that String original
has size X, I would expect that List<String> origList
would have at least size X, and Patch<String> patch
might also have size X depending on how many are differences and what JVM policy on String.substring()
(create a lightweight copy or clone part of internal character buffer). So we have so heavy memory pressure...
@dmak did you mean CharSequence
? If not, to what class are you referring? Is there a link?
Support of multi-byte characters in this case would be tricky as it would need an intermediate translation layer between file bytes and characters sequence
diff-match-patch
doesn't currently specify input encoding. to that end, yes, it's complicated because if you feed it input with different code unit lengths then you will get different output and data corruption if you mix the deltas between systems. (for example, you can't apply in Java a delta generated from Python3 if there are surrogate-pairs in the input stream).
but for one-byte charsets like ISO-8859-1 the implementation is rather simple (my case).
one-byte charsets like ISO-8859-1 are also problematic for this library until some specification on index/length units are made (I proposed one such backwards-compatible declaration in #83) for the output deltas. this is more or less to say, I agree with you that introducing a character stream reveals the complexity of representing human text, though technically it doesn't introduce complexity that wasn't already present-yet-hidden.
You're right, I meant CharSequence
– just corrected my post above. And your post lead me to another idea that maybe writing such a layer between byte and character translation would be more difficult rather than first doing the diff on byte level and then mutating (extending / shrinking) a bit the resulting diffs so that they do not eventually stop in the middle of multi-byte sequence so that every range in the result could be converted to a string without a problem.
Anyway, just an idea that passing String
into the algorithm is too limiting as this class cannot be extended / it's behavior cannot be changed, and converting string to an array of characters is too wasteful. So CharSequence
is a good trade off, as it could be String
, StringBuilder
or something else backed up by a file or another storage.