diff-match-patch Convert Project to Maven

This PR converts the project to maven with support to be able to publish to Maven Central

Feb 14 '18 17:02 brsanthu

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot. The email used to register you as an authorized contributor must be the email used for the Git commit.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

Feb 14 '18 17:02 googlebot

I signed it!

Feb 14 '18 17:02 brsanthu

CLAs look good, thanks!

Feb 14 '18 17:02 googlebot

@NeilFraser I have taken care of comments.

About

Also, are all those sections of pom.xml needed? There seems to be a lot of boilerplate copied from some other project. The smaller the footprint the less likely it is to break.

You are right. I copied most of the template from my another project google-analytics-java. I copied with intention that sometime you would want to push this to library to maven central so that folks can easily consume it.

However, Google may have different way of publishing to central (guava team might know about it). if that is the case, I can remove all that additional sections.

Feb 15 '18 22:02 brsanthu

@NeilFraser I have converted your tests to Junit based tests so that you can just execute them with command mvn test, which will take care to setup classpath etc.

Feb 15 '18 23:02 brsanthu

Converting the tests to JUnit (or any other test framework) will increase maintainability, not decrease. It is easier to spot issues given the fact that Most IDEs are optimized for it. Also the error message would be more readable; assertEqual would print both the expected value and the actual value, useful if we have Continuous Integration enabled in future. It may need discussion about which one to use, but using a test framework is a good idea overall, for tests larger than 10 testcases.

However I myself am not familiar with plugins, and don't use that many plugins. For one, I doubt whether Nexus Staging Maven Plugin are necessary in this pull request.

Feb 20 '18 16:02 FranklinYu

@franklinyu Reason nexus related stuff is there with assumption that Neil would want to push this library to Maven Central so that is can be easily adopted by others.

However since this google, they might have another internal approach they are following for other google libs like guava. If Neil wants to follow internal approach, I will remove all Nexus related stuff.

Feb 20 '18 19:02 brsanthu

Not familiar with the plugin, I assume that it automates pushing to Maven Central. However this repository does not seems to revolve that fast. I would suggest switching to Nexus only when manually pushing to Maven Central becomes a burden over @NeilFraser. Right now I think simply (manually) pushing the current state to Maven would be enough.

Feb 20 '18 22:02 FranklinYu

Was there any development on this issue? I would be extremely advantageous to have the possibility to use this project via Maven Central.

May 10 '18 09:05 daczczcz1

I would like to see an official release of this project on Maven too - I am currently importing an unofficial release (called "current") from a RedHat repository, but this updates dynamically:

https://maven.repository.redhat.com/ga/diff_match_patch/diff_match_patch/current/

I would much rather import an official version with a proper version number, so I was delighted to see this pull request - however, it seems to have stalled.

@brsanthu - I suggest you remove the Nexus stuff from pom.xml as per the suggestion from @franklinyu. You will also need to resolve the conflicts, but then your pull request should be ready to accept. Thanks.

May 23 '18 15:05 RobertStroud

Removed the nexus related config from pom.xml
Removed the conflicts with latest
Integrated latest diff_match_patch_test.java to run as Junit tests

@NeilFraser ready for another review.

May 23 '18 16:05 brsanthu

Whats stopping this from getting merged ?

Jun 14 '18 04:06 mithuns

Please rebase your branch and also I would highly recommend upgrading the java version from 1.6 to atleast 1.8.

Jun 16 '18 19:06 mithuns

About 1.8: If code is not using any 1.8 features, why make it that version as minimum?

About rebasing: Unless Neil confirms that he will review and merge, there is no point in rebasing as he updates the code, it will get out of sync.

Jun 16 '18 22:06 brsanthu

Its not just about if the code does not use any 1.8 features. java 6 has had pretty much end of life for public updates a long time ago. Open source code does not have to keep using a version which will not have public updates anymore.

Jun 16 '18 22:06 mithuns

Also, diff_match_patch.java:[1813,28] Use @Deprecated with the javadoc @deprecated The javadoc @deprecated only marks it as deprecated in the generated javadoc where as the java annotation java.lang.Deprecated lets the compiler know, so any consumer using that method gets to know if they are using a deprecated function/method.

Jun 16 '18 23:06 mithuns

@mithuns I'd understand the merits of 1.8. But it is upto Neil to make that call.

Jun 17 '18 03:06 brsanthu

Ping? @NeilFraser

Jul 17 '18 13:07 tOverney

And? Does the library now show up in maven central which is the objective if I understand right? Is this https://search.maven.org/artifact/fun.mike/diff-match-patch/0.0.2/jar a compatible release?

Sep 07 '18 10:09 WolfgangFahl

@WolfgangFahl How is the group ID fun.mike related to @NeilFraser?

Sep 07 '18 13:09 FranklinYu

@franklinyu you asked: How is the group ID fun.mike related to @NeilFraser? Answer: I have no clue. My question was about the compatibility. I did not find the source repository for that release anywhere but the class file content looks somewhat o.k. I am using that release for a workaround of this issue not being closed yet and an "official" maven release being available.

Sep 09 '18 09:09 WolfgangFahl

Any update on this? I'd be happy to take a look if there is anything else that's needed. @NeilFraser @brsanthu

Sep 25 '18 08:09 peterjohansen

I'd love to see this merged. It would improve our adoption of this project.

Nov 12 '18 18:11 tyehill

Please let me know if you plan on publishing regular releases to Maven Central. If not, I will update https://bitbucket.org/cowwoc/google-diff-match-patch/ to do exactly that. I already published the google-code version a few years back but now there are requests to publish your newer releases. Obviously it's better for you to handle this in-house than have someone do this externally.

Let me know...

Nov 24 '18 13:11 cowwoc

For those of you who are interested, I just pushed the latest version of this code to Maven Central under the name org.bitbucket.cowwoc:diff-match-patch:1.2

See https://bitbucket.org/cowwoc/google-diff-match-patch/ for the project website.

NOTE: I am not developing this library. I am just releasing the latest code to Maven Central.

Nov 25 '18 03:11 cowwoc

Thanks @cowwoc for your efforts! It's great to have this library available as a regular gradle dependency - hopefully this PR moves forward but until then I've reviewed the cowwoc re-package and it appears to be a clean release of this code. Cheers

Nov 25 '18 03:11 mikehardy

@NeilFraser we're waiting on your response

Mar 23 '19 21:03 cowwoc

The author doesn’t seem interested any longer. I’m using a fork: https://github.com/java-diff-utils/java-diff-utils

Apr 30 '19 21:04 FranklinYu

@NeilFraser - A bit of a tragedy is happening here! If this Maven PR could only merge, it would mean your labor of love would become instantly more appealing to a bigger swath of the Internet and your code would be more widely used and appreciated. But instead it's about to undergo a fork, splintering the user base and reducing the impact of your mainline efforts here. You've already done some work to review this PR-- can't we bring it back to life?

May 01 '19 11:05 dave-lum

What's the point of maintaining a java library if you're not going to make it accessible to the majority of java developers?

Jun 22 '20 21:06 dnut

Sad this is never taken care of ...

Oct 04 '20 03:10 Ortega-Dan

@cowwoc Thanks for submitting the code to Maven.

Would you (or anyone else) in this thread interested in the following modification: change the input type from java.lang.String to java.lang.CharSequence. It might be helpful when you want to compare two really large strings (say 2GB each) and you want to implement some logic hidden by CharSequence interface that allows you to offload such a string to a file for example and then use in-memory-mapped file to access it. Support of multi-byte characters in this case would be tricky as it would need an intermediate translation layer between file bytes and characters sequence (I don't have such implementation), but for one-byte charsets like ISO-8859-1 the implementation is rather simple (my case).

If String is passed instead of CharSequence there will be some minor performance drop, but I don't expect much.

Let me know if somebody is interested in the patch / code.

For above reason I cannot use java-diff-utils project, as in line 150 provided that String original has size X, I would expect that List<String> origList would have at least size X, and Patch<String> patch might also have size X depending on how many are differences and what JVM policy on String.substring() (create a lightweight copy or clone part of internal character buffer). So we have so heavy memory pressure...

May 21 '21 20:05 dmak

@dmak did you mean CharSequence? If not, to what class are you referring? Is there a link?

Support of multi-byte characters in this case would be tricky as it would need an intermediate translation layer between file bytes and characters sequence

diff-match-patch doesn't currently specify input encoding. to that end, yes, it's complicated because if you feed it input with different code unit lengths then you will get different output and data corruption if you mix the deltas between systems. (for example, you can't apply in Java a delta generated from Python3 if there are surrogate-pairs in the input stream).

but for one-byte charsets like ISO-8859-1 the implementation is rather simple (my case).

one-byte charsets like ISO-8859-1 are also problematic for this library until some specification on index/length units are made (I proposed one such backwards-compatible declaration in #83) for the output deltas. this is more or less to say, I agree with you that introducing a character stream reveals the complexity of representing human text, though technically it doesn't introduce complexity that wasn't already present-yet-hidden.

May 21 '21 20:05 dmsnell

You're right, I meant CharSequence – just corrected my post above. And your post lead me to another idea that maybe writing such a layer between byte and character translation would be more difficult rather than first doing the diff on byte level and then mutating (extending / shrinking) a bit the resulting diffs so that they do not eventually stop in the middle of multi-byte sequence so that every range in the result could be converted to a string without a problem.

Anyway, just an idea that passing String into the algorithm is too limiting as this class cannot be extended / it's behavior cannot be changed, and converting string to an array of characters is too wasteful. So CharSequence is a good trade off, as it could be String, StringBuilder or something else backed up by a file or another storage.

May 22 '21 12:05 dmak