jackson-databind

Use external hosting for javadocs (link from README/wiki) to reduce Git repo size

Open drekbour opened this issue 2 years ago • 4 comments

I can't figure out why the javadocs are saved into the source-tree. I really can't understand why the full record of historic javadocs is stored there. It is about 60x the size of the /src tree!

I presume this is so GH hosts these but isn't there a better way?

$ du -s .[^.]* * | sort -nr
653988	docs
110124	.git
11828	src
164	release-notes
76	.mvn
44	.github
32	attic
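As a rough check on the "about 60x" figure, the sizes listed above can be compared directly. This is only a sketch: it assumes the du numbers are in 1 KiB blocks, and the helper name `parse_du` is made up for illustration.

```python
# Sizes copied from the du output above (assumed to be 1 KiB blocks).
DU_OUTPUT = """\
653988 docs
110124 .git
11828 src
164 release-notes
76 .mvn
44 .github
32 attic
"""


def parse_du(text):
    """Return {path: size} from `du -s` style lines (hypothetical helper)."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        size, path = line.split(None, 1)
        sizes[path] = int(size)
    return sizes


sizes = parse_du(DU_OUTPUT)
docs_to_src = sizes["docs"] / sizes["src"]        # docs relative to src
docs_share = sizes["docs"] / sum(sizes.values())  # docs relative to whole checkout
print(f"docs is {docs_to_src:.1f}x the size of src")
print(f"docs is {docs_share:.0%} of the checkout")
```

With these numbers docs comes out at roughly 55x the size of src, and it is also several times larger than .git itself.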

drekbour avatar Apr 02 '22 19:04 drekbour

Yes, Javadocs are hosted for new minor releases, to be linked to from project Wiki.

I am ears for a better system, if this is an actual problem? (disk space is not super expensive these days)

cowtowncoder avatar Apr 03 '22 04:04 cowtowncoder

No maintainer experience to offer but I've used readthedocs.io many times

I'm betting you haven't seen this https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-databind/latest/index.html

javadoc hosting for open source projects hosted on Central Maven: free, CDN enabled, new versions auto-detected within 24 hours. Supports Java, Scala, Groovy... any language that generates a -javadoc.jar

drekbour avatar Apr 03 '22 09:04 drekbour

Ok, I think we might already have a link from README to external Javadocs; those that are based on Javadoc Maven bundles from Maven Central. So that could significantly simplify changes.

But I think one thing that would allow us to stop adding new javadocs (existing ones probably need to be kept at least for a while in case someone is linking to them?) would be simply changing the links on the Wikis to point to external Javadoc providers for specific versions.

I think this would be a great "new contributor" task to check.

cowtowncoder avatar Apr 04 '22 17:04 cowtowncoder

kept at least for a while in case someone is linking to them

Unsure who would be linking to GH javadocs but I wouldn't encourage it by doing anything other than purging them. No one will thank you for keeping their Medium article hotlinked (or ever change those links).

My own experience is that, with modern IDEs fully automating -sources and -javadocs download, the only time I use externally hosted docs for anything is answering SO questions :)

drekbour avatar Apr 04 '22 18:04 drekbour

To keep this one vaguely moving - would you be against me going through each FasterXML/* repo: deleting anything generated in ./docs (and the config that persists them there), then replacing it with a single link to the hosting? As before, the existing published artifacts are sufficient for javadoc to be auto-hosted here with no further effort: https://www.javadoc.io/doc/com.fasterxml.jackson.module

drekbour avatar Jan 19 '23 20:01 drekbour

Yes; if you could first replace links on Wiki:

https://github.com/FasterXML/jackson-databind/wiki

that'd be a good step (I think you have access, if not LMK).

And I guess simple redirect pages for docs/javadoc/*/index.html would be the other part. With that I'd be happy, and the same could be done for other repos too. Plus I'd maintain the Wiki going forward.

Help much appreciated @drekbour !

cowtowncoder avatar Jan 20 '23 19:01 cowtowncoder

:+1: Updated wiki for jackson-core, jackson-databind

:-1: Don't have wiki access to jackson-datatype-jdk8, jackson-dataformats-text, jackson-dataformats-binary (and probably any other module/datatype/dataformat etc) to update those.

I note that, because javadoc.io has a drop down covering all versions, the new links are generic and need no maintenance. This leads me to think they could be added to the README.md?

drekbour avatar Jan 26 '23 12:01 drekbour

Thank you @drekbour! README already actually has the Javadocs badge under "Status". I'll see what is needed for other repos: perhaps it requires being contributor (having had a PR merged)?

How about annotations' wiki? Ah already done too, great! :)

cowtowncoder avatar Jan 26 '23 17:01 cowtowncoder

Removed docs/javadoc from 2.14 branch onward for:

  • jackson-annotations
  • jackson-core
  • jackson-databind

cowtowncoder avatar Jan 26 '23 18:01 cowtowncoder

@drekbour I changed access settings so you should be able to change wikis for:

  • jackson-modules-base
  • jackson-modules-java8
  • jackson-dataformats-binary
  • jackson-dataformats-text
  • jackson-dataformat-xml

LMK which other ones you'd want to target.

cowtowncoder avatar Jan 27 '23 23:01 cowtowncoder

FYI: these changes break the builds of some downstream projects.

For example, the Spring Framework 6.0.x build was broken by this.

I have not investigated which further builds are broken, but I imagine there could be many.

The reason these changes break builds is that some projects configure Jackson for external Javadoc links. For example, in the Spring Framework builds we were configuring the following external links.

  • https://fasterxml.github.io/jackson-core/javadoc/2.10/
  • https://fasterxml.github.io/jackson-databind/javadoc/2.10/
  • https://fasterxml.github.io/jackson-dataformat-xml/javadoc/2.10/

When the javadoc task in our build executed, it failed to retrieve the package-list files with errors similar to the following.

error: Error fetching URL: https://fasterxml.github.io/jackson-core/javadoc/2.10/ (java.io.FileNotFoundException: https://fasterxml.github.io/jackson-core/javadoc/2.10/package-list)

Navigating up that directory structure led me to https://fasterxml.github.io/jackson-core/ which states:

jackson-core

Empty!

/docs/ used to contain Javadocs definitions, but since they can be found from:

http://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-core

are no longer stored in this repo

That's how I eventually found this GitHub issue.

To fix Spring's builds, I got things working again by using http://www.javadoc.io.

However, as a benefit to the Jackson community it would be great if you could introduce redirects from URLs such as https://fasterxml.github.io/jackson-core/javadoc/2.10/package-list to https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-core/2.10.0/package-list.
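To make the requested mapping concrete, here is a minimal sketch of the old-to-new URL translation. It is not something either site provides: the function name `to_javadoc_io` is made up, the group IDs are listed only for the three core repos, and it assumes a minor version such as "2.10" should map to the ".0" release, as in the example above.

```python
from urllib.parse import urlsplit

# Maven group IDs for the three core repos; other repos would need their own
# entries (assumption: extend as needed).
GROUP_IDS = {
    "jackson-core": "com.fasterxml.jackson.core",
    "jackson-databind": "com.fasterxml.jackson.core",
    "jackson-annotations": "com.fasterxml.jackson.core",
}


def to_javadoc_io(old_url):
    """Map a fasterxml.github.io Javadoc URL to a javadoc.io URL.

    Assumes the old layout /<artifact>/javadoc/<minor>/<rest> and that the
    minor version "2.10" corresponds to release "2.10.0".
    """
    parts = urlsplit(old_url)
    if parts.netloc != "fasterxml.github.io":
        raise ValueError(f"not a fasterxml.github.io URL: {old_url}")
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3 or segments[1] != "javadoc":
        raise ValueError(f"unexpected path layout: {parts.path}")
    artifact, _, minor, *rest = segments
    group = GROUP_IDS[artifact]  # KeyError for repos not listed above
    new_path = "/".join([group, artifact, f"{minor}.0", *rest])
    return f"https://www.javadoc.io/doc/{new_path}"


print(to_javadoc_io(
    "https://fasterxml.github.io/jackson-core/javadoc/2.10/package-list"
))
```

Any server-side or pattern-based redirect would need to implement essentially this rule, including the version suffix.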

FWIW, I noticed that you mentioned adding redirects for index.html files in https://github.com/FasterXML/jackson-databind/issues/3440#issuecomment-1398822116. So perhaps the package-list files were just an oversight. 😉

sbrannen avatar Jan 29 '23 10:01 sbrannen

Thank you for bringing this to our attention, @sbrannen.

Ugh. My intention was not to break things in this way, and in hindsight I should have asked about possible downsides on the user/dev mailing list.

I would need help in figuring out a good way to resolve things here: undoing removals is a possibility, which would mean doing something like:

  1. Retaining javadocs for specific subset of versions
  2. Stopping publishing of javadocs after 2.14 (publishing is actually a manual operation after the Maven Release plugin runs)

But if a redirect works, that's better. I assume this:

https://blog.hubspot.com/website/html-redirect

would do the trick?

Although not sure about redirecting index.html vs package-list.
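For reference, a client-side HTML redirect of the kind linked above could look roughly like the sketch below. The function name `redirect_page` and the page text are made up, and the target URL is only an example. Note the caveat: a meta refresh is interpreted by browsers, whereas package-list is fetched as plain text by the javadoc tool, so a page like this would most likely help only for index.html.

```python
from html import escape


def redirect_page(target_url):
    """Return a minimal HTML page that sends browsers to target_url."""
    safe = escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '  <meta charset="utf-8">\n'
        # Browsers follow this immediately; non-browser clients ignore it.
        f'  <meta http-equiv="refresh" content="0; url={safe}">\n'
        f'  <link rel="canonical" href="{safe}">\n'
        "  <title>Javadoc has moved</title>\n"
        "</head>\n"
        "<body>\n"
        # Fallback link for clients that do not honour the meta refresh.
        f'  <p>The Javadoc has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body>\n"
        "</html>\n"
    )


# Example target (assumption): the javadoc.io page for one specific release.
page = redirect_page(
    "https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-databind/2.14.1/"
)
print(page)
```

One such file per docs/javadoc/<version>/index.html would be needed, since a static page cannot match URLs by pattern.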

cowtowncoder avatar Jan 30 '23 00:01 cowtowncoder

I'm not sure if HTML redirects using meta tags support wildcards.

One approach that does is .htaccess files. Reasonable write-up: https://www.seoptimer.com/blog/wildcard-redirect/

pjfanning avatar Jan 30 '23 00:01 pjfanning

@pjfanning That sounds like a good solution where feasible. Not sure it is doable here since we rely on GitHub Pages/in-repo docs reference. But then again we can't be the first project to hit issues like this...

cowtowncoder avatar Jan 30 '23 01:01 cowtowncoder

This also broke our pipelines and I imagine lots of other people's pipelines. The older URL no longer exists even for older versions. It would have made more sense to keep the URL around for some time.

joca-bt avatar Jan 30 '23 09:01 joca-bt

@joca-bt broken pipelines are easy to fix - e.g. https://github.com/spring-projects/spring-framework/commit/40d246633432a950f215d134705c212a4b9ef0dc

We may put back the old pages but first, we want to see if we can use redirects instead.

Unfortunately, if you need an immediate fix, then you are stuck with changing the URL in your own build files like the spring commit above.

pjfanning avatar Jan 30 '23 09:01 pjfanning

But given there was no announcement anywhere about this, you are making users' lives more difficult for literally no reason. The old website is just broken, as it shows 404. At least we could add a banner saying "The javadoc has moved to ..." so users don't need to guess.

joca-bt avatar Jan 30 '23 10:01 joca-bt

It's not my repo. I didn't make the change.

The change wasn't made to ruin your day. There is an announcement on https://fasterxml.github.io/jackson-databind/

It just wasn't considered that users link to the subpages in their builds. But to reiterate, the javadocs are accessible at https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-databind

pjfanning avatar Jan 30 '23 10:01 pjfanning

@joca-bt This was not intentional: I was not sure there was actual usage. But yes, this definitely could have gone better. At this point I am looking for the best solution to achieve what we want (reduced size of repo download, less maintenance) while also avoiding breakage for existing users.

PR would be good, or proposal for revert, followed by selective removal: for example, retain versions 2.10.x and later, do not publish version 2.15 and later.

cowtowncoder avatar Jan 30 '23 22:01 cowtowncoder

Reading through earlier comments, it sounds like package-list is useful/necessary; I can return these I think, as the first step.

Second, assuming index.html of the main level would be useful I can probably add redirects for those as well. These are needed for jackson-core, jackson-annotations and jackson-databind.

cowtowncoder avatar Jan 31 '23 03:01 cowtowncoder

@sbrannen As per #3769, I added docs/javadoc/2.14/package-list (etc) back in this, jackson-core and jackson-annotations repositories (the only ones where removal was done). Does this help on its own, or would more be needed for specific build systems in question?

cowtowncoder avatar Jan 31 '23 03:01 cowtowncoder

Hi @cowtowncoder,

Thanks for investigating ways to alleviate the issues.

When I originally reported the issue about broken builds, I was only focusing on getting builds to work going forward; however, that is only part of the overall set of issues.

It's actually considerably more involved. Let me see if I can highlight the issues I'm aware of.

Restoring package-list files will get builds (that cross reference Jackson javadoc) to pass, but it will result in broken links within the generated Javadoc.

For example, if I revert those changes I made to Spring's build (i.e., switch back to using external Javadoc links like https://fasterxml.github.io/jackson-core/javadoc/2.10/), then the Spring api Gradle task will succeed, and the build will technically pass. However, if I view the generated HTML and click on a link to a cross-referenced Jackson type, I'll encounter a 404 error.

Concrete examples:

  • Spring's published 6.0.4 Javadoc behaves the same as my locally reverted changes that find the package-list file you restored in commit 17762595577afae81f38956783a2bffc020dbc62. The reason is that the javadoc tool uses the contents of the package-list file combined with the URL used to download the package-list file to generate links (based on convention -- without checking for the existence of any such linked page on the Internet) to external types referenced in Javadoc tags (@see, etc.). So Spring's generated Javadoc HTML now references Jackson Javadoc pages that don't exist. To see this in action, click on the 6.0.4 link above and then click on any link to ObjectMapper.
  • In a similar vein, all existing published Spring Javadoc versions now contain broken links to Jackson types. That's true for 6.0.4, 5.3.25, 5.0.0.RELEASE, etc., etc.

The latter bullet point is applicable not only to the published Javadoc of numerous (thousands?) of libraries/projects around the world, but it also applies to any published blog, tweet, Stack Overflow answer, etc. in which somebody included a link to a specific version of Jackson's Javadoc.

Just to be clear, Spring's 6.0.5-SNAPSHOT Javadoc does not have broken links to Jackson because the package-list and cross-referenced documentation both exist on the same https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-databind/2.14.1/ web site.
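The link-by-convention behaviour described above can be sketched as follows. This is a simplified illustration, not the javadoc tool's actual code: the function name `external_link` is made up, the package list is a two-line excerpt, and only top-level classes are handled.

```python
def external_link(base_url, package_list, qualified_name):
    """Build the URL that would be emitted for an externally documented class.

    The link is derived purely from the base URL and the package name; nothing
    checks that the target page exists. Returns None when the class's package
    is not in package_list, in which case no external link would be generated.
    Simplification: qualified_name must be a top-level class.
    """
    packages = {line.strip() for line in package_list.splitlines() if line.strip()}
    package, _, simple_name = qualified_name.rpartition(".")
    if package not in packages:
        return None
    if not base_url.endswith("/"):
        base_url += "/"
    return f"{base_url}{package.replace('.', '/')}/{simple_name}.html"


# Illustrative excerpt of a package-list file (one package name per line).
PACKAGE_LIST = """\
com.fasterxml.jackson.databind
com.fasterxml.jackson.databind.node
"""

link = external_link(
    "https://fasterxml.github.io/jackson-databind/javadoc/2.10/",
    PACKAGE_LIST,
    "com.fasterxml.jackson.databind.ObjectMapper",
)
print(link)
```

So once package-list is reachable again the build passes, but the generated ObjectMapper link still points under https://fasterxml.github.io/, where the page itself no longer exists.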

I hope that clarifies the scope of the issues, but to summarize:

  • Completely removing all Javadoc content from https://fasterxml.github.io breaks builds and breaks all existing links to specific Javadoc pages in Jackson libraries.
  • Restoring only package-list files will allow builds to "pass", but links will be broken. So it's not really a solution.

I see the following as possible options.

  1. Restore all Javadoc content at https://fasterxml.github.io
  2. For Jackson 2.15 onward, do not publish Javadoc to https://fasterxml.github.io but do inform users that Javadoc can be found at https://www.javadoc.io
  3. Point # 1 serves as a quick fix for all known issues, but if you still wish to remove old content you could set up pattern-based redirects for all existing content and then remove the old content once you've verified that the redirects cover all known use cases.

Looking forward to hearing what you decide.

Cheers,

Sam

sbrannen avatar Feb 01 '23 15:02 sbrannen

3 is great but so far no solution that works has been found.

It might be possible to do 1 but change the branch that is used - GitHub pages config would need to be changed to use the separate branch. This approach keeps the master and 2.x branches small.

pjfanning avatar Feb 01 '23 15:02 pjfanning

3 is great but so far no solution that works has been found.

If you believe this answer to be the authoritative answer, then the answer is that it is (currently) impossible to configure a 301 redirect with GitHub Pages.

However, it does appear to be possible by setting up CNAME redirection for a custom domain, but I'm not sure if that's feasible/appropriate/possible for Jackson.

sbrannen avatar Feb 01 '23 15:02 sbrannen

@cowtowncoder if it's ok with you, I can create a ghpages branch based off the 2.14 branch as it was prior to the recent docs changes. See my previous comment. Changing the GitHub repo settings to use the ghpages branch is a simple change in the Settings tab.

pjfanning avatar Feb 01 '23 17:02 pjfanning

Hmmh. It's too bad I moved out of gh-pages earlier, but I am not opposed to going back there I suppose.

So +1 for that @pjfanning .

cowtowncoder avatar Feb 01 '23 18:02 cowtowncoder

Thank you @sbrannen for a very thorough explanation why the "quick patch" won't be enough.

I think @pjfanning's idea of going back to gh-pages makes sense, so let's plan on doing that. I think copies of docs/javadoc can be found from tag jackson-databind-2.14.1 (etc).

And we can still leave project Wiki links pointing to javadoc.io, and stop publishing new Javadocs with 2.15. Or, if we want to give a grace period, still do 2.15 and stop right after; it's not a big deal to do that (but it is one more step in the release process that would be nice to get rid of).

cowtowncoder avatar Feb 01 '23 19:02 cowtowncoder

I've overwritten gh-pages for jackson-core and jackson-databind so the javadocs for those 2 projects are back.

  • https://fasterxml.github.io/jackson-core/javadoc/2.14/
  • https://fasterxml.github.io/jackson-databind/javadoc/2.14/

also now:

  • https://fasterxml.github.io/jackson-annotations/javadoc/2.14/

pjfanning avatar Feb 01 '23 20:02 pjfanning

Excellent @pjfanning, thank you for doing this.

cowtowncoder avatar Feb 01 '23 23:02 cowtowncoder

I am ears for a better system, if this is an actual problem? (disk space is not super expensive these days)

It makes "global search" harder when navigating the project in IDE. It makes clone/update slower.

WDYT of having a separate repository (e.g. fasterxml-javadoc.github.io/...) that would host the generated javadocs?

vlsi avatar Nov 28 '23 06:11 vlsi