[Doc] Broken links in `io-connectors` page for versions prior to 2.5.2
Search before asking
- [X] I searched in the issues and found nothing similar.
What issue do you find in Pulsar docs?
Most of the links on this page appear broken (404 error): https://pulsar.apache.org/docs/2.3.1/io-connectors.
The issues in the historical versions of the `io-connectors` page include:
- 404 issue - The doc files linked in `io-connectors.md` are not in place (they might have been removed mistakenly) for versions between `2.1.1-incubating` and `2.5.2`.
- Incorrect content - The content of `io-connectors.md` in `2.1.1-incubating`, `2.2.0`, and `2.4.0` is incorrectly overwritten by the latest version, and the doc files linked inside are also not in place, which causes more 404 issues.
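To see which linked doc files are "not in place" for a given version, a small script can list the relative links in a page whose target files are missing on disk. This is only a sketch: the function name `missing_link_targets` is hypothetical, and it assumes that relative links resolve to `.md` files sitting next to the page, which may not match the actual layout of the versioned docs.

```python
import re
from pathlib import Path
from typing import List

# Captures the target of a markdown link such as [text](target).
LINK_TARGET = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def missing_link_targets(page: Path) -> List[str]:
    """Return relative link targets in `page` whose .md file does not exist.

    Assumption: a relative link like `io-aerospike` or `io-aerospike.md`
    points to a file in the same directory as `page`.
    """
    missing = []
    for target in LINK_TARGET.findall(page.read_text(encoding="utf-8")):
        # Skip external links, in-page anchors, and site-absolute paths.
        if target.startswith(("http://", "https://", "#", "/")):
            continue
        name = target.split("#", 1)[0]  # drop any anchor suffix
        if not name:
            continue
        if not name.endswith(".md"):
            name += ".md"
        if not (page.parent / name).exists():
            missing.append(target)
    return missing
```

Running it against each versioned copy of `io-connectors.md` would give a per-version list of links that lead to 404 pages.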
What is your suggestion?
The versions of docs prior to 2.4.2 are out of the maintenance cycle. Does it still make sense to address the issues?
@Anonymitaet @DaveDuggins @D-2-Ed feel free to share your thoughts and findings here.
Actually, due to the website framework changes, we may have more 404 link issues (links without `.md`) in historical versions that are not in the maintenance cycle. So the answer also applies to those.
//cc @urfreespace.
Any reference?
No response
Are you willing to submit a PR?
- [x] I'm willing to submit a PR!
- For 404 links, whether or not they are in docs that are still in the maintenance life cycle, it makes sense to fix them, since users might end up on those pages and feel frustrated when they get a 404.
- All missing docs, for example, https://pulsar.apache.org/docs/2.3.1/io-connectors/io-aerospike.md (404), are available at https://pulsar.staged.apache.org/docs/en/2.3.1/io-aerospike/, which means these docs can be recovered.
@Anonymitaet thanks for sharing your information. Seems it's the only way to trace back some of the files, requiring manual formatting though. I will try to revert the incorrect versions of connector content and recover the versions of missing files to avoid 404 issues.
@Anonymitaet if we can redirect docs of old versions, say, before 2.7.0, to pulsar.staged.apache.org and keep them working as is, it will help us focus on current versions instead of fixing very old versions that can keep working as well as they did before.
@tisonkun thanks! I think we do not want users to know pulsar.staged.apache.org.
Reasons:
1. For users: they might be confused about why there are 2 Pulsar websites, and we would need to spend some effort explaining it.
2. For maintainers:
   - 2.1 We should give users the minimal necessary info, which reduces their cognitive load. What they need to know is https://pulsar.apache.org, and that is enough.
   - 2.2 Not sure if we can keep https://pulsar.staged.apache.org/ forever, can we? @urfreespace
@Anonymitaet It's not about the domain `pulsar.staged.apache.org`, but the artifacts...
If you take a look at an ancient Trino version, you will find that the style changes over time:
- https://trino.io/docs/current/
- https://trino.io/docs/334
In other words, they don't migrate the 334 documents to the current style, and doing so really is unnecessary.
Thank you for calling out this issue. Can we identify the types of changes that break older versions? Ideally, we need to be able to preserve the old links while making changes to the current documentation. Because we will not be updating / maintaining content in versions prior to 2.8, we need to come up with a reasonable strategy.
@momo-jun now I know the issue you reported is about the low quality of old-version documents. There are hard links to the "latest" pages, which have changed over time, or the content itself is incorrect.
For example, one of the broken links you mentioned for 2.3.1 has never been correct since it was created in https://github.com/apache/pulsar/pull/4040 - a link from `io-connectors.md` to `io-aerospike.md`, but the latter is missing. (So...how can it be present on https://pulsar.staged.apache.org/docs/en/2.3.1/io-aerospike/?)
@momo-jun You could fix most of the 404 links with the regex below.
search regex:
(\[[^\]]+\]\((?!http|assets|\.|\/|#)((?!\.md|:|\.|#|\/).)*)\)
replace regex:
$1.md)
search path: site2
@momo-jun /cc @Anonymitaet @tisonkun
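For anyone who wants to try the search and replace regex suggested above outside of an editor, here is a minimal Python sketch. The pattern is copied unchanged from the comment. The function name `add_md_suffix` is hypothetical, and the replacement is written as `\g<1>.md)` because Python's `re` module uses that syntax instead of `$1`.

```python
import re

# Search regex from the comment above, unchanged. It matches markdown links
# whose target is not external (http), not under assets, does not start with
# ".", "/", or "#", and contains no ".md", ":", ".", "#", or "/".
LINK_WITHOUT_MD = re.compile(
    r"(\[[^\]]+\]\((?!http|assets|\.|\/|#)((?!\.md|:|\.|#|\/).)*)\)"
)


def add_md_suffix(text: str) -> str:
    """Append `.md` to matching link targets (equivalent of `$1.md)`)."""
    return LINK_WITHOUT_MD.sub(r"\g<1>.md)", text)
```

For example, `add_md_suffix("[Aerospike sink](io-aerospike)")` returns `[Aerospike sink](io-aerospike.md)`, while links that already end in `.md`, external links, and in-page anchors are left as they are. One thing worth checking by hand: the inner group is greedy and can also match `)`, so on a line with several links and no `.`, `:`, `#`, or `/` between them, only the last link gets the suffix.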
@tisonkun I don't have answers to this question. @Anonymitaet @urfreespace any thoughts?
I discussed this with @urfreespace. Redirecting the historical versions of docs to the archived/staged site has potential risks:
- it's just a snapshot without any maintenance possibilities.
- it also provides 2.8.x/2.9.x/2.10.x docs in the old website framework which will confuse users.
To sum up:
- We do need to evaluate changes that impact links in historical versions of docs to avoid 404 issues.
- The general strategy is that we do not maintain historical versions of docs that are out of the maintenance cycle. We've added a disclaimer on all those versions saying that these docs are no longer actively maintained.
- We need a principle about the necessity of fixes in those historical versions of docs: we apply a hotfix only if it has a wide range of impact on accessibility and can be fixed with automated solutions. I submitted #17763 to fix all the broken links in those historical versions of docs. Thanks to @urfreespace for providing the hotfix solution.
@D-2-Ed @tisonkun @Anonymitaet feel free to share any thoughts or comments.