SwiftPackageIndex-Server
Add documentation pages to the Sitemap
They're not included in the SiteURL so it didn't happen automatically. Also we should look into whether DocC outputs anything we can use to grab all of the documentation page paths, not just the /[owner]/[package]/[ref]/documentation page.
I wonder if it would be advantageous to support nested sitemaps?
https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps
This is where we would generate an OWNER/PACKAGE/sitemap.xml which includes all the pages for that package.
The top level sitemap then references all of those individual sitemaps instead of itself containing every single line.
It may offer more flexibility and scalability as we add more pages under the package route.
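To make the nested idea concrete, here is a minimal sketch of a top-level sitemap index that only points at per-package sitemaps. The base URL, the owner name, and the `build_sitemap_index` helper are assumptions for illustration, not anything that exists in the server today.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, packages):
    """Return a sitemap index with one <sitemap> entry per (owner, package)."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for owner, package in packages:
        sitemap = ET.SubElement(root, "sitemap")
        loc = ET.SubElement(sitemap, "loc")
        # Hypothetical URL layout: OWNER/PACKAGE/sitemap.xml
        loc.text = f"{base_url}/{owner}/{package}/sitemap.xml"
    return ET.tostring(root, encoding="unicode")


# "example-owner" and the host are placeholders, not real routes.
index_xml = build_sitemap_index(
    "https://example.com",
    [("example-owner", "SemanticVersion")],
)
print(index_xml)
```

The top-level file then grows by one entry per package, no matter how many pages each package sitemap lists.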
It's possible, yes. Certainly if we're able to get every documentation path then this makes sense. If we're only able to add documentation to the end, I'd say not. The sitemap right now is < 500KB which isn't huge at all.
Looks like sitemaps can be up to 50MB (uncompressed) and have 50,000 entries. I'd say we're fine for now.
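As a rough sketch of what those limits mean in practice, the check below tests whether a single file is still enough and shows how a URL list could be chunked if it ever isn't. The function names are made up for illustration; the limits are the 50MB and 50,000 figures quoted above.

```python
# Limits quoted above: 50MB uncompressed and 50,000 URLs per sitemap file.
MAX_BYTES = 50 * 1024 * 1024
MAX_URLS = 50_000


def fits_in_one_sitemap(url_count, size_bytes):
    """True when a single sitemap file stays within both limits."""
    return url_count <= MAX_URLS and size_bytes <= MAX_BYTES


def split_into_sitemaps(urls, max_urls=MAX_URLS):
    """Chunk a URL list so that each chunk fits the per-file URL limit."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


# A sitemap under 500KB, like the current one, is far below the size limit.
print(fits_in_one_sitemap(url_count=10_000, size_bytes=500 * 1024))

# A hypothetical 120,000 URL list would need three files.
chunks = split_into_sitemaps([f"/page/{n}" for n in range(120_000)])
print([len(chunk) for chunk in chunks])
```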
Imagine a world where...
On a server, after generating a set of statically transformed DocC documentation...
Where the jq command had been installed...
We used our ability to execute a command before uploading build results to execute this command...
jq "{ title: .metadata.title, path: .variants[0].paths[0] }" data/**/*.json | jq "[inputs]"
… and received this as the output…
[
{
"title": "SemanticVersion",
"path": "/documentation/semanticversion/semanticversion"
},
{
"title": "!=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/!=(_:_:)"
},
{
"title": "...(_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:)-40b95"
},
{
"title": "...(_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:)-bfr8"
},
{
"title": "...(_:_:)",
"path": "/documentation/semanticversion/semanticversion/'...(_:_:)"
},
{
"title": "..<(_:)",
"path": "/documentation/semanticversion/semanticversion/'.._(_:)"
},
{
"title": "..<(_:_:)",
"path": "/documentation/semanticversion/semanticversion/'.._(_:_:)"
},
{
"title": " Implementations",
"path": "/documentation/semanticversion/semanticversion/-implementations"
},
{
"title": "<(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_(_:_:)-1ojsm"
},
{
"title": ">(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_(_:_:)-4ftn7"
},
{
"title": ">=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_=(_:_:)-3q5ap"
},
{
"title": "<=(_:_:)",
"path": "/documentation/semanticversion/semanticversion/_=(_:_:)-9elz8"
},
{
"title": "build",
"path": "/documentation/semanticversion/semanticversion/build"
},
{
"title": "Comparable Implementations",
"path": "/documentation/semanticversion/semanticversion/comparable-implementations"
},
{
"title": "CustomStringConvertible Implementations",
"path": "/documentation/semanticversion/semanticversion/customstringconvertible-implementations"
},
{
"title": "description",
"path": "/documentation/semanticversion/semanticversion/description"
},
{
"title": "Equatable Implementations",
"path": "/documentation/semanticversion/semanticversion/equatable-implementations"
},
{
"title": "init(_:)",
"path": "/documentation/semanticversion/semanticversion/init(_:)"
},
{
"title": "init(_:_:_:_:_:)",
"path": "/documentation/semanticversion/semanticversion/init(_:_:_:_:_:)"
},
{
"title": "init(from:)",
"path": "/documentation/semanticversion/semanticversion/init(from:)"
},
{
"title": "isInitialRelease",
"path": "/documentation/semanticversion/semanticversion/isinitialrelease"
},
{
"title": "isMajorRelease",
"path": "/documentation/semanticversion/semanticversion/ismajorrelease"
},
{
"title": "isMinorRelease",
"path": "/documentation/semanticversion/semanticversion/isminorrelease"
},
{
"title": "isPatchRelease",
"path": "/documentation/semanticversion/semanticversion/ispatchrelease"
},
{
"title": "isPreRelease",
"path": "/documentation/semanticversion/semanticversion/isprerelease"
},
{
"title": "isStable",
"path": "/documentation/semanticversion/semanticversion/isstable"
},
{
"title": "LosslessStringConvertible Implementations",
"path": "/documentation/semanticversion/semanticversion/losslessstringconvertible-implementations"
},
{
"title": "major",
"path": "/documentation/semanticversion/semanticversion/major"
},
{
"title": "minor",
"path": "/documentation/semanticversion/semanticversion/minor"
},
{
"title": "patch",
"path": "/documentation/semanticversion/semanticversion/patch"
},
{
"title": "preRelease",
"path": "/documentation/semanticversion/semanticversion/prerelease"
}
]
Wouldn't that be a wonderful world? 😂
If we send back that JSON, we could use it to generate complete sitemaps for all documentation that we host.
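A minimal sketch of that step, assuming the JSON is a list of `{title, path}` objects as shown above and that each path is served under `/[owner]/[package]/[ref]`. The host, owner, and ref values and the `build_doc_sitemap` helper are placeholders, not the server's actual API.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_doc_sitemap(base_url, owner, package, ref, entries):
    """Turn [{title, path}, ...] into a <urlset> sitemap document."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        # Percent-encode symbol characters such as '(', ':' and '!';
        # ElementTree escapes XML entities when serialising.
        encoded_path = quote(entry["path"], safe="/")
        url = ET.SubElement(root, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = f"{base_url}/{owner}/{package}/{ref}{encoded_path}"
    return ET.tostring(root, encoding="unicode")


# Two entries copied from the jq output above.
entries = json.loads("""[
  {"title": "init(_:)",
   "path": "/documentation/semanticversion/semanticversion/init(_:)"},
  {"title": "major",
   "path": "/documentation/semanticversion/semanticversion/major"}
]""")

# Host, owner, and ref are hypothetical values.
sitemap_xml = build_doc_sitemap(
    "https://example.com", "example-owner", "SemanticVersion", "main", entries
)
print(sitemap_xml)
```

Encoding everything except `/` is the conservative choice here; whether the hosted documentation routes expect encoded or raw symbol characters would need checking.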
It can also spit it out minified with:
jq "{ title: .metadata.title, path: .variants[0].paths[0] }" data/**/*.json | jq -c "[inputs]"
We could also generate the sitemap XML as part of the builder, upload the sitemap to S3 along with the documentation archives, and serve the sitemaps through our docc-proxy mechanism.
Of course, since we're running the builder, the jq wouldn't even be necessary as we're in Swift at that point.
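For reference, the same extraction needs nothing beyond a JSON parser. This is a sketch in Python rather than Swift, purely for illustration, and it assumes the files under `data/` have the `metadata.title` and `variants[0].paths[0]` fields the jq filter reads. `collect_pages` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def collect_pages(data_dir):
    """Collect {title, path} from every JSON file below data_dir."""
    pages = []
    for json_file in sorted(Path(data_dir).rglob("*.json")):
        doc = json.loads(json_file.read_text(encoding="utf-8"))
        try:
            title = doc["metadata"]["title"]
            path = doc["variants"][0]["paths"][0]
        except (KeyError, IndexError, TypeError):
            # Unlike the jq filter, which would emit nulls, skip files
            # that don't have the expected shape.
            continue
        pages.append({"title": title, "path": path})
    return pages


# Demo against a throwaway directory holding one made-up page file.
with tempfile.TemporaryDirectory() as tmp:
    page_file = (Path(tmp) / "documentation" / "semanticversion"
                 / "semanticversion" / "major.json")
    page_file.parent.mkdir(parents=True)
    page_file.write_text(json.dumps({
        "metadata": {"title": "major"},
        "variants": [
            {"paths": ["/documentation/semanticversion/semanticversion/major"]}
        ],
    }), encoding="utf-8")
    pages = collect_pages(tmp)

print(pages)
```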
Possibly of interest. I tack on the option --emit-digest to get the DocC conversion process to dump out a nicely constrained list, matching exactly what I think you're after: docs/linkable-entities.json
I crunch that down (with that lovely jq) into a list of all the entities:
cat docs/linkable-entities.json | jq '.[].referenceURL' -r > all_identifiers.txt
(And my process has a second step where I grep-sed-rage across the results of that to transform them from identifiers into symbol names)
If you grab the .[].path from linkable-entities.json, I think you'll be in fine fettle for what you're after...
(and the -r option helpfully strips the JSON quotes from around the values)
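A small sketch of the `.[].path` extraction without jq, assuming linkable-entities.json is a JSON array of objects that each carry `path` and `referenceURL` keys, as the two filters above imply. The sample entries and the `entity_paths` helper are made up for illustration and are not real digest output.

```python
import json


def entity_paths(digest_json):
    """Return the "path" of every entry in a linkable-entities digest."""
    entities = json.loads(digest_json)
    return [entity["path"] for entity in entities if "path" in entity]


# Hypothetical sample in the assumed shape, not copied from a real build.
sample = json.dumps([
    {
        "referenceURL": "doc://SemanticVersion/documentation/SemanticVersion",
        "path": "/documentation/semanticversion",
    },
    {
        "referenceURL": "doc://SemanticVersion/documentation/SemanticVersion/SemanticVersion",
        "path": "/documentation/semanticversion/semanticversion",
    },
])

paths = entity_paths(sample)
for path in paths:
    print(path)
```

Printing one path per line mirrors what `jq -r` writes into a text file.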
Thanks Joe! It's funny. Before I started looking at just using the directory/file structure, I was sure I had read about an option that could output a set of doc paths. I tried looking for an option on DocC to do this but didn't find --emit-digest. It's not in the DocC documentation that I could find.
Now you remind me of its name, a quick Google later reminds me where I read about it...
https://rhonabwy.com/2022/02/10/tips-for-getting-the-most-out-of-docc/
😂
I learned it from Ethan on the Swift Forums when I was whining about not having a means to see the list of all possible symbols in a package. (I still like to dump all the symbols into a single doc and then sort them out as a curation process)
I'm going to move this back to being a discussion for now. We may look at a more comprehensive site map in the future, but the documentation links are crawlable from package pages, and we don't list any other package sub-pages right now.
The really interesting one that would give a significant advantage would be to use the --emit-digest output to make a full documentation sitemap, as we discussed above, but it's not something we're actively looking to tackle right now.