Cache docs with service worker
Hi,
Today I noticed that hackage.haskell.org was down for quite a while.
I wasn't able to use the package docs opened from vscode+HLS.
I propose enabling service worker browser caching. This way the docs would be available offline once a user has visited a package's docs.
#!/bin/sh
DOC_ROOT=
CACHEABLE_EXTS='[.](js|css|json|png|gif|html)$'
HTML_EXT='[.]html$'
err() {
echo "Error: $*" 1>&2
exit 1
}
while [ $# -ne 0 ] ; do
case "$1" in
-h|--help)
cat<<EOF
Usage: $0 -r <root path> [ <OPTIONS> ]
Adds a service worker that caches all HTML pages in the folder.
Options:
--help -h
--doc-root -r <NAME> Haddock root
EOF
exit 0
;;
-r|--doc-root) shift; DOC_ROOT="$1";;
*) err "Bad option [$1]" ;;
esac
shift
done
[ -n "$DOC_ROOT" ] || err "document root is not set"
[ -d "$DOC_ROOT" ] || err "document root does not exist: [$DOC_ROOT]"
# generate service worker
cat<<EOF > $DOC_ROOT/offline-service-worker.js
self.addEventListener("fetch", (event) => {
console.log("Service Worker Fetch event " + event.request.url);
event.respondWith(
caches.open("cache1").then(
cache => cache.match(event.request, {ignoreSearch: true}).then(
response => {
if (response) {
console.log("Found in cache: " + event.request.url);
return response;
} else {
console.log("Network request: " + event.request.url);
return fetch(event.request).then(
ok => ok,
(e) => {
console.log("Offline fallback with 404", e);
return cache.match("/404.html");
});
}
}))
);
});
self.addEventListener('install', (event) => {
console.log("Start install worker " + new Date());
event.waitUntil(
caches.delete("cache1")
.then(ok => {
console.log("Previous cache is cleaned: " + ok);
// Return the promise so waitUntil() waits until pre-caching has finished.
return caches.open("cache1").then(cache => {
return cache.addAll(
[
EOF
# list every cacheable file as a quoted entry of the addAll() array
ls -1 "$DOC_ROOT" | grep -E "$CACHEABLE_EXTS" | while read -r FILE_PATH ; do
echo "'$(basename "$FILE_PATH")',"
done >> "$DOC_ROOT/offline-service-worker.js"
# generate manifest
cat<<EOF > $DOC_ROOT/manifest.json
{
"short_name": "MyPackage-Haddocks",
"name": "My Package Haddocks",
"description": "My Package Haddocks",
"version": "0.1",
"display": "standalone",
"orientation": "landscape",
"icons": [],
"scope": ".",
"start_url": "./"
}
EOF
cat<<EOF >> $DOC_ROOT/offline-service-worker.js
'manifest.json'
]);});}));});
EOF
cat<<EOF >swReg.html
<script type="text/javascript">
window.addEventListener("load", () => {
navigator.serviceWorker.register('offline-service-worker.js')
.then(
(registration) =>
console.log('ServiceWorker registration successful with scope: ',
registration.scope)
).catch(
(err) =>
console.log('ServiceWorker registration failed: ', err)
);
});
</script>
EOF
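# inject the registration snippet right after <head> in every HTML page
ls -1 "$DOC_ROOT" | grep -E "$HTML_EXT" | while read -r HTML_FILE ; do
HTML_FILE="$DOC_ROOT/$HTML_FILE"
# put <head> on its own line so sed can append the snippet after it
perl -pi -e 's/(<head>)/\n$1\n/' "$HTML_FILE"
sed -i.b '/<head>/ r swReg.html' "$HTML_FILE"
rm "$HTML_FILE.b"
done
rm swReg.html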
What is wrong with CDN caching? Is it not good enough? I hope it works.
E.g.
% curl -D - https://hackage.haskell.org/package/base-4.16.0.0/docs/Control-Concurrent-QSemN.html|less
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 5972
Server: nginx/1.18.0 (Ubuntu)
Content-Type: text/html; charset=utf-8
Cache-Control: public, max-age=86400
ETag: "f660245f1a3d53b216f7805264349d7e"
That says "cache me for a day" (max-age=86400 seconds). Maybe that could be longer for (older) docs, though.
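To make the header concrete, here is a small sketch that pulls `max-age` out of a `Cache-Control` value and converts it to hours. The helper name `maxAgeSeconds` is made up for illustration and only handles the simple directive form shown in the response above.

```javascript
// Hypothetical helper (not part of hackage-server): return the max-age value
// in seconds from a Cache-Control header value, or null if it is absent.
function maxAgeSeconds(cacheControl) {
  for (const directive of cacheControl.split(",")) {
    const [name, value] = directive.trim().split("=");
    if (name.toLowerCase() === "max-age" && /^\d+$/.test(value || "")) {
      return Number(value);
    }
  }
  return null;
}

const seconds = maxAgeSeconds("public, max-age=86400");
console.log(seconds / 3600); // 86400 s / 3600 = 24 hours
```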
I think that caching like this should be integrated into downstream tooling rather than hackage-server itself.
A service worker cache is better than a server-side cache because it works offline, e.g. when you are on a flight, and it is faster. A page can be safely closed and reopened a few days later while completely offline. Today's Hackage outage lasted 15 minutes for me: I was seeing 503 responses and had time for dinner.
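The offline behaviour can be illustrated with a rough in-memory model of a cache-first lookup. This is only a sketch: it does not use the real Service Worker API, `cacheFirst`, `online` and `offline` are hypothetical names, and unlike the script in the first post (which pre-caches at install time) it stores a page when it is first fetched.

```javascript
// Simplified in-memory model of a cache-first lookup. This is NOT the Service
// Worker API: `cache` is a plain Map and `fetchFromNetwork` is a hypothetical
// synchronous stand-in for fetch() that throws when the server is unreachable.
function cacheFirst(cache, url, fetchFromNetwork) {
  if (cache.has(url)) {
    return { source: "cache", body: cache.get(url) };
  }
  const body = fetchFromNetwork(url); // throws while offline
  cache.set(url, body);               // remember the page for later visits
  return { source: "network", body };
}

const cache = new Map();
const online = (url) => "<html>docs for " + url + "</html>";
const offline = () => { throw new Error("503 Service Unavailable"); };

const first = cacheFirst(cache, "Data-Map.html", online);   // served by the network
const second = cacheFirst(cache, "Data-Map.html", offline); // served from the cache
console.log(first.source, second.source); // prints "network cache"
```

A page that was never visited still fails while offline, which is why the original script pre-caches the whole doc folder during install.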