hackage-server

Cache docs with service worker

Open yaitskov opened this issue 2 years ago • 3 comments

Hi,

Today I noticed that hackage.haskell.org was down for quite a while.

I wasn't able to open package docs from VS Code + HLS.

I propose enabling service-worker caching in the browser. That way docs would remain available offline after a user visits a package's docs once.

#!/bin/sh

DOC_ROOT=
CACHEABLE_EXTS='[.](js|css|json|png|gif|html)$'
HTML_EXT='[.]html$'

err() {
    echo "Error: $@" 1>&2
    exit 1
}

while [ $# -ne 0 ] ; do
    case "$1" in
        -h|--help)
            cat <<EOF
Usage: $0 --doc-root <root path> [OPTIONS]
  Adds a service worker for caching all HTML pages in the folder.

  Options:
    -h, --help             show this help
    -r, --doc-root <DIR>   haddock root directory
EOF
            exit 1
            ;;
        -r|--doc-root) shift; DOC_ROOT="$1";;
        *) err "Bad option [$1]" ;;
    esac
    shift
done

[ -n "$DOC_ROOT" ] || err "document root is not set"
[ -d "$DOC_ROOT" ] || err "document root does not exist: [$DOC_ROOT]"


# generate service worker
cat <<EOF > "$DOC_ROOT/offline-service-worker.js"
self.addEventListener("fetch", (event) => {
  console.log("Service Worker Fetch event " + event.request.url);
  event.respondWith(
    caches.open("cache1").then(
      cache => cache.match(event.request, {ignoreSearch: true}).then(
        response => {
          if (response) {
            console.log("Found in cache: " + event.request.url);
            return response;
          } else {
            console.log("Network request: " + event.request.url);
            return fetch(event.request).then(
              ok => ok,
              (e) => {
                console.log("Offline fallback with 404", e);
                return cache.match("/404.html");
              });
          }
        }))
    );
});

self.addEventListener('install', (event) => {
  console.log("Start install worker " + new Date());
  event.waitUntil(
    caches.delete("cache1")
      .then(ok => {
        console.log("Previous cache is cleaned: " + ok);
        // Return the promise so waitUntil actually waits for precaching.
        return caches.open("cache1").then(cache => {
          return cache.addAll(
            [
EOF

ls -1 "$DOC_ROOT" | grep -E "$CACHEABLE_EXTS" | while read -r FILE_PATH ; do
    echo "'$FILE_PATH',"
done >> "$DOC_ROOT/offline-service-worker.js"

# generate manifest
cat <<EOF > "$DOC_ROOT/manifest.json"
{
  "short_name": "MyPackage-Haddocks",
  "name": "My Package Haddocks",
  "description": "My Package Haddocks",
  "version": "0.1",
  "display": "standalone",
  "orientation": "landscape",
  "icons": [],
  "scope": ".",
  "start_url": "./"
}
EOF

cat <<EOF >> "$DOC_ROOT/offline-service-worker.js"
  'manifest.json'
]);});}));});
EOF

cat <<EOF > swReg.html
  <script type="text/javascript">
    window.addEventListener("load", () => {
      navigator.serviceWorker.register('offline-service-worker.js')
        .then(
          (registration) =>
            console.log('ServiceWorker registration successful with scope: ',
                        registration.scope)
        ).catch(
          (err) =>
            console.log('ServiceWorker registration failed: ', err)
        );
    });
  </script>
EOF

ls -1 "$DOC_ROOT" | grep -E "$HTML_EXT" | while read -r HTML_FILE ; do
    HTML_FILE="$DOC_ROOT/$HTML_FILE"
    # Put <head> on its own line so the sed "r" command can insert after it.
    perl -pi -e 's/(<head>)/\n$1\n/' "$HTML_FILE"
    sed -i.b '/<head>/ r swReg.html' "$HTML_FILE"
    rm "$HTML_FILE.b"
done

rm swReg.html
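The `CACHEABLE_EXTS` filter can be checked in isolation. A minimal sketch (the directory and file names below are made-up examples, not from a real haddock tree) that exercises the same `ls | grep -E` pipeline the script uses:

```shell
#!/bin/sh
# Sketch: run the script's extension filter against a throwaway directory.
TMP=$(mktemp -d)
touch "$TMP/index.html" "$TMP/style.css" "$TMP/logo.png" "$TMP/Module.hs"
CACHEABLE_EXTS='[.](js|css|json|png|gif|html)$'
# Haskell sources are not matched; only web assets would be precached.
CACHEABLE=$(ls -1 "$TMP" | grep -E "$CACHEABLE_EXTS")
echo "$CACHEABLE"
rm -r "$TMP"
```

Only `index.html`, `logo.png`, and `style.css` survive the filter; `Module.hs` is excluded.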

yaitskov avatar Nov 12 '21 22:11 yaitskov

What is wrong with CDN caching? Is it not good enough? I hope it works.

E.g.

% curl -D - https://hackage.haskell.org/package/base-4.16.0.0/docs/Control-Concurrent-QSemN.html|less       
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 5972
Server: nginx/1.18.0 (Ubuntu)
Content-Type: text/html; charset=utf-8
Cache-Control: public, max-age=86400
ETag: "f660245f1a3d53b216f7805264349d7e"

Says "cache me for a day". Maybe that could be longer for (older) docs though.
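To make that concrete, a small sketch (not from the thread) that extracts the `max-age` value from the header shown above and converts it to hours; since docs for a released package version never change, a much longer lifetime would be safe for those URLs:

```shell
#!/bin/sh
# Sketch: parse the max-age directive out of the Cache-Control header above.
HEADER='Cache-Control: public, max-age=86400'
MAX_AGE=$(echo "$HEADER" | sed -n 's/.*max-age=\([0-9]*\).*/\1/p')
echo "$((MAX_AGE / 3600)) hours"   # 24 hours
```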

phadej avatar Nov 12 '21 22:11 phadej

I think that caching like this should be integrated into downstream tooling rather than hackage-server itself.

gbaz avatar Nov 12 '21 23:11 gbaz

> What is wrong with CDN caching? Is it not good enough? I hope it works.

A service-worker cache is better than a server-side cache because it works offline (on a flight, for example) and it is faster. The page can be safely closed and reopened days later while completely offline. Today's hackage outage lasted 15 minutes for me; I was watching 503s and had time for dinner.

yaitskov avatar Nov 13 '21 00:11 yaitskov