
RFC: Protocol for bootstrapping when primary Hackage instance is unreachable

Open • hvr opened this issue 9 years ago • 1 comment

I wasn't sure whether to file this here or in Cabal's issue tracker, but I think this method can be generalised, so I'm documenting it here for now:

cabal-install currently can't bootstrap automatically when hackage.haskell.org is unreachable (either because it's down or because of routing/firewalling issues), even though one of its mirrors may be reachable without issue.

To this end, I propose the following simple best-effort fallback scheme:

When bootstrapping hackage-security and the configured repository URL ${URL} (e.g. hackage.haskell.org) is not reachable, a DNS TXT lookup on _mirrors.${URL} shall be attempted, looking for RFC 1464-compliant entries of mirror URLs with the keys ${IDX}.urlbase (where ${IDX} is a non-negative integer). Bootstrapping is then attempted from each of those mirror URLs in ascending order of their ${IDX} value until one succeeds, giving up once all URLs have been tried.
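
The fallback order described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the names mirrorsName, parseEntry, orderedMirrors and firstSuccess are hypothetical, the TXT strings are assumed to have already been fetched and unquoted, and RFC 1464 backquote escapes are ignored for brevity.

```haskell
import Data.Char (isDigit)
import Data.List (sortOn, stripPrefix)
import Text.Read (readMaybe)

-- | Hypothetical helper: the DNS name to query for a repository host.
mirrorsName :: String -> String
mirrorsName host = "_mirrors." ++ host

-- | Parse one TXT string of the form "<IDX>.urlbase=<URL>".
-- Assumption: no RFC 1464 backquote escapes occur in the entry.
parseEntry :: String -> Maybe (Int, String)
parseEntry s = do
  let (ds, rest) = span isDigit s
  idx <- readMaybe ds                    -- fails if there are no digits
  url <- stripPrefix ".urlbase=" rest    -- fails on any other key
  if null url then Nothing else Just (idx, url)

-- | Mirror URLs in ascending order of their index; malformed entries are skipped.
orderedMirrors :: [String] -> [String]
orderedMirrors txts = map snd (sortOn fst [e | Just e <- map parseEntry txts])

-- | Try each URL in order, stop at the first success, give up after the last.
firstSuccess :: Monad m => (String -> m (Maybe a)) -> [String] -> m (Maybe a)
firstSuccess _   []     = return Nothing
firstSuccess try (u:us) = do
  r <- try u
  case r of
    Just x  -> return (Just x)
    Nothing -> firstSuccess try us
```

Here firstSuccess is kept generic in the monad so that the actual bootstrap attempt (whatever cabal-install would do per URL) can be passed in as the try action; a caller would only reach this path after the configured repository URL itself has failed.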

To prototype this, I've created such a DNS TXT record:

$ dig _mirrors.hackage.haskell.org TXT

; <<>> DiG 9.10.3-P4-Ubuntu <<>> _mirrors.hackage.haskell.org TXT
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62373
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1500
;; QUESTION SECTION:
;_mirrors.hackage.haskell.org.  IN  TXT

;; ANSWER SECTION:
_mirrors.hackage.haskell.org. 300 IN    TXT "0.urlbase=http://hackage.fpcomplete.com/" "1.urlbase=http://objects-us-west-1.dream.io/hackage-mirror/"

;; Query time: 2 msec
;; SERVER: 69.20.0.164#53(69.20.0.164)
;; WHEN: Thu Oct 06 16:32:18 UTC 2016
;; MSG SIZE  rcvd: 170

Moreover, I've written a simple parser for nslookup's output (nslookup appears to be the common-denominator tool provided by default on Windows, OS X, IBM AIX, and Linux systems), which I've tested on the platforms I had access to:

#! /usr/bin/env runghc

import Data.List
import Data.Char
import Control.Monad
import System.Environment
import System.Process (readProcess)
import Text.Read

-- | Parse output of @nslookup -query=TXT $HOSTNAME@ tolerantly
parseNsLookupTxt :: String -> Maybe [(String,[String])]
parseNsLookupTxt = go0 [] []
  where
    -- approximate grammar:
    -- <entries> := { <entry> }
    -- (<entry> starts at the beginning of a line, but may span multiple lines)
    -- <entry> := ^ <hostname> TAB "text =" { <qstring> }
    -- <qstring> := string enclosed by '"'s ('\' and '"' are \-escaped)

    -- scan for ^ <word> <TAB> "text ="
    go0 []  _  []                                = Nothing
    go0 res _  []                                = Just (reverse res)
    go0 res _  ('\n':xs)                         = go0 res [] xs
    go0 res lw ('\t':'t':'e':'x':'t':' ':'=':xs) = go1 res (reverse lw) [] (dropWhile isSpace xs)
    go0 res lw (x:xs)                            = go0 res (x:lw) xs

    -- collect at least one <qstring>
    go1 res lw qs ('"':xs) = case qstr "" xs of
      Just (s, xs') -> go1 res lw (s:qs) (dropWhile isSpace xs')
      Nothing       -> Nothing -- bad quoting
    go1 res lw [] _  = Nothing -- missing qstring
    go1 res lw qs xs = go0 ((lw,reverse qs):res) [] xs

    qstr acc ('\n':_) = Nothing -- We don't support unquoted LFs
    qstr acc ('\\':'\\':cs) = qstr ('\\':acc) cs
    qstr acc ('\\':'"':cs)  = qstr ('"':acc) cs
    qstr acc ('"':cs) = Just (reverse acc, cs)
    qstr acc (c:cs)   = qstr (c:acc) cs
    qstr _   []       = Nothing

mirrorsDnsName :: String
mirrorsDnsName = "_mirrors.hackage.haskell.org"

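-- | Extract the mirror URLs for 'mirrorsDnsName' from nslookup output,
-- sorted by their numeric ${IDX} key; unparseable entries are skipped.
extractMirrors :: String -> [String]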
extractMirrors s0 = map snd $ sort vals
  where
    vals = [ (kn,v) | (h,ents) <- maybe [] id $ parseNsLookupTxt s0
                    , h == mirrorsDnsName
                    , e <- ents
                    , Just (k,v) <- [splitRfc1464 e]
                    , Just kn <- [isUrlBase k]
                    ]

    isUrlBase :: String -> Maybe Int
    isUrlBase s
      | isSuffixOf ".urlbase" s, not (null ns), all isDigit ns = readMaybe ns
      | otherwise = Nothing
      where
        -- drop the 8-character ".urlbase" suffix, leaving the index digits
        ns = take (length s - 8) s

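-- | Split a TXT string into key and value at the first unescaped '='.
-- A backquote escapes the following character; unescaped whitespace in
-- the key is dropped.
splitRfc1464 :: String -> Maybe (String,String)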
splitRfc1464 = go ""
  where
    go _ [] = Nothing
    go acc ('`':c:cs) = go (c:acc) cs
    go acc ('=':cs)   = go2 (reverse acc) "" cs
    go acc (c:cs)
      | isSpace c = go acc cs
      | otherwise = go (c:acc) cs

    go2 k acc [] = Just (k,reverse acc)
    go2 k acc ['`'] = Nothing
    go2 k acc ('`':c:cs) = go2 k (c:acc) cs
    go2 k acc (c:cs) = go2 k (c:acc) cs

main :: IO ()
main = do
    fns <- getArgs

    if null fns
    then do
      output <- readProcess "nslookup" ["-query=TXT", mirrorsDnsName] ""
      print (extractMirrors output)
    else do
      forM_ fns $ \fn -> do
        output <- readFile fn
        print (fn,extractMirrors output)

    return ()

Its output is simply

["http://hackage.fpcomplete.com/","http://objects-us-west-1.dream.io/hackage-mirror/"]

hvr • Oct 06 '16 16:10

After a short conversation with @dcoutts, the conclusion is that I'm going to integrate this into cabal-install real-soon-now(tm); no changes in hackage-security are needed for now.

hvr • Oct 07 '16 08:10