Documentation of urlEncode is misleading

Open MaxGabriel opened this issue 4 years ago • 1 comments

Version 0.12.3

The documentation of urlEncode for the boolean parameter is:

Whether to decode '+' to ' '

This doesn't appear to be the whole story though. It also controls other characters like @

ghci> import Prelude
ghci> URI.urlEncode False "[email protected]"
"[email protected]"
ghci> URI.urlEncode True "[email protected]"
"test%40example.com"

This documentation appears true for urlDecoding (replacePlus is only used in one place).

-- | Percent-decoding.
urlDecode :: Bool -- ^ Whether to decode @\'+\'@ to @\' \'@
          -> B.ByteString -> B.ByteString
urlDecode replacePlus z = fst $ B.unfoldrN (B.length z) go z
  where
    go bs =
        case B.uncons bs of
            Nothing -> Nothing
            Just (43, ws) | replacePlus -> Just (32, ws) -- plus to space
            Just (37, ws) -> Just $ fromMaybe (37, ws) $ do -- percent
                (x, xs) <- B.uncons ws
                x' <- hexVal x
                (y, ys) <- B.uncons xs
                y' <- hexVal y
                Just (combine x' y', ys)
            Just (w, ws) -> Just (w, ws)
    hexVal w
        | 48 <= w && w <= 57  = Just $ w - 48 -- 0 - 9
        | 65 <= w && w <= 70  = Just $ w - 55 -- A - F
        | 97 <= w && w <= 102 = Just $ w - 87 -- a - f
        | otherwise = Nothing
    combine :: Word8 -> Word8 -> Word8
    combine a b = shiftL a 4 .|. b

But in URL encoding, the true/false flag isn't restricted to just spaces:

unreservedQS, unreservedPI :: [Word8]
unreservedQS = map ord8 "-_.~"
unreservedPI = map ord8 "-_.~:@&=+$,"

-- | Percent-encoding for URLs.
urlEncodeBuilder' :: [Word8] -> B.ByteString -> B.Builder
urlEncodeBuilder' extraUnreserved = mconcat . map encodeChar . B.unpack
    where
      encodeChar ch | unreserved ch = B.word8 ch
                    | otherwise     = h2 ch

      unreserved ch | ch >= 65 && ch <= 90  = True -- A-Z
                    | ch >= 97 && ch <= 122 = True -- a-z
                    | ch >= 48 && ch <= 57  = True -- 0-9
      unreserved c = c `elem` extraUnreserved

      -- must be upper-case
      h2 v = B.word8 37 `mappend` B.word8 (h a) `mappend` B.word8 (h b) -- 37 = %
          where (a, b) = v `divMod` 16
      h i | i < 10    = 48 + i -- zero (0)
          | otherwise = 65 + i - 10 -- 65: A

-- | Percent-encoding for URLs (using 'B.Builder').
urlEncodeBuilder
    :: Bool -- ^ Whether input is in query string. True: Query string, False: Path element
    -> B.ByteString
    -> B.Builder
urlEncodeBuilder True  = urlEncodeBuilder' unreservedQS
urlEncodeBuilder False = urlEncodeBuilder' unreservedPI

-- | Percent-encoding for URLs.
urlEncode :: Bool -- ^ Whether to decode @\'+\'@ to @\' \'@
          -> B.ByteString -- ^ The ByteString to encode as URL
          -> B.ByteString -- ^ The encoded URL
urlEncode q = BL.toStrict . B.toLazyByteString . urlEncodeBuilder q

I'm not super familiar with this domain. Should URL decoding have an option to work with those other characters? Should just the documentation for URL encoding change?

Feb 16 '21 16:02 MaxGabriel

Fixed in 0.12.4 [Vlix#1]

Nov 30 '23 19:11 Vlix