http-types
Documentation of urlEncode is misleading
Version 0.12.3
The documentation of urlEncode for the boolean parameter is:
Whether to decode '+' to ' '
This doesn't appear to be the whole story, though. The flag also controls whether other characters, such as @, are percent-encoded:
ghci> :set -XOverloadedStrings
ghci> import Prelude
ghci> import qualified Network.HTTP.Types.URI as URI
ghci> URI.urlEncode False "test@example.com"
"test@example.com"
ghci> URI.urlEncode True "test@example.com"
"test%40example.com"
The documentation does appear to be accurate for urlDecode, where replacePlus is used in only one place:
-- | Percent-decoding.
urlDecode :: Bool -- ^ Whether to decode @\'+\'@ to @\' \'@
          -> B.ByteString -> B.ByteString
urlDecode replacePlus z = fst $ B.unfoldrN (B.length z) go z
  where
    go bs =
        case B.uncons bs of
            Nothing -> Nothing
            Just (43, ws) | replacePlus -> Just (32, ws) -- plus to space
            Just (37, ws) -> Just $ fromMaybe (37, ws) $ do -- percent
                (x, xs) <- B.uncons ws
                x' <- hexVal x
                (y, ys) <- B.uncons xs
                y' <- hexVal y
                Just (combine x' y', ys)
            Just (w, ws) -> Just (w, ws)
    hexVal w
        | 48 <= w && w <= 57  = Just $ w - 48 -- 0 - 9
        | 65 <= w && w <= 70  = Just $ w - 55 -- A - F
        | 97 <= w && w <= 102 = Just $ w - 87 -- a - f
        | otherwise = Nothing
    combine :: Word8 -> Word8 -> Word8
    combine a b = shiftL a 4 .|. b
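To make the decoding behaviour concrete, here is a minimal sketch that mirrors the logic of the function quoted above on plain String values instead of ByteString. The name decodeSketch is hypothetical and not part of http-types; it only illustrates that the flag affects '+' and nothing else.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Hypothetical, simplified String version of the decoder quoted above.
-- The Bool only decides whether '+' becomes ' '; percent escapes are
-- decoded the same way for both values of the flag.
decodeSketch :: Bool -> String -> String
decodeSketch replacePlus = go
  where
    go ('+' : rest) | replacePlus = ' ' : go rest
    go ('%' : x : y : rest)
      | isHexDigit x && isHexDigit y =
          chr (16 * digitToInt x + digitToInt y) : go rest
    -- Anything else, including a malformed escape, is copied through.
    go (c : rest) = c : go rest
    go [] = []

main :: IO ()
main = do
  print (decodeSketch True  "a+b%40c")  -- "a b@c"
  print (decodeSketch False "a+b%40c")  -- "a+b@c"
```

In both calls %40 is decoded to @, so for decoding the documented meaning of the flag matches what the code does.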
In urlEncode, however, the flag is not limited to the handling of '+' and spaces. It selects which set of extra characters is left unescaped:
unreservedQS, unreservedPI :: [Word8]
unreservedQS = map ord8 "-_.~"
unreservedPI = map ord8 "-_.~:@&=+$,"

-- | Percent-encoding for URLs.
urlEncodeBuilder' :: [Word8] -> B.ByteString -> B.Builder
urlEncodeBuilder' extraUnreserved = mconcat . map encodeChar . B.unpack
  where
    encodeChar ch | unreserved ch = B.word8 ch
                  | otherwise     = h2 ch

    unreserved ch | ch >= 65 && ch <= 90  = True -- A-Z
                  | ch >= 97 && ch <= 122 = True -- a-z
                  | ch >= 48 && ch <= 57  = True -- 0-9
    unreserved c = c `elem` extraUnreserved

    -- must be upper-case
    h2 v = B.word8 37 `mappend` B.word8 (h a) `mappend` B.word8 (h b) -- 37 = %
      where (a, b) = v `divMod` 16
    h i | i < 10    = 48 + i -- zero (0)
        | otherwise = 65 + i - 10 -- 65: A

-- | Percent-encoding for URLs (using 'B.Builder').
urlEncodeBuilder
    :: Bool -- ^ Whether input is in query string. True: Query string, False: Path element
    -> B.ByteString
    -> B.Builder
urlEncodeBuilder True  = urlEncodeBuilder' unreservedQS
urlEncodeBuilder False = urlEncodeBuilder' unreservedPI

-- | Percent-encoding for URLs.
urlEncode :: Bool -- ^ Whether to decode @\'+\'@ to @\' \'@
          -> B.ByteString -- ^ The ByteString to encode as URL
          -> B.ByteString -- ^ The encoded URL
urlEncode q = BL.toStrict . B.toLazyByteString . urlEncodeBuilder q
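The effect of the flag on encoding can be reproduced with a small self-contained sketch. It reimplements the quoted logic on String values, assuming input restricted to characters below 256 (one byte each); encodeSketch is a hypothetical name, not a function from http-types.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- Hypothetical String version of the encoder quoted above.
-- True  = query string: only "-_.~" is left unescaped.
-- False = path element: "-_.~:@&=+$," is left unescaped.
-- Assumes every character is below 256, i.e. fits in one byte.
encodeSketch :: Bool -> String -> String
encodeSketch isQuery = concatMap encodeChar
  where
    extra
      | isQuery   = "-_.~"
      | otherwise = "-_.~:@&=+$,"
    unreserved c = (isAscii c && isAlphaNum c) || c `elem` extra
    encodeChar c
      | unreserved c = [c]
      | otherwise    = '%' : pad (map toUpper (showHex (ord c) ""))
    -- Always emit two hex digits.
    pad [d] = ['0', d]
    pad ds  = ds

main :: IO ()
main = do
  print (encodeSketch False "test@example.com")  -- "test@example.com"
  print (encodeSketch True  "test@example.com")  -- "test%40example.com"
  print (encodeSketch False "a b+c")             -- "a%20b+c"
  print (encodeSketch True  "a b+c")             -- "a%20b%2Bc"
```

Note that a space becomes %20 for both values of the flag, so in this logic the encoder never produces '+' for a space. The parameter description on urlEncodeBuilder (query string versus path element) describes the behaviour, while the one on urlEncode does not.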
I'm not super familiar with this domain. Should URL decoding have an option to work with those other characters? Should just the documentation for URL encoding change?
Fixed in 0.12.4 [Vlix#1]