Difference in URL-encoding behaviour compared with HTTPoison
While migrating part of our snapshot crawler (https://github.com/etalab/transport-site/pull/3585), I did some fairly large-scale testing, comparing the behaviour of HTTPoison and Req in detail.
One finding is that URLs containing pipe characters (`|`) in the query string result in errors with Req, while HTTPoison, for some reason (probably linked to how hackney works underneath), downloads them just fine.
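A likely explanation, which is my assumption and not something I have verified in the Mint or hackney sources, is that `|` is not among the characters RFC 3986 allows unescaped in a URL. Elixir's standard `URI` module can show this directly: a character may appear unescaped only if it is "unreserved" or "reserved".

```elixir
# Check which characters RFC 3986 allows unescaped in a URL, using only
# Elixir's standard URI module. Assumption: Mint's request-target check is
# roughly "unreserved or reserved character, or a %XX escape".
for char <- [?|, ?&, ?=, ?_] do
  allowed? = URI.char_unreserved?(char) or URI.char_reserved?(char)
  IO.puts("#{<<char>>} allowed unescaped: #{allowed?}")
end

# Prints:
# | allowed unescaped: false
# & allowed unescaped: true
# = allowed unescaped: true
# _ allowed unescaped: true
```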
Here is a reproduction using production URLs:
```elixir
data = [
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
]

data
|> Enum.each(fn url ->
  IO.puts("==========================")
  IO.puts(url)
  # Raw URL: the "|" characters are sent unescaped
  IO.inspect(Req.get(url, compressed: false, decode_body: false))
  # Same URL with every "|" percent-encoded as "%7C"
  IO.inspect(Req.get(String.replace(url, "|", URI.encode("|")), compressed: false, decode_body: false))
end)
```
Without the replacement, Req typically returns:

```elixir
{:error,
 %Mint.HTTPError{
   reason: {:invalid_request_target,
    "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"},
   module: Mint.HTTP1
 }}
```
With the replacement, however, the file downloads just fine.
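The `String.replace/3` workaround in the reproduction only handles `|`. A slightly more general sketch is below; `UrlTargetFix.encode_invalid_chars/1` is a hypothetical helper (not part of Req, Mint or HTTPoison) that percent-encodes every byte RFC 3986 does not allow unescaped, while keeping reserved characters and existing `%XX` escapes as they are. It assumes the input URL is otherwise well-formed.

```elixir
defmodule UrlTargetFix do
  @moduledoc """
  Hypothetical helper (not part of Req, Mint or HTTPoison).

  Percent-encodes every byte that RFC 3986 does not allow unescaped in a URL
  (for example `|`, spaces or non-ASCII bytes), while leaving reserved
  characters such as `?`, `&` and `=` and existing `%XX` escapes untouched.
  """

  @spec encode_invalid_chars(String.t()) :: String.t()
  def encode_invalid_chars(url) when is_binary(url) do
    # URI.encode/2 calls the predicate once per byte and escapes the byte
    # as %XX (uppercase hex) whenever the predicate returns false.
    URI.encode(url, fn byte ->
      # Keep unreserved and reserved characters, and "%" so that existing
      # escapes such as "%7C" are not double-encoded into "%257C".
      URI.char_unreserved?(byte) or URI.char_reserved?(byte) or byte == ?%
    end)
  end
end

url =
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA"

UrlTargetFix.encode_invalid_chars(url)
# => "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB%7CTIC_INT%7CALLOTIC&dataFormat=Netex&dataProfil=OPENDATA"
```

Passing the result to `Req.get/2` should then behave like the `String.replace/3` version, while also covering other characters that may be rejected the same way. One limitation: a stray `%` that is not part of a valid escape is kept as-is. When the query string is built in code rather than taken from a stored URL, Req's `params:` option is another route, since it percent-encodes the values and should therefore also send `|` as `%7C`.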
I'm not sure which behaviour is correct, but this could catch users migrating from HTTPoison off guard!