
Difference in URL encoding behaviour compared with HTTPoison

Open · thbar opened this issue · 15 comments

While migrating part of our snapshot crawler (https://github.com/etalab/transport-site/pull/3585), I did some fairly large-scale testing and compared the behaviour of HTTPoison and Req in detail.

One finding is that URLs containing a pipe character (`|`) in the query string make Req return an error, while HTTPoison, for some reason (probably linked to how hackney handles URLs underneath), downloads them just fine.

Here is a reproduction using production URLs:

```elixir
# Standalone script (e.g. `elixir repro.exs`); installs Req first.
Mix.install([:req])

data = [
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=TIC_URB|TIC_INT|ALLOTIC&dataFormat=Netex&dataProfil=OPENDATA",
  "https://api.oisemob.cityway.fr/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"
]

data
|> Enum.each(fn url ->
  IO.puts("==========================")
  IO.puts(url)
  # 1. URL as-is: the raw "|" ends up in the HTTP/1 request target.
  IO.inspect(Req.get(url, compressed: false, decode_body: false))
  # 2. Same URL with every "|" replaced by its percent-encoding, "%7C".
  IO.inspect(Req.get(String.replace(url, "|", URI.encode("|")), compressed: false, decode_body: false))
end)
```

Without the replacement, Req typically returns:

```elixir
{:error,
 %Mint.HTTPError{
   reason: {:invalid_request_target,
    "/dataflow/offre-tc/download?provider=COROLIS_URB|COROLIS_INT&dataFormat=NETEX&dataProfil=OPENDATA"},
   module: Mint.HTTP1
 }}
```

With the replacement, the same file downloads just fine.
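For context, RFC 3986 does not list `|` among the characters allowed unencoded in a query (unreserved, sub-delims, `:`, `@`, `/`, `?`, or a percent-encoded octet), which would be consistent with Mint rejecting the request target. The `String.replace/3` trick only handles `|`. A slightly more general sketch, using only the standard library, decodes the query into pairs and re-encodes it. `QueryNormalizer` and `normalize/1` are hypothetical names for illustration, not part of Req, Mint or HTTPoison, and the sketch assumes the query is `application/x-www-form-urlencoded` style:

```elixir
defmodule QueryNormalizer do
  # Hypothetical helper (not part of Req, Mint or HTTPoison).
  # Re-encodes the query string of a URL so that characters such as "|"
  # are percent-encoded before the URL is handed to an HTTP client.
  @spec normalize(String.t()) :: String.t()
  def normalize(url) when is_binary(url) do
    uri = URI.parse(url)

    case uri.query do
      # No query string: nothing to re-encode.
      nil ->
        url

      query ->
        # URI.query_decoder/1 yields {key, value} pairs lazily, keeping their
        # original order and any duplicate keys (URI.decode_query/1 would
        # collapse them into a map). URI.encode_query/1 then percent-encodes
        # everything outside the unreserved set, so "|" becomes "%7C".
        encoded =
          query
          |> URI.query_decoder()
          |> URI.encode_query()

        URI.to_string(%URI{uri | query: encoded})
    end
  end
end
```

Usage: `Req.get(QueryNormalizer.normalize(url), compressed: false, decode_body: false)`. Note that the round trip also normalizes equivalent spellings (for example `%20` becomes `+`), which is fine for form-encoded queries but is an assumption worth checking against the target server.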

I'm not sure which behaviour is correct, but this could catch users migrating from HTTPoison off guard!

thbar · Nov 14 '23 19:11