ocaml-cohttp

simple parsing/printing of requests/responses

Open agarwal opened this issue 10 years ago • 15 comments

I would like to parse simple test data. For example, here's a file with a request:

POST / http/1.1
Content-Type:application/x-www-form-urlencoded; charset=utf8
Date:Mon, 09 Sep 2011 23:36:00 GMT
Host:host.foo.com

foo=bar

I'd like a function like this:

val of_file : filename -> (Request.t * Body.t)

where Body.t = string. The closest I've managed to figure out is Cohttp.Request.Make(Cohttp.String_io.M), but AFAICT I can't get the body from any of the resulting functions.

I'll also want a similar function to parse responses.

All this is provided for Lwt and Async, but I don't see solutions for the simpler case of data in plain strings.

agarwal avatar Sep 24 '15 15:09 agarwal

I think this test file may help you.

Yes, it is inconvenient that this is not in the library.

objmagic avatar Sep 24 '15 16:09 objmagic

I think that function only deals with the header, and I'm still having trouble getting the body. After instantiating the functor I mentioned, has_body just returns `Unknown.

agarwal avatar Sep 24 '15 16:09 agarwal

Hi, @agarwal

I wrote a short piece of code and it works well.


(** ocamlbuild -use-ocamlfind -pkg cohttp simple_parsing.native *)
open Cohttp

module type S = sig
  val parse : string -> (Request.t * Body.t)
end

module StringParse : S = struct

  module Req = Cohttp.Request.Make(Cohttp.String_io.M)

  let parse str =
    let open String_io.M in
    let ic = String_io.open_in str in
    Req.read ic >>= fun result ->
    match result with
    | `Ok req -> begin
      let reader = Req.make_body_reader req ic in
      let rec loop acc =
        Req.read_body_chunk reader >>= (fun result ->
        match result with
        | Transfer.Chunk str -> loop (str :: acc)
        | Transfer.Final_chunk str -> str :: acc
        | Transfer.Done -> acc) in
      (* Chunks were consed onto acc newest-first, so restore their order. *)
      let body = loop [] |> List.rev |> Body.of_string_list in
      req, body end
    | `Invalid error -> assert false
    | `Eof -> assert false

end

let str = "GET / HTTP/1.1\r\nhost: example.com\r\ncontent-length:3\r\n\r\n123"

let () = ignore (StringParse.parse str)
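
For the of_file case from the original post, the file first has to be read into a string. A minimal sketch using only the standard library follows; read_file is a hypothetical helper name and is not part of cohttp.

```ocaml
(* Hypothetical helper, not part of cohttp: read a whole file into a string. *)
let read_file (filename : string) : string =
  let ic = open_in_bin filename in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))
```

The resulting string can then be handed to a string-based parser such as StringParse.parse above.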

objmagic avatar Sep 24 '15 20:09 objmagic

A pull req for this would be great!

avsm avatar Sep 25 '15 06:09 avsm

@rgrinberg What would you think of me contributing some blocking I/O code alongside /async, /lwt, and /lwt-core? Should I create another directory called /block_io?

objmagic avatar Sep 25 '15 14:09 objmagic

@marklrh Thanks. I was doing at least two things wrong. It's still unclear why has_body returns `Unknown, but I don't actually need that right now.

It would probably help beginners to have a blocking version of cohttp. I've recently re-organized future and biocaml to support this kind of thing. Each sub-directory under my lib/ directories provides a separate library with different dependencies.

agarwal avatar Sep 25 '15 20:09 agarwal

Should I create another directory called /block_io?

FYI, here's some internal documentation I follow for choosing names in this context. (I'm not confident that I've got this right. I welcome comments).

We provide a separate library for each of several architectures, which are defined by two parameters:

  1. Purity: Whether or not the library makes Unix calls or has bindings to C code. If the library does either of these, we label it unix. If it does neither, we label it pure.
  2. Concurrency: If the library uses lwt or async, we label it as such. If it makes blocking calls, we don't assign it any label, i.e. the label is the empty string (because this case is already covered by the unix label of the purity criterion).

Thus, there are 6 combinations possible: pure, lwt-pure, async-pure, unix, async-unix, and lwt-unix.
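
The scheme above can be sketched as two small variant types and a labelling function. The type and function names here are illustrative assumptions and do not come from any library.

```ocaml
(* Illustrative types only; the names are assumptions, not from any library. *)
type purity = Pure | Unix
type concurrency = Blocking | Lwt | Async

(* Build the library label: the concurrency prefix is empty for blocking code. *)
let label (c : concurrency) (p : purity) : string =
  let p = match p with Pure -> "pure" | Unix -> "unix" in
  match c with
  | Blocking -> p
  | Lwt -> "lwt-" ^ p
  | Async -> "async-" ^ p

(* Enumerate every combination of the two parameters. *)
let all_labels : string list =
  List.concat_map
    (fun p -> List.map (fun c -> label c p) [ Blocking; Lwt; Async ])
    [ Pure; Unix ]
```

Enumerating both parameters this way yields the six labels listed above.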

agarwal avatar Sep 25 '15 20:09 agarwal

A content-length header appears to be required. Is that correct? Without that, as in my original example, I always get an End_of_file error.

agarwal avatar Sep 27 '15 20:09 agarwal

@marklrh Would this blocking IO depend on unix?

IIRC @seliopou worked on this before.

Let me recall what this would require.

rgrinberg avatar Sep 27 '15 20:09 rgrinberg

I feel like it will not depend on Unix. We will use string_io module to read and write.

objmagic avatar Sep 27 '15 21:09 objmagic

/cc @kayceesrk, who was also interested in building an experimental effects-based implementation.

avsm avatar Sep 28 '15 09:09 avsm

@agarwal I believe so. See HTTP RFC ch14

@avsm @kayceesrk I would love to help with the effects-based implementation; it looks interesting.

objmagic avatar Sep 28 '15 15:09 objmagic

@marklrh Section 4.4 seems to be the most relevant, and AFAICT it is not absolutely required. Too bad all these specs are informally written.
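
A rough sketch of one reading of RFC 2616 Sections 4.3 and 4.4 for requests: a Transfer-Encoding other than "identity" means chunked framing, otherwise Content-Length gives the body size, and a request with neither header has no body. The names framing and request_framing are made up for illustration, and this is not necessarily how Cohttp decides.

```ocaml
(* Hypothetical names; a sketch of the RFC 2616 rules for requests only. *)
type framing =
  | Chunked       (* Transfer-Encoding present and not "identity" *)
  | Fixed of int  (* Content-Length: n *)
  | No_body       (* neither header present or usable *)

let request_framing (headers : (string * string) list) : framing =
  (* Header names are case-insensitive, so compare in lowercase. *)
  let find name =
    List.find_map
      (fun (k, v) ->
        if String.lowercase_ascii k = name then Some (String.trim v) else None)
      headers
  in
  match find "transfer-encoding" with
  | Some te when String.lowercase_ascii te <> "identity" -> Chunked
  | _ -> (
      match Option.bind (find "content-length") int_of_string_opt with
      | Some n when n >= 0 -> Fixed n
      | _ -> No_body)
```

Under this reading, the request in the original post has no Content-Length or Transfer-Encoding header, so a strict parser has no way to tell where its body ends.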

I guess the question is whether Cohttp exposes lower-level parsing that performs fewer checks. In Biocaml, I often structure parsers with types t0, t1, ..., t, where t0 is the least parsed, maybe just a string, t1 parses a bit more, and so on, until t, a type that enforces absolutely every requirement. This provides an escape hatch in case of non-compliant data (very common in bioinformatics), and it also aids efficiency since you can selectively do less parsing when you don't need the richer types.
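
A minimal sketch of that staged structure applied to an HTTP request, using only the standard library. All types and functions here are hypothetical and exist in neither Cohttp nor Biocaml; the final stage checks only the request line and header syntax, so it is far from a complete validator.

```ocaml
(* Hypothetical staged types, not part of Cohttp or Biocaml. *)
type t0 = string  (* raw text, nothing checked *)
type t1 = { head_lines : string list; raw_body : string }
type t = {
  meth : string;
  target : string;
  headers : (string * string) list;
  body : string;
}

(* Find the first occurrence of [sep] in [s], if any. *)
let find_sub (s : string) (sep : string) : int option =
  let n = String.length s and m = String.length sep in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = sep then Some i
    else go (i + 1)
  in
  go 0

(* Split into trimmed, non-empty lines. *)
let lines (s : string) : string list =
  String.split_on_char '\n' s
  |> List.map String.trim
  |> List.filter (fun l -> l <> "")

(* t0 -> t1: split head from body at the first blank line; never fails. *)
let to_t1 (raw : t0) : t1 =
  match find_sub raw "\r\n\r\n" with
  | None -> { head_lines = lines raw; raw_body = "" }
  | Some i ->
      { head_lines = lines (String.sub raw 0 i);
        raw_body = String.sub raw (i + 4) (String.length raw - i - 4) }

(* t1 -> t: require a three-part request line and "name: value" headers. *)
let to_t (p : t1) : (t, string) result =
  match p.head_lines with
  | [] -> Error "empty message"
  | request_line :: header_lines -> (
      match String.split_on_char ' ' request_line with
      | [ meth; target; _version ] ->
          let parse_header line =
            match String.index_opt line ':' with
            | Some i ->
                Some
                  ( String.trim (String.sub line 0 i),
                    String.trim
                      (String.sub line (i + 1) (String.length line - i - 1)) )
            | None -> None
          in
          let parsed = List.map parse_header header_lines in
          if List.mem None parsed then Error "malformed header line"
          else
            Ok { meth; target;
                 headers = List.filter_map Fun.id parsed;
                 body = p.raw_body }
      | _ -> Error "malformed request line")
```

A caller that only needs the body can stop at to_t1, which accepts any input, while to_t rejects messages whose head is malformed.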

agarwal avatar Sep 28 '15 19:09 agarwal

@marklrh Then I think block_io is a bit of a misleading name. If it isn't using unix, how is it blocking? Something like string_io or string would fit better.

In any case, let's see a PR and we can kvetch about the name there.

rgrinberg avatar Sep 28 '15 21:09 rgrinberg

@marklrh Would be great if you could help! Let's take the discussion off-thread.

kayceesrk avatar Sep 29 '15 12:09 kayceesrk