Hoogle for OCaml
We have needed this tool for 10 years...
Description of what is required is at http://neilmitchell.blogspot.co.uk/2011/03/hoogle-for-your-language-ie-f-scala-ml.html. We're currently at:
A volunteer needs to generate some Hoogle input files containing details of the modules/functions/packages etc. to be searched. These files should be plain text, but can be in a language specific format - i.e. ML syntax for type signatures. For a rough idea of how these files could look see this example - for Haskell I get these files from Hackage. The code to generate these input files can be written in any language, and can live outside Hoogle.
@avsm @yminsky @diml @lefessan
The INRIA SED might also be interested in helping with that: @shindere @thierry-martinez
@dbuenzli
@samoht
I generated all the .mli files I could install with OPAM. They are here: https://github.com/UnixJunkie/ocaml-4.04.0-mlis @ndmitchell does this help?
I did this for the latest stable version of OCaml.
That looks good. Next thing we'd need is a Haskell parser for the subset of OCaml contained in mli files.
I asked more competent people. I hope some will manifest themselves.
Hello! I have just started working on this (https://github.com/nojb/haskell-ocaml-parser). It is my first time writing Haskell, but I hope to have something working in the next several days (work allowing).
Many thanks @nojb!
I'm happy enough with Happy, although I would say my first approach is usually to use something like parsec/attoparsec (typically the latter). Use whichever you prefer though.
That was my first thought as well, but the OCaml syntax is large and complicated. Moreover the official compiler also uses a yacc-based parser. Using a similar technology makes it easier for me and, I think, will make it easier to maintain in the long run.
That makes a lot of sense. Are you planning to release your OCaml parser as a library I depend on, or to merge the code into Hoogle? I'm happy either way, although I imagine an OCaml parser could be generally useful.
I haven't given it much thought yet, but indeed I think it makes sense to release as a library.
Another approach would be to apply a preprocessing phase implemented in OCaml (and that would be able to use compiler-libs for example) that would output a text file with easier-to-parse signatures. Something that worried me about the project of having Hoogle for OCaml is that the OCaml module system makes bare signatures carry very little information (you will get a lot of functions of type t -> t, or something like that). Expanding type signatures to fully qualified types probably requires a non-trivial treatment that could benefit from compiler-libs and that would be difficult to reproduce in an ad-hoc parser.
I like the idea of applying a preprocessing step on the OCaml side to make it easier to parse on the Haskell side.
Regarding the second part of your suggestion: if I understand correctly, you are saying that to "know" which type a particular type constructor refers to requires more than just a syntactic analysis (for example, to take into account "opens" and "includes"). This means that we probably need to take the .cmtis as input instead of the .mlis, what do you think?
If you need more files to be pushed in there: https://github.com/UnixJunkie/ocaml-4.04.0-mlis just ping me
@ndmitchell Hi Neil, to make matters a little more concrete, could you post a link to the internal Hoogle representation that we must target (step "2" of your blog post)? Thanks!
https://github.com/ndmitchell/hoogle/blob/master/src/Input/Item.hs has all the data types. Item is the root type, Sig is where most of the complexity lies, since that is type signatures.
@nojb I think that .cmi files are just fine. I wrote a small tool that dumps .cmi files in Haskell syntax: it should be directly parsable with Hoogle without any custom parser. Queries will need to be preprocessed, though. https://github.com/thierry-martinez/hooglebackend
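To illustrate what preprocessing a query might involve, here is a minimal sketch in Haskell. It assumes the naming convention visible in the dump further down the thread (OCaml type constructors get a T prefix, type variables lose their quote, postfix application becomes prefix). The names Ty, parseTy, render and toHoogleQuery are hypothetical and are not part of Hoogle or hooglebackend. Only a small subset of OCaml type syntax is covered: no labels, no qualified names, no multi-argument constructors such as ('a, 'b) t.

```haskell
import Data.Char (isAlphaNum)
import Data.List (intercalate)
import Text.ParserCombinators.ReadP

-- A deliberately small model of OCaml type expressions (hypothetical).
data Ty
  = Var String       -- 'a
  | Con String [Ty]  -- int, 'a t, int list option
  | Tuple [Ty]       -- a * b
  | Arrow Ty Ty      -- a -> b
  deriving (Eq, Show)

-- Run a parser, then skip any whitespace that follows it.
tok :: ReadP a -> ReadP a
tok p = p <* skipSpaces

ident :: ReadP String
ident = tok (munch1 (\c -> isAlphaNum c || c == '_'))

-- Type variable, parenthesised type, or bare constructor name.
atom :: ReadP Ty
atom = (Var <$> (char '\'' *> ident))
   <++ between (tok (char '(')) (tok (char ')')) arrow
   <++ ((\c -> Con c []) <$> ident)

-- Postfix application: "int list option" is option (list int).
app :: ReadP Ty
app = do
  a <- atom
  cs <- many ident
  pure (foldl (\arg c -> Con c [arg]) a cs)

tuple :: ReadP Ty
tuple = do
  ts <- sepBy1 app (tok (char '*'))
  pure (case ts of { [t] -> t; _ -> Tuple ts })

-- Arrows are right-associative.
arrow :: ReadP Ty
arrow = do
  l <- tuple
  (Arrow l <$> (tok (string "->") *> arrow)) <++ pure l

parseTy :: String -> Maybe Ty
parseTy s = case readP_to_S (skipSpaces *> arrow <* eof) s of
  ((t, _) : _) -> Just t
  []           -> Nothing

parens :: String -> String
parens s = "(" ++ s ++ ")"

-- Print in Haskell-like syntax, prefixing constructors with T.
render :: Ty -> String
render (Var v)      = v
render (Con c [])   = 'T' : c
render (Con c args) = unwords (('T' : c) : map renderArg args)
render (Tuple ts)   = parens (intercalate ", " (map render ts))
render (Arrow a b)  = renderLhs a ++ " -> " ++ render b

-- An arrow on the left of another arrow needs parentheses.
renderLhs :: Ty -> String
renderLhs t@(Arrow _ _) = parens (render t)
renderLhs t             = render t

-- Applied constructors and arrows need parentheses as arguments.
renderArg :: Ty -> String
renderArg t@(Con _ (_ : _)) = parens (render t)
renderArg t@(Arrow _ _)     = parens (render t)
renderArg t                 = render t

toHoogleQuery :: String -> Maybe String
toHoogleQuery = fmap render . parseTy
```

With this sketch, toHoogleQuery "key -> 'a t -> 'a" yields Just "Tkey -> Tt a -> a", which lines up with the find entry in the dump apart from the redundant parentheses.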
Great! Does it mean we can already use Hoogle with OCaml (even if in a basic manner) ?
I pushed all .cmi files in there too: https://github.com/UnixJunkie/ocaml-4.04.0-mlis We have 3518 cmi files and 1881 mli files.
Should I run Thierry's dumper on all the .cmi files and store its outputs? I can do this tomorrow if needed. Just ping me.
I think we are not there quite yet. I played around with Thierry's tool a little bit. For example, running hooglebackend foo.cmi, where foo.ml is
open Map
module M = Make (String)
gives
module Foo where
module Foo__2EM where
data Tkey
data Tt t0
empty :: Tt a
is_empty :: (Tt a -> Tbool)
mem :: (Tkey -> (Tt a -> Tbool))
add :: (Tkey -> (a -> (Tt a -> Tt a)))
singleton :: (Tkey -> (a -> Tt a))
remove :: (Tkey -> (Tt a -> Tt a))
merge :: ((Tkey -> (Toption a -> (Toption b -> Toption c))) -> (Tt a -> (Tt b -> Tt c)))
union :: ((Tkey -> (a -> (a -> Toption a))) -> (Tt a -> (Tt a -> Tt a)))
compare :: ((a -> (a -> Tint)) -> (Tt a -> (Tt a -> Tint)))
equal :: ((a -> (a -> Tbool)) -> (Tt a -> (Tt a -> Tbool)))
iter :: ((Tkey -> (a -> Tunit)) -> (Tt a -> Tunit))
fold :: ((Tkey -> (a -> (b -> b))) -> (Tt a -> (b -> b)))
for_all :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tbool))
exists :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tbool))
filter :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tt a))
partition :: ((Tkey -> (a -> Tbool)) -> (Tt a -> (Tt a, Tt a, Tt a)))
cardinal :: (Tt a -> Tint)
bindings :: (Tt a -> Tlist (Tkey, Tkey, a))
min_binding :: (Tt a -> (Tkey, Tkey, a))
max_binding :: (Tt a -> (Tkey, Tkey, a))
choose :: (Tt a -> (Tkey, Tkey, a))
split :: (Tkey -> (Tt a -> (Tt a, Tt a, Toption a, Tt a)))
find :: (Tkey -> (Tt a -> a))
map :: ((a -> b) -> (Tt a -> Tt b))
mapi :: ((Tkey -> (a -> b)) -> (Tt a -> Tt b))
module Foo where
I can see two issues right away:
- It is probably a good idea to emit Haskell code that constructs the required internal representation directly (data type Item in https://github.com/ndmitchell/hoogle/blob/master/src/Input/Item.hs). This would avoid the gymnastics needed to account for the differences in lexical conventions between Haskell and OCaml.
- Manifest types are not taken into account. For example, the type key (Tkey) above appears abstract when in fact it is known to be string.
I would advise against targeting Item directly from an external tool. Item is very much an internal detail of Hoogle, so anything operating on it should live in Hoogle itself. Translating to Haskell (or Haskell-like) is probably a better approach.
One observation/suggestion: tools like ocp-index and ocamlspot manage to keep a rather accurate account of the types in one's source tree. They do rely on .cmt/.cmti files being present, but it doesn't sound like that's a big problem for this project. Perhaps it would make sense to look at extending one of those tools (or even ocamlbrowser) to dump their database of types in a tree as S-expressions (say), so that parsing on the Haskell side is not too difficult?
In my opam setup, in a VM with as many packages as I could install, there are more .cmi files than any other type (3518 .cmi files, 1880 .mli files, 1713 .cmt files, 1328 .cmti files). So I guess it is better to target .cmi files if we want as many libraries as possible to be indexed.
I updated https://github.com/thierry-martinez/hooglebackend : I chose a JSON-compatible format, using only lists, strings and integers. I checked that there exist some JSON parsers in Haskell, and it should be trivial to parse from scratch anyway. Type manifests are now handled correctly (thanks @nojb !).
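As a rough idea of what "parse from scratch" could look like on the Haskell side, here is a minimal sketch that reads only lists, strings and integers, using ReadP from base. The Value type and parseValue are hypothetical names, and the sketch knows nothing about the actual layout of hooglebackend's output, which is not shown in this thread; it only covers the three value shapes mentioned above.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- The three shapes the format is said to use (hypothetical type).
data Value
  = VList [Value]
  | VString String
  | VInt Integer
  deriving (Eq, Show)

-- Run a parser, then skip any whitespace that follows it.
tok :: ReadP a -> ReadP a
tok p = p <* skipSpaces

value :: ReadP Value
value = list <++ str <++ int

list :: ReadP Value
list = VList <$> between (tok (char '[')) (tok (char ']'))
                         (sepBy value (tok (char ',')))

-- Simplification: a backslash keeps the next character verbatim,
-- so \" and \\ work, but \n or \uXXXX are not decoded.
str :: ReadP Value
str = VString <$> tok (between (char '"') (char '"') (many strChar))
  where
    strChar = (char '\\' *> get) <++ satisfy (\c -> c /= '"' && c /= '\\')

int :: ReadP Value
int = tok $ do
  sign <- option id (negate <$ char '-')
  digits <- munch1 isDigit
  pure (VInt (sign (read digits)))

-- Succeeds only if the whole input is one value.
parseValue :: String -> Maybe Value
parseValue s = case readP_to_S (skipSpaces *> value <* eof) s of
  ((v, _) : _) -> Just v
  []           -> Nothing
```

For example, parseValue "[\"find\", [1, -2], []]" gives Just (VList [VString "find", VList [VInt 1, VInt (-2)], VList []]). In practice an existing JSON library would likely be the more robust choice; the sketch only shows that the restricted format is small enough to handle by hand.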
I added a .cmi.json file for each .cmi file in there: https://github.com/UnixJunkie/ocaml-4.04.0-mlis. The .json files were created using Thierry's hooglebackend software.