Hoogle for OCaml

UnixJunkie opened this issue 8 years ago • 41 comments

We have needed this tool for 10 years...

UnixJunkie, Feb 23 '17

A description of what is required is at http://neilmitchell.blogspot.co.uk/2011/03/hoogle-for-your-language-ie-f-scala-ml.html. We're currently at this step:

A volunteer needs to generate some Hoogle input files containing details of the modules/functions/packages etc. to be searched. These files should be plain text, but can be in a language specific format - i.e. ML syntax for type signatures. For a rough idea of how these files could look see this example - for Haskell I get these files from Hackage. The code to generate these input files can be written in any language, and can live outside Hoogle.
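A minimal sketch of the kind of plain-text input file described above, generated from OCaml. The `entry` record, `render_entries`, and the layout itself (`@package` and `module` header lines loosely modelled on Hoogle's Haskell input files, followed by ML-style `val` lines) are assumptions for illustration; Hoogle is not known to accept this exact format for OCaml.

```ocaml
(* Hypothetical description of one searchable value. *)
type entry = {
  package : string;      (* opam package name *)
  module_path : string;  (* e.g. "List" or "Map.Make" *)
  name : string;         (* value name *)
  signature : string;    (* type in OCaml syntax *)
}

(* Hypothetical layout: emit a package/module header only when it
   changes from the previous entry, then one "val" line per entry. *)
let render_entries (entries : entry list) : string =
  let buf = Buffer.create 256 in
  let last_pkg = ref "" and last_mod = ref "" in
  List.iter
    (fun e ->
      if e.package <> !last_pkg then begin
        Buffer.add_string buf ("@package " ^ e.package ^ "\n");
        last_pkg := e.package;
        last_mod := ""
      end;
      if e.module_path <> !last_mod then begin
        Buffer.add_string buf ("module " ^ e.module_path ^ "\n");
        last_mod := e.module_path
      end;
      Buffer.add_string buf (Printf.sprintf "val %s : %s\n" e.name e.signature))
    entries;
  Buffer.contents buf
```

Grouping by package and module keeps the file close to the per-package, per-module structure the blog post describes, while the signatures stay in ML syntax.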

ndmitchell, Feb 26 '17

@avsm @yminsky @diml @lefessan

UnixJunkie, Feb 27 '17

The INRIA SED might also be interested in helping with that: @shindere @thierry-martinez

UnixJunkie, Feb 27 '17

@dbuenzli

UnixJunkie, Feb 27 '17

@samoht

UnixJunkie, Feb 28 '17

I gathered the .mli files of every package I could install with OPAM. They are here: https://github.com/UnixJunkie/ocaml-4.04.0-mlis @ndmitchell, does this help?

UnixJunkie, Mar 01 '17

I did this for the latest stable version of OCaml.

UnixJunkie, Mar 01 '17

That looks good. The next thing we'd need is a Haskell parser for the subset of OCaml found in .mli files.

ndmitchell, Mar 02 '17

I have asked people more competent than me; I hope some of them will step forward.

UnixJunkie, Mar 02 '17

Hello! I have just started working on this (https://github.com/nojb/haskell-ocaml-parser). It is my first time writing Haskell, but I hope to have something working in the next several days (work allowing).

nojb, Mar 03 '17

Many thanks @nojb!

shindere, Mar 03 '17

I'm happy enough with Happy, although I would say my first approach is usually to use something like parsec/attoparsec (typically the latter). Use whichever you prefer though.

ndmitchell, Mar 03 '17

That was my first thought as well, but the OCaml syntax is large and complicated. Moreover the official compiler also uses a yacc-based parser. Using a similar technology makes it easier for me and, I think, will make it easier to maintain in the long run.

nojb, Mar 03 '17

That makes a lot of sense. Are you planning to release your OCaml parser as a library that I depend on, or to merge the code into Hoogle? I'm happy either way, although I imagine an OCaml parser could be generally useful.

ndmitchell, Mar 03 '17

I haven't given it much thought yet, but indeed I think it makes sense to release as a library.

nojb, Mar 03 '17

Another approach would be to apply a preprocessing phase implemented in OCaml (and that would be able to use compiler-libs for example) that would output a text file with easier-to-parse signatures. Something that worried me about the project of having Hoogle for OCaml is that the OCaml module system makes bare signatures carry very little information (you will get a lot of functions of type t -> t, or something like that). Expanding type signatures to fully qualified types probably requires a non-trivial treatment that could benefit from compiler-libs and that would be difficult to reproduce in an ad-hoc parser.
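To illustrate the qualification problem, here is a sketch over a small hypothetical type AST (`ty`, standing in for compiler-libs' `Types.type_expr`, which is not used here). `qualify` prefixes the type constructors defined in a module with that module's path, so a bare `t -> t` no longer looks the same in every module. The AST, the function names and the `StringSet` example are assumptions, not part of any existing tool.

```ocaml
(* Hypothetical, simplified type AST; a real tool would walk
   compiler-libs' Types.type_expr instead. *)
type ty =
  | Var of string                (* type variable, without the quote *)
  | Constr of string * ty list   (* constructor name and arguments *)
  | Arrow of ty * ty
  | Tuple of ty list

(* Print in OCaml syntax, adding parentheses only where precedence needs them. *)
let rec to_string = function
  | Var v -> "'" ^ v
  | Constr (name, []) -> name
  | Constr (name, [ arg ]) -> atom arg ^ " " ^ name
  | Constr (name, args) ->
      "(" ^ String.concat ", " (List.map to_string args) ^ ") " ^ name
  | Arrow ((Arrow _ as a), b) -> "(" ^ to_string a ^ ") -> " ^ to_string b
  | Arrow (a, b) -> to_string a ^ " -> " ^ to_string b
  | Tuple ts -> String.concat " * " (List.map atom ts)

(* Arrows and tuples must be parenthesized when used as arguments. *)
and atom t =
  match t with Arrow _ | Tuple _ -> "(" ^ to_string t ^ ")" | _ -> to_string t

(* Prefix every constructor listed in [local] with the module path [prefix]. *)
let rec qualify ~prefix ~local t =
  match t with
  | Var _ -> t
  | Constr (name, args) ->
      let name = if List.mem name local then prefix ^ "." ^ name else name in
      Constr (name, List.map (qualify ~prefix ~local) args)
  | Arrow (a, b) -> Arrow (qualify ~prefix ~local a, qualify ~prefix ~local b)
  | Tuple ts -> Tuple (List.map (qualify ~prefix ~local) ts)
```

For example, `elt -> t -> bool` inside a hypothetical `StringSet` module prints as `StringSet.elt -> StringSet.t -> bool` after `qualify ~prefix:"StringSet" ~local:["t"; "elt"]`.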

thierry-martinez, Mar 03 '17

I like the idea of applying a preprocessing step on the OCaml side to make it easier to parse on the Haskell side.

Regarding the second part of your suggestion: if I understand correctly, you are saying that knowing which type a particular type constructor refers to requires more than a syntactic analysis (for example, to take "open"s and "include"s into account). This means that we probably need to take the .cmti files as input instead of the .mli files. What do you think?
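A sketch of why opens need more than syntax: resolving an unqualified type constructor requires knowing which types each opened module defines, and later opens shadow earlier ones. The `env` table and `resolve` function are hypothetical and hand-written here; in practice that information would have to come from the compiled interfaces. `include` is not handled.

```ocaml
(* Hypothetical, hand-written table of the type names each module defines.
   A real tool would read this from .cmi/.cmti files. *)
let env = [ ("Hashtbl", [ "t" ]); ("String", [ "t" ]) ]

(* Resolve a type name under a list of [open]s given in source order.
   Later opens shadow earlier ones, so search from the last one back. *)
let resolve ~env ~(opens : string list) (name : string) : string =
  if String.contains name '.' then name (* already qualified *)
  else
    let defines m =
      match List.assoc_opt m env with
      | Some types -> List.mem name types
      | None -> false
    in
    match List.find_opt defines (List.rev opens) with
    | Some m -> m ^ "." ^ name
    | None -> name (* assumed predefined, e.g. int *)
```

With `env` as above, a `t` in a file that opens `Hashtbl` and then `String` resolves to `String.t`, which cannot be read off the signature text alone.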

nojb, Mar 03 '17

If you need more files pushed to https://github.com/UnixJunkie/ocaml-4.04.0-mlis, just ping me.

UnixJunkie, Mar 03 '17

@ndmitchell Hi Neil, to make things a little more concrete, could you post a link to the internal Hoogle representation that we must target (step "2" of your blog post)? Thanks!

nojb, Mar 03 '17

https://github.com/ndmitchell/hoogle/blob/master/src/Input/Item.hs has all the data types. Item is the root type, Sig is where most of the complexity lies since that is type signatures.

ndmitchell, Mar 03 '17

@nojb I think that .cmi files are just fine. I wrote a small tool that dumps .cmi files in Haskell syntax: it should be directly parsable with Hoogle without any custom parser. Queries will need to be preprocessed, though. https://github.com/thierry-martinez/hooglebackend
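Thierry notes that queries will need preprocessing. Judging only from the sample output posted later in this thread (a `T` prefix on type constructors, fully parenthesized arrows, type variables without the leading quote), a query preprocessor could map a parsed OCaml type onto the same convention. This is a hypothetical sketch of that mapping over an invented `ty` AST, not hooglebackend's code; qualified names and the module-name escaping seen in the output (`Foo__2EM`) are not handled.

```ocaml
(* Hypothetical type AST for a parsed query. *)
type ty =
  | Var of string
  | Constr of string * ty list
  | Arrow of ty * ty
  | Tuple of ty list

(* Mimic the conventions visible in the sample output: a "T" prefix on
   constructors, fully parenthesized arrows, Haskell-style (a, b) tuples. *)
let rec to_haskell = function
  | Var v -> v
  | Constr (name, args) ->
      String.concat " " (("T" ^ name) :: List.map haskell_arg args)
  | Arrow (a, b) -> "(" ^ to_haskell a ^ " -> " ^ to_haskell b ^ ")"
  | Tuple ts -> "(" ^ String.concat ", " (List.map to_haskell ts) ^ ")"

(* A constructor argument that is itself an applied constructor needs parens. *)
and haskell_arg t =
  match t with
  | Constr (_, _ :: _) -> "(" ^ to_haskell t ^ ")"
  | _ -> to_haskell t
```

A query such as `key -> 'a t -> bool` would then become `(Tkey -> (Tt a -> Tbool))`, the same shape as the `mem` line in that output.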

thierry-martinez, Mar 03 '17

Great! Does this mean we can already use Hoogle with OCaml (even if only in a basic way)?

nojb, Mar 03 '17

I pushed all the .cmi files to https://github.com/UnixJunkie/ocaml-4.04.0-mlis too; there are 3518 .cmi files and 1881 .mli files.

UnixJunkie, Mar 03 '17

Should I run Thierry's dumper on all the .cmi files and store its outputs? I can do this tomorrow if needed. Just ping me.

UnixJunkie, Mar 03 '17

I think we are not quite there yet. I played around with Thierry's tool a little. For example, running hooglebackend foo.cmi, where foo.ml is

open Map
module M = Make (String)

gives

module Foo where
module Foo__2EM where
data Tkey
data Tt t0
empty :: Tt a
is_empty :: (Tt a -> Tbool)
mem :: (Tkey -> (Tt a -> Tbool))
add :: (Tkey -> (a -> (Tt a -> Tt a)))
singleton :: (Tkey -> (a -> Tt a))
remove :: (Tkey -> (Tt a -> Tt a))
merge :: ((Tkey -> (Toption a -> (Toption b -> Toption c))) -> (Tt a -> (Tt b -> Tt c)))
union :: ((Tkey -> (a -> (a -> Toption a))) -> (Tt a -> (Tt a -> Tt a)))
compare :: ((a -> (a -> Tint)) -> (Tt a -> (Tt a -> Tint)))
equal :: ((a -> (a -> Tbool)) -> (Tt a -> (Tt a -> Tbool)))
iter :: ((Tkey -> (a -> Tunit)) -> (Tt a -> Tunit))
fold :: ((Tkey -> (a -> (b -> b))) -> (Tt a -> (b -> b)))
for_all :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tbool))
exists :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tbool))
filter :: ((Tkey -> (a -> Tbool)) -> (Tt a -> Tt a))
partition :: ((Tkey -> (a -> Tbool)) -> (Tt a -> (Tt a, Tt a, Tt a)))
cardinal :: (Tt a -> Tint)
bindings :: (Tt a -> Tlist (Tkey, Tkey, a))
min_binding :: (Tt a -> (Tkey, Tkey, a))
max_binding :: (Tt a -> (Tkey, Tkey, a))
choose :: (Tt a -> (Tkey, Tkey, a))
split :: (Tkey -> (Tt a -> (Tt a, Tt a, Toption a, Tt a)))
find :: (Tkey -> (Tt a -> a))
map :: ((a -> b) -> (Tt a -> Tt b))
mapi :: ((Tkey -> (a -> b)) -> (Tt a -> Tt b))
module Foo where

I can see two issues right away:

  • It is probably a good idea to emit Haskell code that constructs the required internal representation directly (data type Item in https://github.com/ndmitchell/hoogle/blob/master/src/Input/Item.hs). This would avoid the gymnastics to account for the difference in lexical conventions between Haskell and OCaml.

  • Manifest types are not taken into account. For example, above the type key (Tkey) appears abstract when in fact it is known to be string.
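One way to take manifest types into account is to expand type abbreviations before printing. This sketch works over a hypothetical AST and assumes only nullary abbreviations such as `type key = string` (parameterized abbreviations would also need their parameters substituted) and no cyclic definitions; the `manifests` list would have to be extracted from the signature.

```ocaml
(* Hypothetical type AST. *)
type ty =
  | Var of string
  | Constr of string * ty list
  | Arrow of ty * ty
  | Tuple of ty list

(* Replace nullary abbreviations by their definitions, recursively, so a
   chain such as a = b, b = string also expands. Assumes no cycles. *)
let rec expand (manifests : (string * ty) list) (t : ty) : ty =
  match t with
  | Var _ -> t
  | Constr (name, []) when List.mem_assoc name manifests ->
      expand manifests (List.assoc name manifests)
  | Constr (name, args) -> Constr (name, List.map (expand manifests) args)
  | Arrow (a, b) -> Arrow (expand manifests a, expand manifests b)
  | Tuple ts -> Tuple (List.map (expand manifests) ts)
```

With `[("key", Constr ("string", []))]`, the `mem` signature `key -> 'a t -> bool` becomes `string -> 'a t -> bool`.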

nojb, Mar 04 '17

I would advise against targeting Item directly from an external tool. Item is very much an internal detail of Hoogle, so anything operating on it should live in Hoogle itself. Translating to Haskell (or Haskell-like) is probably a better approach.

ndmitchell, Mar 04 '17

One observation/suggestion: tools like ocp-index and ocamlspot manage to keep a rather accurate account of the types in one's source tree. They do rely on .cmt/.cmti files being present, but that doesn't sound like a big problem for this project. Perhaps it would make sense to extend one of those tools (or even ocamlbrowser) to dump its database of the types in a tree as S-expressions (say), so that parsing on the Haskell side is not too difficult?
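A sketch of the S-expression idea: a type printed as nested lists, which a generic S-expression reader on the Haskell side could read back without knowing OCaml syntax. The AST and the tags (`var`, `constr`, `arrow`, `tuple`) are invented for illustration and are not what ocp-index or ocamlspot store; names are assumed to need no escaping.

```ocaml
(* Hypothetical type AST. *)
type ty =
  | Var of string
  | Constr of string * ty list
  | Arrow of ty * ty
  | Tuple of ty list

(* One tagged list per node; children follow the tag, space-separated. *)
let rec to_sexp = function
  | Var v -> "(var " ^ v ^ ")"
  | Constr (name, args) -> "(constr " ^ name ^ children args ^ ")"
  | Arrow (a, b) -> "(arrow " ^ to_sexp a ^ " " ^ to_sexp b ^ ")"
  | Tuple ts -> "(tuple" ^ children ts ^ ")"

(* Each child is preceded by a single space. *)
and children ts = String.concat "" (List.map (fun t -> " " ^ to_sexp t) ts)
```

For instance, `key -> 'a t` prints as `(arrow (constr key) (constr t (var a)))`.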

zhenya1007, Mar 05 '17

In my opam setup (a VM with as many packages as I could install), there are more .cmi files than any other kind: 3518 .cmi files, 1880 .mli files, 1713 .cmt files and 1328 .cmti files. So I guess it is better to target .cmi files if we want to index as many libraries as possible.

UnixJunkie, Mar 06 '17

I updated https://github.com/thierry-martinez/hooglebackend: I chose a JSON-compatible format that uses only lists, strings and integers. I checked that JSON parsers exist for Haskell, and the format should be trivial to parse from scratch anyway. Type manifests are now handled correctly (thanks @nojb!).
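To illustrate how small a from-scratch reader for such a format can be, here is a sketch of an encoder and a recursive-descent parser for a JSON subset with only lists, strings and integers. The `value` type and function names are hypothetical, and the actual schema hooglebackend emits is not reproduced. Only the `\"` and `\\` escapes are supported: enough to round-trip `encode`'s own output, but not arbitrary JSON.

```ocaml
(* The three kinds of value the format is described as using. *)
type value = Int of int | Str of string | Arr of value list

(* Serialize; only '"' and '\\' are escaped inside strings. *)
let rec encode = function
  | Int n -> string_of_int n
  | Str s ->
      let b = Buffer.create (String.length s + 2) in
      Buffer.add_char b '"';
      String.iter
        (fun c ->
          if c = '"' || c = '\\' then Buffer.add_char b '\\';
          Buffer.add_char b c)
        s;
      Buffer.add_char b '"';
      Buffer.contents b
  | Arr vs -> "[" ^ String.concat "," (List.map encode vs) ^ "]"

exception Parse_error of int (* byte offset of the problem *)

let parse (s : string) : value =
  let len = String.length s in
  let pos = ref 0 in
  let peek () = if !pos < len then Some s.[!pos] else None in
  let rec skip_ws () =
    match peek () with
    | Some (' ' | '\t' | '\n' | '\r') -> incr pos; skip_ws ()
    | _ -> ()
  in
  let rec value () =
    skip_ws ();
    match peek () with
    | Some '[' ->
        incr pos;
        skip_ws ();
        if peek () = Some ']' then (incr pos; Arr [])
        else begin
          let items = ref [ value () ] in
          skip_ws ();
          while peek () = Some ',' do
            incr pos;
            items := value () :: !items;
            skip_ws ()
          done;
          if peek () <> Some ']' then raise (Parse_error !pos);
          incr pos;
          Arr (List.rev !items)
        end
    | Some '"' ->
        incr pos;
        let b = Buffer.create 16 in
        let rec loop () =
          match peek () with
          | Some '"' -> incr pos
          | Some '\\' when !pos + 1 < len ->
              (* keep the escaped character literally *)
              Buffer.add_char b s.[!pos + 1];
              pos := !pos + 2;
              loop ()
          | Some c -> Buffer.add_char b c; incr pos; loop ()
          | None -> raise (Parse_error !pos)
        in
        loop ();
        Str (Buffer.contents b)
    | Some ('-' | '0' .. '9') ->
        let start = !pos in
        incr pos;
        while (match peek () with Some ('0' .. '9') -> true | _ -> false) do
          incr pos
        done;
        (match int_of_string_opt (String.sub s start (!pos - start)) with
         | Some n -> Int n
         | None -> raise (Parse_error start))
    | _ -> raise (Parse_error !pos)
  in
  let v = value () in
  skip_ws ();
  if !pos <> len then raise (Parse_error !pos);
  v
```

The parser peeks at one character to pick the branch, which is the property that makes this subset easy to handle without a parser generator.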

thierry-martinez, Mar 08 '17

I added a .cmi.json file for each .cmi file in there: https://github.com/UnixJunkie/ocaml-4.04.0-mlis. The .json files were created using Thierry's hooglebackend software.

UnixJunkie, Mar 10 '17