dlang.org Ddoc: in-site (Ddox-like) search

This explores how easy/hard it would be to get the same in-site search as for Ddox working for Ddoc too.

The translation from a Ddox path to a Ddoc path is still a bit crude (it's a proof-of-concept).

Mar 31 '18 03:03 wilzbach

Thanks for your pull request, @wilzbach!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Mar 31 '18 03:03 dlang-bot

BTW I like the way Scala's search works. Example

http://www.scala-lang.org/api/current/scala/collection/immutable/Vector.html?search=fun

You type and get instant results on a "full page"
They accept a query string from the URL to their search page
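
A minimal sketch of the second point, assuming the query travels in a `search` parameter as in the Scala URL above. The helper name `initialQuery` is made up for illustration and is not part of this PR:

```javascript
// Hypothetical helper: extract the initial search query from a page URL.
function initialQuery(href) {
  const url = new URL(href);
  // URLSearchParams.get returns null when the parameter is absent.
  const q = url.searchParams.get("search");
  return q === null ? "" : q.trim();
}

console.log(initialQuery(
  "http://www.scala-lang.org/api/current/scala/collection/immutable/Vector.html?search=fun"
));
```

On page load the result could be used to pre-fill the search box and run the search right away.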

Mar 31 '18 03:03 wilzbach

Oof, that looks pretty fragile.

Why not use chmgen? :) It already generates a search index / keyword tag for the CHM and dman.

Mar 31 '18 04:03 CyberShadow

http://www.scala-lang.org/api/current/scala/collection/immutable/Vector.html?search=fun

That is super nice.

Mar 31 '18 04:03 CyberShadow

http://dpldocs.info/locate?q=fun

im just saying

Mar 31 '18 04:03 adamdruppe

http://dpldocs.info/locate?q=fun

That's better than a Google search, but only marginally. Scala's search results are MUCH better.

The screen is partitioned into the object kind of search results (class, function, method...). Meaning, if you know you are looking for a member, after a few searches you will automatically learn to look on the right side even before the results are loaded.
The icons help too, but they probably build upon a pre-existing visual language permeating through their developer tools.
"Search score" in dpldocs is complete noise.

Mar 31 '18 05:03 CyberShadow

http://dpldocs.info/locate?q=fun

Also, the first search result has nothing to do with the search query.

Mar 31 '18 05:03 CyberShadow

That is super nice.

So it turns out it's not too hard to do this for dlang.org (https://github.com/dlang/dlang.org/pull/2319). Though it will require a lot more work to get to a similar state as Scala's nice search. One day maybe ;-) BTW here's how Rust is doing:

https://doc.rust-lang.org/std/?search=fun
https://docs.rs/galvanic-test/0.1.3/galvanic_test/?search=test (docs.rs is their dpldocs.info)

Mar 31 '18 05:03 wilzbach

On Sat, Mar 31, 2018 at 05:09:03AM +0000, Vladimir Panteleev wrote:

Also, the first search result has nothing to do with the search query.

That's why I show the search score, it helps me debug stuff like this. The examples for allocator have void fun() in the examples which it thinks is relevant...

But if we were going to do icons for D, we'd have to define them first, and I'm not sure most of the traditional categorizations help us. For example, is it really important if a thing is a template, struct, or function? In fact, with Phobos, virtually everything is a function template, so that information is near useless. Very little of Phobos uses object methods too (outside of the range primitives), so that'd be fairly rare... but it might be a reasonable split like the Scala ones have.

I think if we did have an icon scheme, the best category for Phobos might be lazy range vs eager evaluation. So like asUpperCase gets the lazy range icon (whatever that is, i have no idea but i'm a crappy designer), while toUpper is different because it is eager but pure, and perhaps even toUpperInPlace can get something because it mutates the input.

The problem with this is I don't think we can pull it out of the compiler or existing ddoc. Using upper as an example:

asUpperCase just returns auto. We can't reliably assume that means it returns a lazy range... or anything else, really. There's no other machine-readable information in the signature that can give it this category.
toUpperInPlace and toUpper both get the pure attribute. toUpperInPlace does give some hints though: it takes the arg by ref and returns void, so we might be able to use that.
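
To make the limits of that heuristic concrete, here is a rough sketch. The record shape (`returnType`, `params`, `storage`) and the category labels are invented for illustration; they are not the output format of the compiler, ddoc, or adrdox:

```javascript
// Assumed, hypothetical record shape:
//   { name, returnType, params: [{ storage: ["ref"] }] }
function guessCategory(sym) {
  const takesRef = sym.params.some((p) => p.storage.includes("ref"));
  // "ref" argument plus "void" return hints at in-place mutation.
  if (sym.returnType === "void" && takesRef) return "mutates-input";
  // "auto" tells us nothing: it may or may not be a lazy range.
  if (sym.returnType === "auto") return "unknown";
  return "returns-value";
}

const samples = [
  { name: "toUpperInPlace", returnType: "void", params: [{ storage: ["ref"] }] },
  { name: "asUpperCase", returnType: "auto", params: [{ storage: [] }] },
  { name: "toUpper", returnType: "S", params: [{ storage: [] }] },
];
for (const s of samples) console.log(s.name, "->", guessCategory(s));
```

The `unknown` bucket is exactly where the lazy ranges end up, which is the argument for tagging them manually.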

But I think the best thing to do would be to manually tag them... and you know, I do actually think that might be worth doing, both for these search things and just making sure it is clearly documented anyway - some Phobos modules say Returns: an InputRange and some say Returns: an $(REF_ALTTEXT ..., input range) etc. Getting that uniform may be worth doing.

Of course, for some functions, the return value adapts to the input. Like map returns an input range given an input range, random access given random access, etc. We should probably think of some way to document this in a consistent fashion regardless, then we can read it with the search thing to put those category icons on it too. (Ditto on attributes btw, I recently have had to remind a lot of people, including experienced D programmers, that templates like map are @nogc if given @nogc arguments... the docs don't really tell you this, you just need to already know.)

The most recent versions of adrdox have a Group: user_defined_identifier section that kind of does this, but a symbol can belong to only one group, so while it can be used for some categorization, I don't think it would be a big win for Phobos.

Mar 31 '18 13:03 adamdruppe

For example, is it really important if a thing is a template, struct, or function?

I think distinguishing function / class / struct / enum would help. Templates can be of anything, so their icon should probably just be a modified version of the thing they're a template of, e.g. with a dotted border.

Apr 01 '18 00:04 CyberShadow

That's why I show the search score, it helps me debug stuff like this.

Yep, but only you can debug it! It should be enabled by a query parameter that can be added manually to the URL for debugging.
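
Something along these lines, as a sketch only. The parameter name `debug` and the result fields `name` and `score` are assumptions, not what dpldocs actually uses:

```javascript
// Show the score only when the URL carries a (hypothetical) "debug" parameter.
function formatResult(result, href) {
  const showScore = new URL(href).searchParams.has("debug");
  return showScore ? `${result.name} (score: ${result.score})` : result.name;
}

const hit = { name: "std.functional.fun", score: 42 };
console.log(formatResult(hit, "http://dpldocs.info/locate?q=fun"));
console.log(formatResult(hit, "http://dpldocs.info/locate?q=fun&debug=1"));
```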

Apr 01 '18 00:04 CyberShadow

Why not use chmgen? :) It already generates a search index / keyword tag for the CHM and dman.

Not sure if that's better: https://github.com/dlang/dlang.org/pull/2328 With chmgen we lose all meta information about the symbols and have to "guess" the package and aggregate name too.

So I actually prefer this approach.

Apr 02 '18 23:04 wilzbach

Yeah, good points.

Postprocess dmd's -X output maybe?

I guess we could go with this for now. If it breaks, oh well.

Apr 03 '18 03:04 CyberShadow

This currently adds two new script tags to every page, so it will have a measurable impact on load times. Can that be improved (by loading them lazily maybe)?

Apr 03 '18 03:04 CyberShadow

Can that be improved (by loading them lazily maybe)?

Sure, Done.

Postprocess dmd's -X output maybe?

DDox already does this ;-)

I guess we could go with this for now. If it breaks, oh well.

So I think I "hacked" together a working logic for mapping from "ddoc symbols" to ddoc paths. It's a bit ugly and it might not be perfect, but it should cover 98% of all cases (and that's a lot more than 0%).

Apr 05 '18 14:04 wilzbach

Thanks. I just tried it. It's nice but I think it needs some more work.

It's not possible to select a result with the keyboard. Forcing users to move their hand to the mouse just for that one click before they can read the documentation isn't great. Ideally down+enter should select and open the first result. (This is actually a downgrade from the previous search mechanism, as even Google has keyboard navigation.)
The search results are never closed. A pathological case of this is clicking on a search result which goes to the same page - the results just remain there, on the page, covering other content, unable to be closed. I think the results should go away when the search input loses focus.
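
As a sketch of what I mean, the selection logic can be a small pure function, with the DOM wiring kept separate. The function name and the wiring in the comments are hypothetical and not taken from the PR; only the `KeyboardEvent.key` values are standard:

```javascript
// State transition for the highlighted result; -1 means "nothing selected".
function nextSelection(selected, key, count) {
  if (count === 0) return -1;
  if (key === "ArrowDown") return Math.min(selected + 1, count - 1);
  if (key === "ArrowUp") return Math.max(selected - 1, -1);
  if (key === "Escape") return -1;
  return selected; // any other key leaves the selection alone
}

// Possible browser wiring (assumed variable names, shown as comments only):
// input.addEventListener("keydown", (e) => {
//   selected = nextSelection(selected, e.key, results.length);
//   if (e.key === "Enter" && selected >= 0) location.href = results[selected].url;
// });
// input.addEventListener("blur", () => { resultsBox.hidden = true; });

console.log(nextSelection(-1, "ArrowDown", 3)); // first result becomes selected
```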

I'm sure there should be tons of JS libraries for this purpose, might be a good case to avoid NIH here.

Sure, Done.

Loading scripts asynchronously is not the same as loading them lazily. I guess it won't affect performance much, but it's still overhead for bandwidth and JS parsing/compilation. I won't insist though.
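
For reference, lazy loading here would mean something like the sketch below: nothing is fetched until the user first touches the search box. The element id and script path in the comments are placeholders, not the ones used on dlang.org:

```javascript
// Wrap a loader so it runs at most once, on first demand.
function once(load) {
  let promise = null;
  return () => {
    if (promise === null) promise = load();
    return promise;
  };
}

// Possible browser wiring (hypothetical id and path, shown as comments only):
// const loadIndex = once(() => new Promise((resolve, reject) => {
//   const s = document.createElement("script");
//   s.src = "/js/search-index.js";
//   s.onload = resolve;
//   s.onerror = reject;
//   document.head.appendChild(s);
// }));
// document.getElementById("search-query")
//   .addEventListener("focus", loadIndex, { once: true });

// Stand-in loader to show that repeated calls trigger only one load.
let loads = 0;
const demo = once(() => { loads += 1; return Promise.resolve("index"); });
demo();
demo();
console.log(loads);
```

With `async` the browser still downloads and compiles the scripts on every page view; with this pattern visitors who never search pay nothing.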

DDox already does this ;-)

Okay, well, how about making DDox emit a search index directly usable for DDoc HTML pages, so we can avoid this fragile URL rewriting?

So I think I "hacked" together a working logic for mapping from "ddoc symbols" to ddoc paths. It's a bit ugly and it might not be perfect, but it should cover 98% of all cases (and that's a lot more than 0%).

Ideally we would have some specific data so we don't accidentally end up with a much lower coverage than expected (e.g. 80% where the remaining 20% just so happen to be in parts we didn't test by hand), or, even more ideally, find a better approach altogether.
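
A sketch of how such a number could be obtained, assuming we can collect the set of page-plus-anchor URLs that actually exist in the generated Ddoc HTML. The names `coverage`, `toDdocUrl` and `knownUrls` are placeholders:

```javascript
// Fraction of index entries whose rewritten URL exists in the Ddoc output.
// `knownUrls` is assumed to be a Set of "page.html#anchor" strings.
function coverage(entries, toDdocUrl, knownUrls) {
  const misses = entries.filter((e) => !knownUrls.has(toDdocUrl(e)));
  const hits = entries.length - misses.length;
  const ratio = entries.length === 0 ? 1 : hits / entries.length;
  return { ratio, misses };
}

// Toy data with a trivial mapper, only to show the shape of the result.
const entries = [{ name: "a.b" }, { name: "a.c" }];
const known = new Set(["a.b"]);
const report = coverage(entries, (e) => e.name, known);
console.log(report.ratio, report.misses.map((e) => e.name));
```

Listing the misses matters as much as the ratio, since it shows whether the failures cluster in one module.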

Apr 05 '18 14:04 CyberShadow

On Thu, Apr 05, 2018 at 07:47:10AM -0700, Vladimir Panteleev wrote:

I'm sure there should be tons of JS libraries for this purpose, might be a good case to avoid NIH here.

I would try using the element. It doesn't style as flexibly as a custom link list, but it works with keyboard control etc.

Apr 05 '18 15:04 adamdruppe

Thanks. I just tried it. It's nice but I think it needs some more work.

Yes, I am aware of this, but I don't intend to fix this in this PR as this PR is just about wiring the ddoc pages with Ddox's search index. Note that it's exactly the same search the /library pages give you.

However, I do intend to fix this in https://github.com/dlang/dlang.org/pull/2319 (or at least step by step).

Ideally down+enter should select and open the first result

Just enter already works.

Okay, well, how about making DDox emit a search index directly usable for DDoc HTML pages, so we can avoid this fragile URL rewriting?

Wouldn't that only shift the logic to a preprocessor? I mean then the ddox filter would have to "guess" the Ddoc URL.

Ideally we would have some specific data so we don't accidentally end up with a much lower coverage than expected

I can only enable it for the prerelease pages for now if you prefer that, but finding the correct module is reliable. The only thing that we might mess up is the Ddoc anchor and Google doesn't provide them either. The thing with the prerelease pages is that last time the "run examples" feature lay dormant there for half a year.

Apr 05 '18 15:04 wilzbach

more ideally, find a better approach altogether.

Ditch Ddoc and focus on one documentation engine?

Apr 05 '18 15:04 wilzbach

Just enter already works.

Oh, hmm, that's not what I would expect. It should open a full page of results, i.e. do the same thing as if there was no instant search.

In either case, that just helps when the first result is the one you need, and not the second.

Wouldn't that only shift the logic to a preprocessor? I mean then the ddox filter would have to "guess" the Ddoc URL.

Yes, but the DDoc URLs are much more predictable, AND you don't need to parse DDox URLs.

The only thing that we might mess up is the Ddoc anchor and Google doesn't provide them either.

Hmm, Google does understand anchor links (example), perhaps we can make that work still.

Ditch Ddoc and focus on one documentation engine?

I think we're probably closer to ditching DDox than DDoc due to lack of ongoing maintenance and remaining issues...

Apr 05 '18 15:04 CyberShadow

Yes, but the DDoc URLs are much more predictable, AND you don't need to parse DDox URLs.

I'm not sure whether we are on the same page here. The preprocessor wouldn't have more information than the JS logic. The JS rewriter already gets an object:

{
    path: "./core/atomic/test_cas.html",
    name: "core.atomic.test_cas"
}

Note that the fully-qualified symbol name is already included, and if Ddoc URLs were so easy to guess, I wouldn't need to use the Ddoc URL. Anyhow, I could modify generateSymbolsJS (in Ddox) to add information like the package name too, but that would mean forking Ddox, which I'm not so sure is a path we want to go down (remember that the PR for the last modification to Ddox is still open after two months, so we would end up with a fork).
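
To illustrate the kind of guessing involved, here is a simplified sketch of a rewriter for such an object. It is not the logic in this PR: it assumes the last path component is always the symbol, that the Ddoc page is the module name joined with underscores, and that the anchor is the bare symbol name.

```javascript
// Simplified, assumption-laden mapping from a Ddox index entry to a Ddoc URL.
function ddoxToDdoc(entry) {
  // "./core/atomic/test_cas.html" -> ["core", "atomic", "test_cas"]
  const parts = entry.path
    .replace(/^\.\//, "")
    .replace(/\.html$/, "")
    .split("/");
  const symbol = parts.pop();
  // A single component can only be a top-level module page.
  if (parts.length === 0) return `${symbol}.html`;
  return `${parts.join("_")}.html#${symbol}`;
}

console.log(ddoxToDdoc({
  path: "./core/atomic/test_cas.html",
  name: "core.atomic.test_cas",
}));
```

This already shows the weak spot: from the path alone, a module page such as `./std/array.html` cannot be told apart from a symbol named `array` in a module `std`, so the entry would need extra information to be mapped reliably.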

I will look into rolling our own Ddoc symbol index, but it would be nice to understand what's required for this first iteration:

is it just that you want to be certain that the URLs are correct?
or does the DDox search interface need an entire overhaul before it can replace the Google search (i.e. what https://github.com/dlang/dlang.org/pull/2319 tries to start addressing)

I think we're probably closer to ditching DDox than DDoc due to lack of ongoing maintenance and remaining issues...

Honestly, I don't care which system we ditch, but we have now run both documentation systems in parallel for four years and the duplicated maintenance work is really tiring.

Apr 13 '18 13:04 wilzbach

I think this is a really good feature. Going through the google search is really annoying.

May 05 '18 18:05 ghost91-

Ping @CyberShadow - so (apart from the mentioned usability issues) can we go with this simple PR or do we need to fork ddox?

Jun 08 '18 12:06 wilzbach

I don't know. I'm still queasy as the search works unlike on other parts of the website (or other websites), and about the robustness of the regex approach.

Jun 08 '18 15:06 CyberShadow

9 months later - the regex trick still works very well:

I'm still queasy as the search works unlike on other parts of the website (or other websites),

Well the ddox doc pages have used this search since 2014.

CC @thewilsonator: what do you think about this?

Jan 12 '19 14:01 wilzbach

This is not really my area of expertise, but sure.

Jan 12 '19 14:01 thewilsonator

So what's the plan with Ddox, are we ever going to make that the default site and only have one?

Jan 13 '19 10:01 jacob-carlborg

DDox needs someone to fix the remaining issues (some of which are things lacking in DMD JSON generation), and a long-time maintainer. Currently there is no one to do either.

Jan 13 '19 10:01 CyberShadow
