devdocs.el Add devdocs-grep command

Feb 20 '22 16:02 astoff

@minad If you want to discuss more about the grep command, we can do it here.

The current version works, but it is completely synchronous. It should be possible to add some degree of asynchronicity using timers. There's also the async package, but I'm not sure I want to depend on it.
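To make the timer idea concrete, here is a minimal sketch. It is not the code from the branch under discussion: the name my-chunked-search and its arguments are made up for illustration, and only run-with-timer is a real Emacs function. It processes a few items per timer tick, so Emacs can handle user input between chunks while everything still runs in a single thread.

```elisp
;; Hypothetical sketch; `my-chunked-search' does not exist in devdocs.el.

(defun my-chunked-search (items predicate report &optional chunk-size)
  "Search ITEMS for elements satisfying PREDICATE, a few per timer tick.
Call REPORT with the list of matches found in each chunk, and with
the symbol `done' once ITEMS is exhausted.  CHUNK-SIZE defaults to 10."
  (let ((chunk-size (or chunk-size 10))
        (matches nil))
    ;; Process at most CHUNK-SIZE items synchronously.
    (dotimes (_ chunk-size)
      (when items
        (let ((item (pop items)))
          (when (funcall predicate item)
            (push item matches)))))
    (when matches
      (funcall report (nreverse matches)))
    (if items
        ;; Return to the command loop, then continue with the rest.
        (run-with-timer 0 nil #'my-chunked-search
                        items predicate report chunk-size)
      (funcall report 'done))))
```

In a real command, ITEMS would be the list of documentation pages and PREDICATE the expensive render-and-search step, so the chunk size trades responsiveness against total run time.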

As to the possibility of pre-rendering the HTML files at installation time, is it even possible to serialize a buffer with all its text properties? I would also have to worry about invalidating the pre-rendered pages when some shr customization changes, so this looks quite complicated in the end.
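On the serialization question, one possible approach is sketched below. It relies on the fact that prin1 writes a propertized string in the #("text" START END PLIST ...) read syntax, which read can parse back. The function names are hypothetical. The sketch also assumes every property value is itself printable and readable, which may not hold for everything shr attaches to a rendered page, and it ignores overlays entirely.

```elisp
;; Hypothetical helpers; not part of devdocs.el.

(defun my-save-rendered-buffer (file)
  "Write the current buffer's text, with its text properties, to FILE."
  (let ((text (buffer-string))          ; includes text properties
        (print-length nil)              ; never abbreviate property lists
        (print-level nil)
        (coding-system-for-write 'utf-8-emacs))
    (with-temp-file file
      (prin1 text (current-buffer)))))

(defun my-load-rendered-text (file)
  "Return the propertized string previously stored in FILE."
  (let ((coding-system-for-read 'utf-8-emacs))
    (with-temp-buffer
      (insert-file-contents file)
      (goto-char (point-min))
      (read (current-buffer)))))
```

The invalidation problem remains: a file written this way reflects the shr settings in effect at the time it was saved.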

Finally, leaving speed considerations aside for a moment: is the possibility of a Consult integration precluded by doing things in Lisp? Can you specify an async source from a normal buffer instead of a process buffer?

Feb 20 '22 16:02 astoff

The current version works, but it is completely synchronous. It should be possible to add some degree of asynchronicity using timers. There's also the async package, but I'm not sure I want to depend on it.

My proposal would rather be to provide different pluggable frontends here, such that the user can plug in consult-grep in their init.el. The default implementation could be based on the default grep. Would something like this work? I would rather not see you reimplement your own asynchronous grep here; you would just be duplicating the work already done in consult-grep etc. I had imagined a very simple integration:

1. Prerender the text and keep it in a directory separate from the HTML.
2. Introduce a devdocs-grep command which calls a devdocs-grep-function within the devdocs directory. The function should return the selected file.
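The integration described above could look roughly like the following sketch. All names carry a my- prefix to stress that they are hypothetical and not existing devdocs.el definitions, and the text directory location is an assumption. One deviation from the proposal is also an assumption: the frontend may return nil when it displays or visits the matches itself, as the stock grep command does.

```elisp
;; Hypothetical sketch of a pluggable grep frontend.

(defvar my-devdocs-text-dir
  (expand-file-name "devdocs-text" user-emacs-directory)
  "Assumed directory holding the prerendered plain-text pages.")

(defun my-devdocs-grep-default ()
  "Search below `default-directory' with the standard `grep' command.
Results appear in the *grep* buffer, so no file is returned."
  (let ((regexp (read-regexp "Grep docs for: ")))
    (grep (format "grep -rnH -e %s ." (shell-quote-argument regexp)))
    nil))

(defvar my-devdocs-grep-function #'my-devdocs-grep-default
  "Function called with `default-directory' set to the text directory.
It should return the selected file, or nil if it already took care
of displaying the matches.")

(defun my-devdocs-grep ()
  "Search the prerendered documentation via `my-devdocs-grep-function'."
  (interactive)
  (let* ((default-directory (file-name-as-directory my-devdocs-text-dir))
         (file (funcall my-devdocs-grep-function)))
    (when file
      (find-file (expand-file-name file)))))
```

A user could then point my-devdocs-grep-function at a wrapper around consult-grep in their init.el. The exact wrapper depends on Consult's interface and is left out here.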

As to the possibility of pre-rendering the HTML files at installation time, is it even possible to serialize a buffer with all its text properties? I would also have to worry about invalidating the pre-rendered pages when some shr customization changes, so this looks quite complicated in the end.

Then we cannot use grep anymore?

Finally, leaving speed considerations aside for a moment: is the possibility of a Consult integration precluded by doing things in Lisp? Can you specify an async source from a normal buffer instead of a process buffer?

Yes, we cannot scan buffers asynchronously.

Feb 20 '22 16:02 minad

Ah okay, now I have looked at your code. This is different from what I had in mind. If it works well, why not? But maybe my idea is worth exploring too? With an external grep and consult-grep, we could enjoy the asynchronicity and, with Consult, the live-updating search.

Feb 20 '22 16:02 minad

If it works well, why not?

It works correctly, but is super slow. By the time you start grepping a well-indexed document, you are pretty desperate, so it might be acceptable, though. In any case, I'm not going to merge this right away.

Feb 20 '22 16:02 astoff

It works correctly, but is super slow. By the time you start grepping a well-indexed document, you are pretty desperate, so it might be acceptable, though. In any case, I'm not going to merge this right away.

Okay, my opinion is that this should not be added. I don't see a point in having slow commands around, in particular if we have better tools like ripgrep. This leaves us with either the isearch solution or the solution I proposed above with the pregenerated text files.

Feb 20 '22 17:02 minad

I've added a new commit making the search asynchronous (but still single-threaded). You might be curious to check out how it works :-).

Feb 20 '22 17:02 astoff

So the results will come in live. That's nice. But I would still go with another solution. I am not sure if devdocs should invent its own search command, given that we have good alternatives. I am all for decoupling packages and having clear responsibilities. But maybe you can make this search command so much better by specializing it to devdocs that it will be worth it.

Feb 20 '22 17:02 minad

I don't see a point in having slow commands around, in particular if we have better tools like ripgrep.

Can you ripgrep HTML files without seeing garbage in the output? I'm not even sure all the documents have meaningful line breaks! One can use a pipe pandoc --to plain | grep, but then the reported line numbers are not meaningful, and I'm not sure how to deal with that (also, even on Fedora there doesn't seem to be any html-to-text command installed by default).

(As a desperate measure, I could grep the raw HTML to decide if a given document page contains matches, and only then do the full-blown shr thing.)
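A rough sketch of that two-stage idea follows. The function name is hypothetical, and the code assumes an Emacs built with libxml2 so that libxml-parse-html-region is available. Note the limitation of the cheap first stage: a match that spans a tag boundary or involves an HTML entity in the raw source would be missed, so this trades completeness for speed.

```elisp
;; Hypothetical sketch; assumes Emacs was built with libxml2 support.
(require 'shr)

(defun my-page-matches (file regexp)
  "Return a list of (LINE . TEXT) for rendered lines of FILE matching REGEXP.
The raw HTML is searched first; the page is only rendered with shr
when that cheap search finds something."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    ;; Stage 1: cheap search on the raw HTML.
    (when (re-search-forward regexp nil t)
      ;; Stage 2: render the page and search the visible text.
      (let ((dom (libxml-parse-html-region (point-min) (point-max)))
            (hits nil))
        (erase-buffer)
        (shr-insert-document dom)
        (goto-char (point-min))
        (while (and (not (eobp))
                    (re-search-forward regexp nil t))
          (push (cons (line-number-at-pos)
                      (buffer-substring-no-properties
                       (line-beginning-position) (line-end-position)))
                hits)
          ;; Report each line at most once.
          (forward-line 1))
        (nreverse hits)))))
```

The reported line numbers refer to the rendered text, so they depend on the shr settings in effect when the function runs.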

Feb 20 '22 17:02 astoff

Can you ripgrep HTML files without seeing garbage in the output?

I believe there is a ripgrep wrapper which can do that, and which can also search PDFs etc. But I already proposed a better solution - pregenerate the text files with shr. What is wrong with that? You can either pregenerate when the docs are fetched or when the search is started for the first time. But I would probably do it after the fetch, since you probably already do some post-processing of the fetched data?

Feb 20 '22 17:02 minad

What is wrong with that?

Mainly two things: one, the line numbers will get out of sync if you change your fonts or any other shr config; and two, installing docs will take a lot longer (perhaps 5 minutes for OpenJDK!).

Anyway, I will sleep on this for some time :-).

Feb 20 '22 17:02 astoff