Add ^O series to handle OSC-8 id= search and URIs
As promised in Issue #248, here is a pull request to really and truly handle OSC 8 links with all its powers, instead of just that 2014 hack to make the mdocmx roff macro references work on the terminal, adjusted to use OSC 8.
Even though i just finished it an hour ago .. i think it is working fine, with the exceptions below.
. ^O^I text - will search for the OSC 8 id= "text".
. ^O^N - searches for the next OSC 8 link aka URI.
. ^O^P - searches for the previous OSC 8 link aka URI.
. ^O^O - opens the currently selected OSC 8 link aka URI with the shell command given in the environment variable LESSOSC8OPEN; it will be passed as a properly quoted single argument. If LESSOSC8OPEN is not set, "man:NAME((.*))?" style links are still understood and opened via man(1).
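The ^O^O dispatch described above can be sketched roughly as follows. This is only an illustration in Python, not the C code of the pull request: the function name `open_command`, the `env` parameter and the exact regular expression are assumptions made for the example.

```python
import os
import re
import shlex

# Illustrative pattern for "man:NAME" or "man:NAME(SECTION)"; the exact
# expression used by the patch is not shown in this thread.
MAN_LINK = re.compile(r"^man:([^()\s]+)(?:\(([^()]*)\))?$")

def open_command(uri, env=None):
    """Return the shell command line that would open `uri`, or None."""
    env = os.environ if env is None else env
    opener = env.get("LESSOSC8OPEN")
    if opener:
        # The URI is handed over as one properly quoted argument.
        return f"{opener} {shlex.quote(uri)}"
    m = MAN_LINK.match(uri)
    if m:
        name, section = m.group(1), m.group(2)
        args = ["man"] + ([section] if section else []) + [name]
        return " ".join(shlex.quote(a) for a in args)
    return None  # nothing knows how to open this link
```

With an empty environment, `open_command("man:less(1)", {})` yields `man 1 less`; with LESSOSC8OPEN set, every URI goes to that command as a single quoted word.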
The good intent of this is reflected by the fact that search.c now has 2442 lines!
So i had prepared a test file with bogus entries and such, and it works very nicely. I for one see only two problems:
. in the status line when you see those ~ lines because we scrolled "past EOF" there will be input artifacts, ie the repainting is somehow not clean, if you ^O.. ^O.. like crazy at least. That is to say that those ~ lines may look like "~^O^N", i do not know why. It is a repainting issue of course.
. i cannot figure out how to configure the search so that ^O^N and ^O^P do not wrap but get the first / the last link right; for example, in a test file with 34 id=s, i ^O^I34, and (it has a link too, not only an id aka anchor) ^O^N will say "No OSC 8 link found (press RETURN)", but if i then ^O^N again, it will find the links in the last line again, it does not step over it. Likewise for the first anchor, if i ^O^P, then the search wraps backwards and starts at EOF again.
Other than that .. it is fun! Browsing on the terminal, but in a generic way, not like i did in 2014 with mdocmx only for mdoc man(1)uals using crude hacks!!
I hope you like it! ..and a nice Sunday i wish from Germany.
P.S.: that web form seems to not like reverse solidus aka backslash, i do not know how to write the regular expression correctly. Sorry.
Hello. I have been looking into this since yesterday, and repainting was really, really bad. It is very much better now locally, but i still have some problems with ^O^N and ^O^P repositioning. I hope i get this fixed by tomorrow, and i think then this pull request will have been polished even without the voice of "the expert".
I will push it then, and "try to" attach a nice grotty(1) manual file for testing purposes!
Oh, i hope you like it then. Thank you.
With "Add shared static osc8_search_xy(), properly restore search pattern.." i think we now really handle movement and repaint gracefully (expensive though).
I still do not use the builtin shell_quote because of our specific needs. This could of course be changed, it would likely look better in almost all cases (when user is prompted to comply visiting external manual etc).
One problem is left, a sole "#" URI without any OSC 8 parameter. When then using ^O^O to follow it, it will not "find the pattern". This could be special cased, but seems unlikely in practice.
Unfortunately i cannot attach the file! Not as tgz gz whatever, they say "We don't support that file type". So i stored the contained converted manual, to be read via "less -RIF", on my vserver as
$ curl -I https://ftp.sdaoden.eu/z.cat.gz
HTTP/2 200
content-type: application/gzip
etag: "2778712001"
last-modified: Fri, 09 Sep 2022 17:18:51 GMT
content-length: 6353
accept-ranges: bytes
date: Fri, 09 Sep 2022 17:19:11 GMT
server: lighttpd/1.4.66
I think this is it now. I hope you consider its inclusion in less. Thank you.
Works flawlessly for me. It would be very nice if you would at least take a peek, i have been trying to contact you since 2014 on this, or rather the preceding idea in the end.
This seems like a pretty big change for a rather niche feature. Couldn't you just temporarily turn off -R if you want to search for text in an OSC8 sequence?
Hm. I would not call it a "niche feature". Your idea is interesting. The OpenBSD people added a tag file feature for their mandoc replacement of man(1), which is more in the direction i was thinking of.
The next release of groff will bring support for OSC-8 generation in grotty(1), and .HR is the new reference command called for man(7) macros i think. So if converters / manual writers start using it, looking at manuals in less(1) becomes a fully interactive experience. At least you can/could jump in between anchors and follow references with the new ^O series of commands.
Maybe they get the taste of it then, finally. And come back to the idea from 2014 that is realized with the mdocmx(7) extender of the mdoc(7) manual macros used for BSD-style manual pages -- because if you look at what normal UNIX manuals could offer in conjunction with less(1), they would possibly not offer HTML-ified manuals any more (i think NetBSD for example does), because it is simply not needed, you get the full interactive experience as with GNU info (or even better) simply via man(1). Please let me point to three readily prepared "cat" pages that can be simply used within a patched less(1), with no other programs involved (below).
It is .. a big change, but the overall change in between the master and the topic branch via wc(1) -Lwcl is not so large. Ok it is large. But it is not so large given that you pimp it up to a fully interactive user experience. Oh .. my god .. Letterman will invite me!!! Eh, you!
37964 132839 901211 180 total
38554 134699 913950 180 total
Following external references via LESSOSC8OPEN or the builtin manual thing could be stripped, but .. if you get used to it, it is nice: even better than GNU info i think, you can open external manual pages and then come back to where you started. UNIX-like, just quit the new manual process, and come back. (Ie job-control-alike. A bit.)
(Readily prepared manuals, simply for "$ less --RAW-CONTROL-CHARS --ignore-case --no-init FILE"
https://ftp.sdaoden.eu/code-mailx-1.cat
A huge one (~ 1MB) of my mailer, with hundreds of references. And, much smaller, but ditto:
https://ftp.sdaoden.eu/code-mdocmx-7.cat
Of the mdocmx extension of the roff mdoc manual macros.
https://ftp.sdaoden.eu/code-mdocmx-1.cat
Of the preprocessor for it.)
P.S.: i have had to add another commit to the topic branch, "osc8_search_xy(): call search() right or SEGV will happen" (changes the "match" parameter from 0 to 1).
Thank you for your message! I was trying to get in contact with you since 2014. :-)
Ciao and a good and healthy 2023 i wish from Germany!
Sorry to be so noisy. I had seen you dropped K&R for ISO prototypes. (I added four more commits, in order).
Thank you. Ciao and greetings.
@gwsw @sdaoden Any chance this will go in? I actually came here on GitHub to file a feature request to be able to navigate OSC 8 links. I would like to follow an OSC 8 link to another man page, and perhaps get back. That will really make groff and manpages come alive!
Can someone clarify something for me? I thought the idea of OSC-8 links was that the terminal handles them. You click on them or something to navigate to the link. Why does less need to do anything other than pass the sequence to the terminal?
@gwsw OSC-8 has a possibility to change how we interact with man pages, and less is at the end of the pipe for viewing man pages (it's a roff pipe).
The goal of OSC-8 was perhaps just for that: to display a label instead of a complete URL. Clicking on such a link would open another OS window. And to enable that, the initial OSC-8 support focussed on passing OSC-8 sequence to the terminal. That works all well for http[s]:// hyperlinks that are better viewed in a graphical browser like Google Chrome. But, what if the hyperlink is a file:// that is better viewed in terminal? Opening another OS window to view another man page referenced in the current one is too cumbersome.
Why would manpage browsing benefit from OSC-8? Most man pages end with a See also section, like in the screenshot below. One could run a simple sed to change the references under See also to OSC-8 sequences. OSC-8 would preserve the original formatting of the document, while providing a link to references. The same concept can be extended to other documentation (not just man pages).
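As a rough illustration of that rewriting step, here is a Python sketch (instead of sed) that wraps `name(section)` references in OSC 8 sequences. The helper names `osc8` and `link_refs`, the reference pattern, and the choice of a `man:` URI are assumptions for the example; a real converter would need a more careful pattern.

```python
import re

ESC = "\x1b"
ST = ESC + "\\"  # string terminator: ESC followed by a backslash

def osc8(uri, text, params=""):
    """Wrap `text` in an OSC 8 hyperlink pointing at `uri`."""
    return f"{ESC}]8;{params};{uri}{ST}{text}{ESC}]8;;{ST}"

# Hypothetical pattern for references such as "groff(1)" or "man(7)".
REF = re.compile(r"\b([A-Za-z0-9_.+-]+)\((\d[a-z]*)\)")

def link_refs(line):
    """Turn every name(section) reference in `line` into an OSC 8 link."""
    def repl(m):
        uri = f"man:{m.group(1)}({m.group(2)})"
        return osc8(uri, m.group(0))
    return REF.sub(repl, line)
```

The visible text stays exactly as it was, so the page layout is preserved; only the invisible escape sequences are added around each reference.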
Here's one way it could work (not sure what the PR actually does):
1. less would provide a keybinding to jump between (highlight) OSC-8 links in the document.
2. less would provide a keybinding to append current:file_path(lineno) to LESS_LINK_STACK environment variable and exec less (or 2a. a user specified command pipe like tbl new_file_path | troff ... | less) on the file mentioned in OSC-8 link.
3. less would provide a keybinding to remove last file_path(lineno) from LESS_LINK_STACK environment variable and exec less +lineno file_path (and 2a. a user specified command pipe tbl file_path | troff ... | less +lineno).
Just these 3 steps will convert our favorite single page navigator less to a multi page navigator.
Perhaps an OSC-8 sequence like self://marker1:ref to jump to another OSC-8 location self://marker1 on a page could also enable hyperlink navigation within the same page.
Note: changing the terminal size will disrupt line numbers, and user won't jump to the same line. Not sure if there is another way to mark a location in a document. Though it's not as big a problem imo...
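The push and pop steps of the proposal above could look roughly like this. It is a sketch under assumptions: the stack is modelled as a colon-separated string of `file_path(lineno)` entries (one reading of the proposed LESS_LINK_STACK format), and the function names are made up for the example. Paths that themselves contain a colon would need escaping, which is not handled here.

```python
def push_location(stack, path, lineno):
    """Append 'path(lineno)' to the colon-separated stack string."""
    entry = f"{path}({lineno})"
    return f"{stack}:{entry}" if stack else entry

def pop_location(stack):
    """Return (remaining_stack, path, lineno) for the last entry."""
    if not stack:
        raise ValueError("link stack is empty")
    rest, _, entry = stack.rpartition(":")
    path, _, lineno = entry.rpartition("(")
    return rest, path, int(lineno.rstrip(")"))

def resume_argv(path, lineno):
    """Argument vector for re-opening `path` at `lineno` with less."""
    return ["less", f"+{lineno}", path]
```

Following a link pushes the current position before the new pager is started; going back pops the last entry and restarts less at the stored line.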
Hm. Pushing again as i have seen that i re-brought in the get_seg() function that was gone in less version 244 if that look was correct. Also some "constant" stuff (step_char() is not yet constant in comparison to step_charc() which already is; but could; anyhow). I folded all changesets into one.
OSC8 on the terminal, yes. But with this patch you can use ^O^N and ^O^P to scroll to these anchors in the text, and with ^O^I you can search for IDs, and with ^O^O you can open the "current" anchor (via $LESSOSC8OPEN if it is not handled internally -- then after user confirmation, and not in safe mode, iirc).
If people realize what they get, it could be they start generating manual pages (or converters which do that, effectively) etc which use this feature. You get a notion as via browsing an HTML document in a text mode browser like lynx.
I still have the manuals available via download. They were generated via my own mdoc macros for UNIX manuals, which the groff maintainer did not accept for upstreaming, but still they show what is possible. Because they base upon upstream mdoc macros, we only can apply [NUMBER] tags to announce anchors/URLs, not prepend them like lynx(1) does, nor simply enwrap the actual "URL text" as is done by graphical browsers. (Ie, "paint it blue", or the like.)
Ie, look at them with a patched less -RIFe, scroll the point-of-interest into view with ^O^I, scroll back with '', just as usual.
Small file: https://ftp.sdaoden.eu/code-mdocmx-1.cat
Medium: https://ftp.sdaoden.eu/code-mdocmx-7.cat
Large, with many hundreds of links: https://ftp.sdaoden.eu/code-mailx-1.cat
P.S.: "current" GNU roff / groff generates the needed IDs, finally, and at least.
@sdaoden, I apologize for my lack of responsiveness to this issue. I have been studying your changes and I have one question. Can you explain the purpose of searching for an id value? From my reading, the main purpose of the id parameter is to allow terminal emulators to highlight the appropriate pieces of text on mouseover. Given that purpose, I don't see why a user would have a need to search for an id value.
Sure, Mr. Nudelman.
To reiterate: the "standard" for OSC8 allows key=value pairs, but only defines id=. That year-long unmaintained document describes that a text editor on-the-fly generates some kind of id that can happen multiple times, but should always be identical for the same data, so that the terminal emulator can, for example, highlight all equal-IDs on a screen.
I personally always found that meaning of ID strange, in comparison to UUID, to HTML #id or anchors, etc. To TeX idx (-alike macros). Etc. I have not seen that implemented myself, either.
But if one extends the meaning of that ID to something in-document aka permanent aka non-dynamically generated, to be a real anchor, really like HTML's "a name=ID", then one can use OSC8 to provide the fully interactive experience with in-document anchors and references, too.
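To make the "id as anchor" idea concrete, here is a small Python sketch that finds the line carrying a given id= parameter. It assumes the sequence layout ESC ] 8 ; params ; URI terminated by ESC \ or BEL, with params being colon-separated key=value pairs; the names `parse_params` and `find_id` are hypothetical and do not correspond to functions in less.

```python
import re

# ESC ] 8 ; params ; URI, terminated by ESC \ or BEL.
OSC8 = re.compile(r"\x1b\]8;([^;\x07\x1b]*);([^\x07\x1b]*)(?:\x1b\\|\x07)")

def parse_params(params):
    """Split 'k1=v1:k2=v2' into a dict, ignoring items without '='."""
    out = {}
    for item in params.split(":"):
        key, sep, value = item.partition("=")
        if sep:
            out[key] = value
    return out

def find_id(lines, wanted):
    """Return the 0-based index of the first line with id=wanted, or None."""
    for i, line in enumerate(lines):
        for m in OSC8.finditer(line):
            if parse_params(m.group(1)).get("id") == wanted:
                return i
    return None
```

A pager could scroll the returned line into view, which is what an in-document anchor jump amounts to.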
This can of course be anything, ie, strings, like a UNIX manual section headline OPTIONS etc. For my mdoc macros i followed the lynx text browser, which enumerates all links in an HTML document. Typing numbers is usually also shorter and less error prone (and case-insensitivity is locale-dependent).
So in short. In addition to the ^O^P and ^O^N, and ^O^O commands, to hop around the OSC-8 references in a document, and open man: directly and the rest via $LESSOSC8OPEN when set, giving the opportunity to search for embedded IDs via ^O^I can give a full interactive experience if the document makes use of it.
I linked example documents which give this fully interactive experience, with table of content, and to-the-point scrolling. For now i do not know any other roff macro package which makes use of this, we will see how the Linux manual maintainer makes use of the new grotty feature to create such references (and KEY=VALUE).
But i know people go to the web to look at HTMLified manual pages with fewest links, to have the actual opportunity, (NetBSD, at least in the past, also installed that locally on request if i recall correctly), even though the manual pages are installed locally and can be made as interactive, or even better, and offline. Again i can only point to the example documents.
Thus i for one would add that ^O^I, speculating it will happen for real. And make use of it in the less manual page, which is pretty lengthy. And preformatted! So that would be very easy to do. The necessary code snippet for the explicit search is quite short. One needs to activate the ID search via shortcut command, otherwise it is not in the way.
P.S.: it also has to be said that instrumenting manual page source code with index anchors allows for really nice interlinked "manual page books". This is especially true for the mdoc language that is natively used on BSDs, because it has semantics and knows about macros, functions, constants, etc etc etc. And these real indexes of mdoc, they can now be referenced to the point. Most projects i know have switched their manual page sources to some other format in order to be able to get semantics, and other, interactive / referencing document formats. So these only need support of their converter, and ^O^I can spring into existence with full power.
@sdaoden Thanks very much for the detailed explanation. When you refer to the "year-long unmaintained document" I assume you're talking about https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda. Is there any kind of formal RFC that specifies the OSC8 behavior other than that page? I'm a little leery about implementing features simply based on you or I "speculating" on how OSC8 might be used in the future. This might lock less into doing something that can't easily be changed due to backwards compatibility, but is in conflict with what actually happens in the future.
I do not think there is anything else on the internet. I think he is the originator of the OSC8 feature you have implemented in less(1).
To reiterate: ^O^[NPO] is unaffected by your doubts.
Regarding ^O^I alone i find your statement a bit strong.
Hm, i mean .. one could think of extending this even, you know, in making it regular, maybe naming it ^O/ instead (^O^S will not do), so that any user input is searched for only in OSC8 content, VALUEs of any given KEY=, as well as the link itself? You know, a regular search in OSC8 constructs that covers any [KEY=]VALUE and link content, so that if i do for example "^O^I"less it will find the "next" OSC8 anchor/link which refers to "less"? That strikes me as a good thing totally apart from any possible doubts on the adoption of "id="UUID? How about this?
.. this would be an "any substring matches" search then, affected by the usual less semantics of case-sensitivity. I would have to look at how to implement this, then.
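Such an "any substring matches" search restricted to OSC 8 content might be sketched like this in Python. The function name `osc8_matches` and the `ignore_case` flag are assumptions for illustration; how less would actually wire this into its case handling is left open in the thread.

```python
import re

# ESC ] 8 ; params ; URI, terminated by ESC \ or BEL.
OSC8 = re.compile(r"\x1b\]8;([^;\x07\x1b]*);([^\x07\x1b]*)(?:\x1b\\|\x07)")

def osc8_matches(line, needle, ignore_case=False):
    """True if `needle` is a substring of any param VALUE or of the URI."""
    if ignore_case:
        needle = needle.lower()
    for m in OSC8.finditer(line):
        # VALUE part of each colon-separated KEY=VALUE item, plus the URI.
        fields = [item.partition("=")[2] for item in m.group(1).split(":")]
        fields.append(m.group(2))
        for field in fields:
            if ignore_case:
                field = field.lower()
            if needle and needle in field:
                return True
    return False
```

Visible text outside the escape sequences is deliberately not searched, so a plain "less" in the running text does not count as a hit.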
(P.S.: to clarify: the one who "picked it up" is on the GNU groff mailing-list for some years.)
Thanks, your code has been very useful for reference, although I've done the implementation somewhat differently. I have implemented your ^O^N, ^O^P and ^O^O commands. The ^O^O command does open the #-style links you have in your example files, and also file: links which I think your code didn't do. It also allows the user to use environment variables to configure how to open links.
I have not yet implemented a search feature like your ^O^I. Given that the URI and id= parameters are not visible to the user, I'm still not convinced that a user would often want to search for them. How would he even know what id values are used in the file? Anyway, the user could always toggle -R and search for them normally. It can be added later if it turns out to be useful.
Any feedback would be appreciated.
Hello! That is wonderful news!! (Of course i am .. but only a bit.) I will try to look into this today, only saw the commitdiff until now; quite a bit different :-) Thank you!
^O^L does not seem to do what is documented (anything, in fact). But ^O^O does follow in-document #anchor links, so there i went.
I find the LESS_OSC8_xxx thing a bit excessive compared to my single "muxer" approach; in general i have the impression that "muxing" is preferred "by the people", but then again your omnipresent program has a multi-decade long history with its LESSOPEN mechanism, so if you prefer a different one now, then surely for a reason.
The manual says LESS_OSC8_xxx but has LESS_OSC_man and LESS_OSC_file typos.
I am very happy that you provide a built-in LESS_OSC8_man -- this is very, very cool, and i really look forward that manual pages make use of this feature! It is, despite for the internal #anchor links, what i was hoping for since 2014! You get a completely interactive UNIX manual session simply by following a link, on a normal terminal! Yes.
Ah wait. If i follow an OSC-8 #anchor via ^O^O, and then ^O^L, we scroll back to the OSC-8 which was followed. Hm. Well less(1) has '' to jump back where you were last, but you surely had a use case idea.
For me on the dark side is that with that lynx-alike manual page extension i have written, which provides that "numbered link view" that lynx invented, as far as i know, i now have to count links on the current screen, to N^O^N there to ^O^O thereafter, but, well, it is a specific thing. Then again, without such visualization, the "N" of N^O^N is hardly known. But i agree you are right when you say that support for that is not in sight, and rather people will, like mandoc(1) does when converting UNIX manuals to HTML, create anchors like "#INPUT_PREPROCESSOR" (or even "#INPUT PREPROCESSOR") than anything else.
^O^N and ^O^P do not "wrap around" from the bottom to the top and vice versa. Which i got used to. Likely personal taste only.
Dear Mr. Nudelman, thanks for bringing this in. Now i hope for people creating an almost (plain) webpage-alike UNIX manual experience. Or even normal reading in textualized PDFs.
@sdaoden thanks for your comments.
Regarding LESS_OSC8_xxx, my thinking is someone might want to change, for example, their browser (which opens https: links), or their file viewer (which opens file: links) but in general when they do something like that, they want to leave handling of other types of links unchanged. It seems easier to manage this if the handler for each type of link is separate, rather than having to edit a muxer script that handles many different types of links. You could of course set several LESS_OSC8_xxx variables to point to the same script. But perhaps it would be convenient to have a LESS_OSC8_all or something like that, which would handle all link types that don't have a specific handler.
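The per-scheme lookup with a catch-all can be sketched as below. Note that LESS_OSC8_all is only floated as an idea in this comment, so the fallback is hypothetical; the function name `handler_for` is likewise made up, and any built-in handling (such as for man: links) is not modelled.

```python
import os
import re

# A URI scheme: a letter followed by letters, digits, '+', '.' or '-'.
SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

def handler_for(uri, env=None):
    """Pick the handler command for `uri` from LESS_OSC8_<scheme> variables."""
    env = os.environ if env is None else env
    m = SCHEME.match(uri)
    if m:
        specific = env.get(f"LESS_OSC8_{m.group(1)}")
        if specific:
            return specific
    # Hypothetical catch-all for schemes without a specific handler.
    return env.get("LESS_OSC8_all")
```

Changing the browser then means touching only LESS_OSC8_https, while file: and other links keep whatever handler they had.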
^O^L was a late addition, and I'm not positive it's really needed, but sometimes I would follow an internal link, then navigate around for a while, then want to return to the original link. The navigation may have overwritten the place where '' (quote-quote) would return, so ^O^L seemed somewhat useful.
I agree counting links in order to supply a number to ^O^N is not ideal. I'm thinking of allowing link navigation by just clicking on the link with the mouse, when --mouse is enabled.
Hello. ^O^L i simply have misunderstood it seems. I think if a fresh mind reads this she or he will get that right immediately. I .. was using local modifications for almost ten years and got used to my flow.
I can absolutely follow and agree with your thoughts on the individual handlers. (I have never used lessopen myself; i think somewhen "decades ago" FreeBSD activated lesspipe generically, but this is only dim memory -- installed it still is.)
Mouse on the terminal i have not used for long (i disable anything but select and paste in the st(1) i use), so i cannot comment on that. But if you think of such, maybe a shortcut to highlight all OSC8 links (not anchors) in (the visible part of) a document would be of interest?
Btw i now (sorry for my short breath, i have to finish a DKIM thing until February 1st before Google requires it, and just today finished the RFC 5322 parser) see that ^O^O seems to perform work on an OSC-8 anchor, ie, an OSC8 link without URI target. I think it would be better if those anchors-only things would simply be skipped; iirc my patch did so. (I have switched to your regular implementation!)
I said
> But if you think of such, maybe a shortcut to highlight all OSC8 links (not anchors) in (the visible part of) a document would be of interest?
If you want to spend real time that "highlight all OSC8 URIs on the screen" mode could actually glue a generated "N" to the URI and highlight only that. That is a lot of work, but then, at least on the current screen, less(1) would act almost like lynx(1) does. That is, then, i "highlight all OSC8 links on the screen", and then can do an exact "N"^O^N because i do not have to count, plus ^O^O to go where i want! Of course doing this will reflow the entire screen.
> ^O^O seems to perform work on an OSC-8 anchor, ie, an OSC8 link without URI target. I think it would be better if those anchors-only things would simply be skipped; iirc my patch did so.
Can you give a specific example of this? As far as I know, ^O^N ignores OSC8 sequences where the URI or the marked text is empty. So ^O^O shouldn't operate on them if they aren't selected by ^O^N. For example, in your code-mailx-1.cat file, ^O^N at the beginning of the file goes to the NAME link on line 24, and ignores the three anchors on lines 3, 6 and 23.
Hello! No, it does not, actually. The files i offer are all non-debug, but the mdocmx macros have a debug mode so that you can see anchors. They are still only anchors, but there is text in between the opening OSC8 sequence with URI (and ID), and the closing empty one.
The difference in between my patch and yours is that your ^O^N interprets these anchor-only OSC8 sequences as links. I want to remark that i define "anchor-only" those which have no URI in the sequence, as opposed to no-link-text. Unfortunately there has to be some link-target, i use "#" number sign, because the groff maintainer implemented the grotty command that way; empty URI, even only two parenthesis (last i tried) does not work out, which is very, very unfortunate.
So i used a plain number-sign "#" as an "anchor URI".
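The two notions of "what counts as a link" discussed here can be put side by side in a small Python sketch. Both predicates are simplified readings of the comments in this thread, not the actual code of either implementation, and the function names are hypothetical.

```python
def is_link_by_text(uri, text):
    """Reading of the merged behaviour: skip an empty URI or empty text."""
    return bool(uri) and bool(text)

def is_link_by_uri(uri, text):
    """Reading of the patch: an empty or bare '#' URI is an anchor only."""
    return uri not in ("", "#")

# (uri, visible text) samples: a silent anchor, a debug-mode anchor that
# carries text, and a real in-document link.
samples = [("#", ""), ("#", "NAME"), ("#12", "[12]")]
by_text = [is_link_by_text(u, t) for u, t in samples]
by_uri = [is_link_by_uri(u, t) for u, t in samples]
```

The second sample is where the two policies disagree: an anchor with visible text is selected by the text-based rule but skipped by the URI-based one.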
Ah, this works! Minimal examples these are, the ndebug variant has only one real link, and with my version the debug variant has, too, whereas with yours you will have three.
Yes, i did not think about my environment before posting my complaint. But i want to note that quite often IDs / anchors are applied to non-empty text. This is not true for the plain mdoc(mx) aka roff input that produced the examples, as there i placed the in-text anchors consciously "somewhere in the middle of a paragraph" to make the scrolling via man(1) catch up correctly (as of experience). (I can show examples in the mailer's manual upon interest.)
But in HTML or other such formats an ID or anchor is often attached to a SPAN, for example, or other such element, which then has textual content (or an IMG, for example, but also, in the case i want that landing at the anchor gets not only the image but the surrounding paragraph).
In that scenario, if an automatic converter (shall any be updated to support such OSC8 links) creates plain text with an anchor, your approach could practically highlight an entire paragraph.
Well, at least possibly.
It is very unfortunate that the groff maintainer implemented the thing the way he did.
P.S.: but you can see, that ^O^N finds the "anchor-only", and ^O^O then searches and says "OSC8 link not found". And i think there needs to exist a possibility for an "anchor-only". The OSC8 spec surely allows that.
P.P.S.: eh! Of course this browser-thing mis-interpreted the things, i had written HTML SPAN and IMG markup tags.