
Add a neato informative table of various URL pieces

Open domenic opened this issue 7 years ago • 65 comments

Basically copy the bottom half of this: https://nodejs.org/api/url.html#url_url_strings_and_url_objects

(We could presumably SVG-ize it so it's a little prettier.)

Via the thread at https://twitter.com/wa7son/status/886982643463708673

domenic avatar Jul 17 '17 20:07 domenic

Quick WIP with Inkscape:

drawing svg

Would something like this work?

TimothyGu avatar Sep 05 '17 02:09 TimothyGu

I think so, although maybe a table is better as that would be more accessible I suspect. Note also that ? is not part of the pathname getter. The other thing that might be interesting is to illustrate a couple different URLs. In particular different schemes. You also omitted the origin field although that's rather hard given that it needs to skip user/pass somehow.
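
To make the getter boundaries concrete, here is a minimal sketch using the standard `URL` class as available in browsers and Node.js. The URL itself (`example.org`, the credentials, the port) is a made-up placeholder, not one taken from the drawing.

```typescript
// Placeholder URL chosen only to exercise every getter.
const pieceTableUrl = new URL("https://user:pass@example.org:8080/a/b?x=1#frag");

const pieces = {
  protocol: pieceTableUrl.protocol, // includes the trailing ":"
  username: pieceTableUrl.username,
  password: pieceTableUrl.password,
  host: pieceTableUrl.host,         // hostname plus ":port"
  hostname: pieceTableUrl.hostname,
  port: pieceTableUrl.port,
  pathname: pieceTableUrl.pathname, // stops before "?"
  search: pieceTableUrl.search,     // starts with "?"
  hash: pieceTableUrl.hash,         // starts with "#"
  origin: pieceTableUrl.origin,     // scheme, host and port; no credentials
};

console.log(pieces);
```

Here `pathname` is `/a/b` and `search` is `?x=1`, so the `?` belongs to `search`. `origin` is `https://example.org:8080`, which is not a contiguous slice of the input because it skips `user:pass@`.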

annevk avatar Sep 05 '17 07:09 annevk

Note also that ? is not part of the pathname getter.

Oops, an off-by-one.

In particular different schemes

That does indeed sound like a nice idea.

You also omitted the origin field

I did so intentionally, as it's not a concept intrinsically related to URL parsing, but rather more about Web apps/security. And because it's hard.


I did a table version with a few variants. The first is a straight translation of the SVG graph. The second is closer to the version in the Node.js doc. The third is the same as the second but has origin, mainly there to show how ugly it is. The fourth is a URN for fun. I do like the fact that you can link to the spec for that exact attribute, but with some coloring I still think the SVG one is a bit prettier.

screenshot from 2017-09-05 16-07-49

TimothyGu avatar Sep 05 '17 08:09 TimothyGu

Alright, I'm game. We should be able to get the links to work with SVG too. I'm not really sure if we can make all of it equally accessible though.

annevk avatar Sep 05 '17 08:09 annevk

Personally I like the first table one, possibly with additional text-align center.

I also think it might be interesting to have a counterpart table that is about the URL record terms, instead of the API? (E.g. scheme instead of protocol, query instead of search, fragment instead of hash.) Maybe that wouldn't be that helpful though.
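
As a rough sketch of what such a counterpart table would pair up, the mapping below lists API getter names next to URL record terms. The object name `apiToRecordTerm` and the sample URL are illustrative only; the pairs beyond the three named above are my reading of the spec and are worth double-checking.

```typescript
// API getter name -> URL record term (an informal pairing, not spec text).
const apiToRecordTerm: Record<string, string> = {
  protocol: "scheme",
  username: "username",
  password: "password",
  hostname: "host",
  port: "port",
  pathname: "path",
  search: "query",
  hash: "fragment",
};

const sampleUrl = new URL("https://example.org/a?x=1#frag");

// The getters carry delimiters that the record fields do not.
const recordScheme = sampleUrl.protocol.slice(0, -1); // drop trailing ":"
const recordQuery = sampleUrl.search.slice(1);        // drop leading "?"
const recordFragment = sampleUrl.hash.slice(1);       // drop leading "#"

console.log(apiToRecordTerm, recordScheme, recordQuery, recordFragment);
```

Stripping delimiters like this is lossy: the `search` getter returns an empty string both when there is no query and when the query is empty, while the record distinguishes the two.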

domenic avatar Sep 05 '17 15:09 domenic

It's probably useful as there are some interesting differences between the two. Bit unclear where the table should be located at that point, but maybe we could put it in an Appendix?

annevk avatar Sep 05 '17 15:09 annevk

What is the status of this issue? The issue that I submitted today, "Documentation on URL syntax", has been closed and deferred to this issue, which has been in play for over a year. In the meantime, the only documentation I've found that lays out URL syntax is the series of steps in section 4.5, which requires some figuring out to understand. So are we going to add something to fix that?

EnnexMB avatar Aug 19 '18 17:08 EnnexMB

@EnnexMB it basically needs someone to work on it and resolve the open questions above.

annevk avatar Aug 20 '18 07:08 annevk

Ok, can I help with this? It seems that the two open questions are:

  • What to use to represent the syntax (graphic, table, or formula). I suggest the second or third of the four tables posted by TimothyGu, but:
    • It should be in text, not a graphic, with links on the terms. The code Timothy used to generate the graphic would be helpful.
    • The terms should match those used in the URL spec, although it would be good to point out the parallel API terms and link to their document, and it would be best if the analogous table is posted there as well.
    • It would be helpful to have an additional row at the top of the table with a symbol to distinguish required from optional elements. In the formula proposed in my issue yesterday, this was done with square brackets. Timothy's table is much easier to read, but the optionality is important information to include.
    • I guess the third table is best if "origin" means something, which I don't know. Linking to where it's explained in the spec would fix that. It appears twice, so some clarification is needed.
    • I don't know about URNs, but if Timothy's fourth table is related to URLs, then it would be good to show that relationship as well.
  • Where to put it. I suggest section 4.5 unless there is a more appropriate place. If it's put in an appendix, it should be prominently linked to, and maybe should be linked to in a few places anyway, because this is going to be something that will be useful to people.

What have I left out? If Timothy will post the code behind his graphic, I'll work on editing it to implement the suggestions above.

EnnexMB avatar Aug 20 '18 10:08 EnnexMB

Sounds good, thanks. What do you think @TimothyGu?

As for placement, the top of section 4 might also work, given that it illustrates the relationship between various subsections.

annevk avatar Aug 20 '18 11:08 annevk

@EnnexMB Thanks for your interest in this.

The WIP files are unfortunately on my laptop, which has seen some physical damage since I created them. I'll try to recover the files tonight.

The code Timothy used to generate the graphic would be helpful.

For the first (https://user-images.githubusercontent.com/1538624/30042227-a0f3af64-9222-11e7-96a4-39c0cf11d279.png) it was just a manually created SVG. For the second it was a pretty standard HTML table with the spec's default styling.

It would be helpful to have an additional row at the top of the table with a symbol to distinguish required from optional elements.

NB: what's optional is quite different for different URL schemes. The URN at the end is a good indicator of that. In fact, for non-special URLs only the scheme is required and nothing else – tim: is a valid URL! It's important to be mindful of that.
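
A quick way to check this, sketched with the standard `URL` class. The helper name `parsesAsUrl` and the ISBN value are made up for illustration.

```typescript
// Returns true when the input parses as a URL without a base.
function parsesAsUrl(input: string): boolean {
  try {
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

// Non-special scheme: the scheme alone is enough.
const minimalUrl = new URL("tim:");
console.log(minimalUrl.href, JSON.stringify(minimalUrl.pathname));

// A URN: everything after the scheme lands in the path.
const urnUrl = new URL("urn:isbn:0451450523");
console.log(urnUrl.protocol, urnUrl.pathname);

// Special scheme: a host is required, so this does not parse.
console.log(parsesAsUrl("https:"));
```

So any "required" marker in the table would have to depend on the scheme: `tim:` parses with an empty path, while `https:` on its own is rejected because a host is missing.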

I guess the third table is best if "origin" means something, which I don't know.

I'd be okay removing that. It's not really a component of the URL but rather a byproduct, so may not fit in that table.

TimothyGu avatar Aug 22 '18 17:08 TimothyGu

Hi @TimothyGu, were you able to recover the file? I don't think we need the first one, since it seems to be superseded by the second, the table version. Standard HTML is fine, and it could help to start from the structure you've already created, rather than starting from scratch. Of course, if you want to move forward with the changes yourself, that would be great too. But I'd be happy to do it if that would help.

I understand that optionality is complex. I had in mind to devise some compact way to represent it in the first row of the table. You (and others) might want to take a look at the formula in my original post and see if you agree with the optionality as represented there by square brackets. (I just now edited with a correction.) It does have everything optional except scheme:. I wrote that formula entirely based on the serializing instructions in section 4.5.
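
For readers without the original post at hand, the bracketed optionality can be sketched as code. This is a simplified approximation of the serializing steps, not the spec algorithm: the interface `UrlRecordSketch` and its field names are my own, the path is treated as an already-serialized string, and edge cases are left out.

```typescript
// Simplified stand-in for a URL record; null means "absent".
interface UrlRecordSketch {
  scheme: string;
  username: string;
  password: string;
  host: string | null;
  port: number | null;
  path: string;
  query: string | null;
  fragment: string | null;
}

// scheme ":" [ "//" [ username [ ":" password ] "@" ] host [ ":" port ] ]
//   path [ "?" query ] [ "#" fragment ]
function serializeSketch(u: UrlRecordSketch): string {
  let out = u.scheme + ":";
  if (u.host !== null) {
    out += "//";
    if (u.username !== "" || u.password !== "") {
      out += u.username;
      if (u.password !== "") out += ":" + u.password;
      out += "@";
    }
    out += u.host;
    if (u.port !== null) out += ":" + u.port;
  }
  out += u.path;
  if (u.query !== null) out += "?" + u.query;
  if (u.fragment !== null) out += "#" + u.fragment;
  return out;
}

const fullExample = serializeSketch({
  scheme: "https", username: "user", password: "pass",
  host: "example.org", port: 8080, path: "/a/b", query: "x=1", fragment: "frag",
});
const schemeOnly = serializeSketch({
  scheme: "tim", username: "", password: "",
  host: null, port: null, path: "", query: null, fragment: null,
});
console.log(fullExample, schemeOnly);
```

Each `if` corresponds to one pair of square brackets, and only the `scheme:` prefix is emitted unconditionally.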

EnnexMB avatar Aug 24 '18 15:08 EnnexMB

@TimothyGu, any luck getting that file? I really think that one way or another we should get this done.

EnnexMB avatar Aug 31 '18 11:08 EnnexMB

@EnnexMB Sorry about the delay, but yes! Here's the diff for the table version:

https://gist.github.com/5eb111b5021b338d516e97225a65bed4

Here's the SVG if you're interested. Note the search coverage is still wrong.

https://gist.github.com/bf539f420463bab1eb7426cff267a5b4

(drawing2.svg has the fonts embedded)

Please go ahead and work on it. I won't be able to do so myself and I really appreciate your stepping up.

TimothyGu avatar Sep 02 '18 13:09 TimothyGu

Thank you @TimothyGu. I need some help with the format of the file in the first link. Can someone send me a link to documentation on the diff format used there? I Googled "diff file" and don't see anything relevant.

EnnexMB avatar Sep 07 '18 11:09 EnnexMB

I found https://www.thegeekstuff.com/2014/12/patch-command-examples/. The document being patched is the source file for the URL Standard by the way, url.bs.

annevk avatar Sep 07 '18 12:09 annevk

@EnnexMB Oops, I’m sorry to have missed your comment on the gist itself. What @annevk gave should work, though I would personally do this:

  1. Put the gist file in a file, let’s call it tmp.diff
  2. Apply it using git apply tmp.diff.

git apply has several advantages over patch and is usually much easier to use, so I’d recommend that for diffs with Git headers like the one I provided.

TimothyGu avatar Sep 07 '18 13:09 TimothyGu

Okay, I'm sorry, but I still need a bit more help here.

I think the problem is that this all started when I was reading the URL standard and posted an issue about it, which landed me here in GitHub, but I have no experience in GitHub. So when I'm told to use git apply tmp.diff, I don't know what environment I'm supposed to be in to do that.

I Googled git apply and found what appears to be documentation of that command, and from there, of git itself, which appears to be software that I need to install on my computer in order to proceed with this. Is that correct, or is there a way to work with that diff file online without installing software?

Sorry to distract from the thread topic by needing some guidance.

EnnexMB avatar Sep 07 '18 15:09 EnnexMB

It's for the command line, e.g., the Terminal application on macOS. And yeah, you'd need to have such tooling installed (for macOS you'll get prompted to install it). To help you, I applied the diff to url.bs and copied the result to https://html5.org/temp/url.bs.

annevk avatar Sep 10 '18 07:09 annevk

Edit, Sept. 15: Disregard this post, and see my next one below.


Okay, thank you. That gave me a helpful starting point.

I don't know how to include HTML in this post, so I've inserted two images of what I've done and then after those images, I provide a link to the HTML file that generated both of them.

Here is @TimothyGu's third table with the changes I suggested and some additional changes:

url syntax representation- original table proposal modified

The complete list of changes from his original table is documented in the HTML file linked below. Also, in that HTML file, the red, underlined text is working links.

In addition, I've done some further work to present an alternative proposal, which has three parts.

  • Formulaic representation: I think this is worth including because it uses the standard system of square brackets to represent optional elements and curly brackets with a vertical line to represent a set of elements to select from. Also, referring to it can assist in understanding the meaning of the graphical representation below it.
  • Graphical representation: This uses different colors to represent optional elements, with a gradation of lighter colors to represent elements that are optional within other optional elements, and adjacent elements in the same color to represent mutually exclusive choices. The information content is the same as in the formulaic representation, but it is easier for a human to read.
  • Table of element conditions: This summarizes the rules in section 4.5. "URL serializing" of the standard. Again, referring to this table can make it easier to understand both the formulaic and graphical representations.

In the following image, the underlined text is working links in the HTML file linked further below.

url syntax representation- new proposal

The two images above were generated in an HTML file using the same CSS as the URL standard. However, that didn't handle conversion of the double-brace wrappers used in @TimothyGu's code, so I converted those to <code> tags. (I'd be very interested in knowing how to use those double-brace wrappers if someone could direct me to information on that.)

The HTML file is posted at Gist, and I don't see a way to link to it so it can be read directly by your browser. So to see it as intended, you will have to copy it into your own htm file and view it in your browser from there. If someone will tell me a better way to do this in the future, I will do that.

EnnexMB avatar Sep 11 '18 20:09 EnnexMB

Alright, hold on a second. Disregard my previous post from a few days ago. I was just reading up on CSS syntax and in sections 4.1 and 5.1 came upon railroad diagrams. It's a far better way to represent syntax than my home-spun graphical representation above. I found a website for generating them, and here is the result for URLs:

url syntax railroad diagram

Along with that graphic, there is an htm file that shows that diagram with links on the element names to the relevant sections of the URL Standard, along with another representation of the syntax in EBNF notation, which is the code used to generate the diagram.

As above, the htm file is saved as a Gist, and I wish I knew a way to post it so it would load directly in your browser, but I don't.

From my previous post, the table of element conditions might still be useful. I'd say disregard all the rest.

EnnexMB avatar Sep 15 '18 20:09 EnnexMB

See #24 on some previous work done on creating a formal grammar for URLs, perhaps displayed through railroad diagrams (see http://intertwingly.net/stories/2014/10/20/Url.xhtml). In my opinion, RR diagrams and formal grammar solve a different problem, and a version of what I had should be enough just for a simple overview of URLs, which is what this bug is all about.

TimothyGu avatar Sep 15 '18 20:09 TimothyGu

The RR diagrams you linked to are very complex and, as you say, solve a different problem than we are discussing here. The RR diagram I posted is very simple and contains the same information as in your table plus information on optionality of elements. Do you have an idea of how to convey that optionality information in your table? That was what I was getting at with the graphical representation, but I think the RR diagram does it much better.

Whether we use the RR diagram or a version of that table or something else, I would like to suggest that this issue be brought to a conclusion by posting something in the standard to give readers an easy way to understand the syntax of URLs.

EnnexMB avatar Sep 15 '18 21:09 EnnexMB

a version of what I had should be enough just for a simple overview of URLs

It seems like the simple RR diagram in https://github.com/whatwg/url/issues/337#issuecomment-421627310 hits the sweet spot pretty well. To me at least, it’s more user/reader-friendly than either the table approach or the https://user-images.githubusercontent.com/1538624/30042227-a0f3af64-9222-11e7-96a4-39c0cf11d279.png approach

sideshowbarker avatar Sep 15 '18 23:09 sideshowbarker

As @annevk mentioned, there is a complete railroad diagram for the URL specification as it existed four years ago. It even was testable, produced a reference implementation, and passed all of the (valid) tests at the time. This was even merged into the spec (the stylesheet is now gone, but you can get the idea). If this is an idea whose time has come, I can help.

rubys avatar Sep 16 '18 15:09 rubys

Hey, those railroad diagrams in that old version of the spec (linked under "merged into the spec" above) are great. Why were they taken out? If that's a long story, we don't need to go into it, but the important question is, can we get them put back in?

If the problem is that changes in the spec made the diagrams invalid and it was too much work to update all those detailed diagrams, I understand that. But if that's the case, can we, instead of leaving the diagrams out entirely, put in a summary diagram like the one I posted above, so the reader at least has something to help them understand the syntax?

The summary diagram I posted above is roughly equivalent to the combination of diagrams at the following locations in the old spec:

One problem with my summary diagram that I see by looking at the old spec is that my diagram does not cover the case of relative URLs. The reason for this is that I built that diagram based on the rules in section 4.5. URL serializing. I suppose that may have been an error on my part and I should have used the slightly more complex rules in section 4.3. URL writing. Is there a reason the serialization rules don't include relative URLs?
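
One observation that may bear on this, sketched with the standard `URL` class: a relative reference only parses against a base, and the result is always absolute, so there may simply be no relative form left to serialize. The helper name `resolveOrNull` and the example URLs are made up for illustration.

```typescript
// Returns the serialized URL, or null when parsing fails.
function resolveOrNull(input: string, base?: string): string | null {
  try {
    return new URL(input, base).href;
  } catch {
    return null;
  }
}

// Without a base, a relative reference is not parseable at all.
const withoutBase = resolveOrNull("../c?x=1");

// With a base, the result is a full absolute URL.
const withBase = resolveOrNull("../c?x=1", "https://example.org/a/b");

console.log(withoutBase, withBase);
```

Here `withoutBase` is `null` and `withBase` is `https://example.org/c?x=1`; the `../` has been resolved away before serialization is ever involved.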

If there is a decision to use a summary railroad diagram like the one I posted above, then I will correct it to cover relative URLs by recasting it from the rules in section 4.3 or any other set of rules that are the right rules to base the diagram on. On the other hand, it would be even better to take up @rubys's offer to update the more detailed diagrams (if that's what he's offering to do). Best solution would be to include both the detailed diagrams and a summary one.

EnnexMB avatar Sep 16 '18 17:09 EnnexMB

If the problem is that changes in the spec made the diagrams invalid and it was too much work to update all those detailed diagrams, I understand that.

I hit that same problem while working on PR #416. I was unable to find a complete list of what has changed since RFC 3986. Is there an official log of the changes between versions/dates (other than having to dig through the git commits)? Without a change log, it's hard for implementors to find out if their implementation is still up to date with the spec.

sjamaan avatar Sep 17 '18 08:09 sjamaan

RFC 3986 wasn't exactly used as a base, so there's no detailed changelog relative to that.

annevk avatar Sep 17 '18 08:09 annevk

What was used as a base then?

sjamaan avatar Sep 17 '18 08:09 sjamaan

Reverse engineering implementations through ad hoc testing that became increasingly rigorous.

annevk avatar Sep 17 '18 08:09 annevk