Asciidoctor: support include directives for other asciidoc files

Open miltador opened this issue 6 years ago • 115 comments

A lot of time has passed since the latest comments from devs in #172 and #335. I think things have changed in your infrastructure over the years, so why not bring up this issue again with some more input and news?

For example, AFAIK GitHub started to use containers which could help to isolate things from unintentional access and improve security.

Also take a look at asciidoctor/asciidoctor#1088. The author of Asciidoctor is ready for a conversation about this, and there is even a proposed way to implement this with a custom include preprocessor. If there are still concerns about resolving the issue, please provide constructive feedback so both sides can collaborate.

miltador avatar Aug 21 '17 08:08 miltador

I totally 👍 this request. Includes are an amazing feature of Asciidoc over Markdown and would help a lot providing users with good and up-to-date documentation.

I would add image to the same bucket. I think it is not supported today, probably for the same reason includes are not.

chevdor avatar Oct 18 '17 07:10 chevdor

I would add image to the same bucket. I think it is not supported today, probably for the same reason includes are not.

The image macros are supported on GitHub. You can see them in action here: https://github.com/asciidoctor/atom-asciidoc-preview

mojavelinux avatar Oct 25 '17 05:10 mojavelinux

If GitHub is committed to enabling includes for AsciiDoc files, I'm willing to do whatever needs to be done in Asciidoctor to make it happen. I've already added an extension point so the include directive can be handled by a custom function. That allows the GitHub code to resolve the file from the git repository instead of from the file system.
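
To make the idea concrete, here is a minimal Python sketch of an include directive handled by a caller-supplied function. It is not Asciidoctor's actual (Ruby) extension API; the names `expand_includes` and `repo_resolver` and the in-memory `REPO` dict standing in for a git tree are all hypothetical.

```python
import re
from typing import Callable, Optional

# Matches a line such as: include::docs/intro.adoc[]
INCLUDE_RE = re.compile(r"^include::(?P<target>[^\[\s]+)\[(?P<attrs>[^\]]*)\]\s*$")

# Hypothetical stand-in for blobs stored in a git repository (path -> content).
REPO = {
    "README.adoc": "= Project\n\ninclude::docs/intro.adoc[]\n\nThe end.",
    "docs/intro.adoc": "Welcome to the project.",
}


def repo_resolver(target: str) -> Optional[str]:
    """Look the target up in the repository tree, never on the file system."""
    return REPO.get(target)


def expand_includes(text: str, resolve: Callable[[str], Optional[str]]) -> str:
    """Replace each include line with whatever the resolver returns (one level only)."""
    out = []
    for line in text.split("\n"):
        match = INCLUDE_RE.match(line)
        if match is None:
            out.append(line)
            continue
        content = resolve(match.group("target"))
        # Unresolved targets are left visible instead of failing the render.
        out.append(content if content is not None else f"Unresolved directive: {line}")
    return "\n".join(out)


rendered = expand_includes(REPO["README.adoc"], repo_resolver)
```

Because the resolver is just a function argument, the host decides where content comes from, and the expansion code never touches the disk or the network itself.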

mojavelinux avatar Oct 25 '17 05:10 mojavelinux

Fully support this. Seems like a basic capability and we're struggling without it 👎 http://asciidoctor.org/docs/user-manual/#include-directive

ericis avatar Nov 30 '17 23:11 ericis

It would be great to use the full power of AsciiDoc from the GitHub frontend. For example, I use restdocs & Spring to generate snippets for the documentation of my API. Most of the documentation is handwritten and only the generated snippets are included. I want to store this documentation on GitHub and edit it collaboratively with others via GitHub's tools (pull requests).

nailgilaziev avatar Feb 11 '18 19:02 nailgilaziev

This issue was an unpleasant surprise. I agree with @ericis; this is fundamental functionality which is expected by Asciidoc users.

Since @mojavelinux has enabled custom handling, this seems simple:

  1. check for infinite recursion
  2. only allow relative includes. (i.e., only include files within the repository.)

This seems fairly solvable. Maybe I'm not seeing the full extent of it?
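
The second item above (relative, in-repository includes only) might be sketched in Python as follows. The function name `resolve_repo_path` and the exact rejection rules are assumptions for illustration, not anything GitHub or Asciidoctor is known to use; the recursion check from the first item is not covered here.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit


def resolve_repo_path(including_file: str, target: str) -> Optional[str]:
    """Return a normalized repo-relative path, or None if the target is not allowed."""
    # Reject URIs (http://, file://, ...) so the renderer never makes network requests.
    if urlsplit(target).scheme:
        return None
    # Reject absolute paths and backslashes; only paths relative to the including file pass.
    if target.startswith("/") or "\\" in target:
        return None
    base_dir = posixpath.dirname(including_file)
    resolved = posixpath.normpath(posixpath.join(base_dir, target))
    # After normalization, a leading ".." means the path escaped the repository root.
    if resolved == ".." or resolved.startswith("../"):
        return None
    return resolved
```

For example, `resolve_repo_path("docs/guide.adoc", "../README.adoc")` yields `"README.adoc"`, while `"../../etc/passwd"` and `"https://example.com/x.adoc"` are both rejected.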

kavaliro avatar Feb 20 '18 20:02 kavaliro

I think it's been a solvable problem for a while now, technically-- at this point it's just someone flipping a couple switches by the looks of it.

This is hands down one of the best features of AsciiDoc and it's a crying shame that there is so little support for it in git-related tooling (GH, Gollum, GitBookIO, etc.).

FWIW, it's not like images are that much safer than other types of content, but for years now I could generate images of my source code and include it that way (ridiculous, as no copy/paste), or generate the content itself and include it that way (which defeats the purpose of nice rendering via GH out of the box).

Someday, it would be nice to move into the brave new future and get some include::sourceFile[tags=tagName] type stuff going... I know not many people like writing docs but this stuff is gold for folks who-- well, may not love writing them, but at least who want to write them as smartly as possible-- and right now those folks have to do the work twice to get it to look nice, basically, and while it can be automated... it shouldn't need to be, in this day and age! :smiley:
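
For anyone who hasn't seen tagged includes: the source file marks a region with `tag::name[]` and `end::name[]` comment lines, and the include pulls in only that region. A rough Python sketch of the selection step, where `extract_tag` and the sample `SOURCE` are made up for illustration and this is not how Asciidoctor itself implements it:

```python
from typing import List

SOURCE = """\
import math

# tag::area[]
def area(radius):
    return math.pi * radius ** 2
# end::area[]

print(area(2))
"""


def extract_tag(source: str, tag: str) -> str:
    """Return the lines between the tag::<tag>[] and end::<tag>[] marker lines."""
    start, end = f"tag::{tag}[]", f"end::{tag}[]"
    selected: List[str] = []
    inside = False
    for line in source.splitlines():
        if end in line:
            inside = False
        elif start in line:
            inside = True
        elif inside:
            # Marker lines themselves are dropped; only the region body is kept.
            selected.append(line)
    return "\n".join(selected)


snippet = extract_tag(SOURCE, "area")
```

Here `snippet` holds just the two lines of the `area` function, which is what an `include::sourceFile[tags=area]` style directive would be expected to show.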

denuno avatar Feb 24 '18 18:02 denuno

So it's pretty clear the reason it hasn't been implemented yet is security. I believe it is also a reason why GitHub's staff doesn't really engage with the community in order to solve that problem because it would require them to expose their threat model and pieces of their architecture which could help attackers.

FWIW, it's not like images are that much safer than other types of content, but for years now I could generate images of my source code and include it that way (ridiculous, as no copy/paste), or generate the content itself and include it that way (which defeats the purpose of nice rendering via GH out of the box).

You are missing the broader point. Include processing is more complex, not because the destination is dangerous, but because of the way includes interact. Just as an example: An include can include includes which could include the first top-level include. This would DoS a naive parser. It is analogous to the Billion Laughs attack, and it could be done in so many ways that it's hard to prevent them all with parser restrictions. Another example, different but also complex to fix: AsciiDoc includes can be URIs, URI includes could be used to perform SSRF attacks in order to explore GitHub's infrastructure in ways that shouldn't be allowed.

So I think the reason why GitHub is not doing it is because <1% of their users care and there are numerous ways to abuse the feature so doing it safely is hard. Implementing a hard feature for <1% is not worth the resources (or the risk).

Don't shoot the messenger here: I want AsciiDoc includes to work on GitHub! It's just that I can see why it's a tricky one to implement.

obilodeau avatar Feb 25 '18 03:02 obilodeau

Nothing wrong with sharing how you think-- no shots here, but I disagree that it's security holding this up per se. :smiley:

Mainly because in the comments above, both injection attacks, and recursion have solutions (and have been solved elsewhere in github infrastructure already)... it's weird that infinite recursion is even on the list, IMHO-- same with the "naive" parser idea... anyone can write a bad parser, we can't protect against that (and I'd be scared if we could!).

Same goes for the security concerns you bring up. They've been brought up, and even addressed, already. If not in this thread/ticket, then in multiple other ones, which leads me to my last refutation, that being the idea this problem affects a negligible number of people.

It seems to be affecting many people (including authors of popular tools who are representing thousands of people), as you can tell by the various tickets around this issue. The same-ish issue (for MD) has the highest number of comments at the moment, and it's just a bump of several other tickets (all of which seem to have "solved" the problems raised but appear to be in a holding pattern). Plus figure a good 85% of people probably don't file a ticket when they hit a problem...

I'll close by saying that security through obscurity isn't really much protection. There's a reason crypto libs have to be published to be valid, and that's because obscurity is "soft" protection at best, and a huge security concern at worst. It's cool GH is sharing the markup plugin-- tho note numbers 2-5 from the docs ("internal code") seem to be a source of bottlenecks.

Pretty sure The Future will be mostly folks who put source code out so it can be verified/checked (been saying this for years, but at least it doesn't sound crazy anymore... how much does OSS power these days?), and if you don't think someone can just call up their friend who works at Large Company X to get a peek at sources... ¯_(ツ)_/¯ anyhow, at this point it's not a tech barrier as far as I can see, and it's fine if something else is holding this up, but there's been little to no feedback from project maintainers, and it's depressing to see so much input and so many orphaned/ghosted PRs going back years and whatnot (in general around this issue). But not that much. :smiley: Mainly I'm just tossing in another +1, in a now age-old tradition. :stuck_out_tongue_winking_eye:

denuno avatar Feb 25 '18 20:02 denuno

@obilodeau Thank you for sharing your insight on this issue. You provided a lot of context for what the security concern actually entails.

I want to emphasize two things. First, Asciidoctor has from the very beginning offered up the include processor as an extension point (for GitHub, I might add). That means that GitHub could write code that takes over processing of the include, so there's 0 risk of insecure code that they themselves did not introduce. That doesn't make it easy, to be sure, but that should mitigate the third-party risk argument.

Second, I'm willing to make any reasonable change in Asciidoctor that would further accommodate this feature. So far, I have heard absolutely nothing from GitHub on this issue, which doesn't bode well if we expect to make any progress. My offer stands to work with GitHub if they need me to get this feature implemented. (code, documentation, whatever)

Olivier wrote: An include can include includes which could include the first top-level include.

I don't expect GitHub to use the built-in include processor, but there is protection against this scenario. An include can only go to a fixed number of levels before it's terminated. Most users would be happy with 3-5 levels, if even that.
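
As an illustration of that kind of guard, here is a small Python sketch of a depth-limited expansion. It is a simplified model, not Asciidoctor's implementation; `expand`, `max_depth`, and the `LOOP` fixture are hypothetical names.

```python
import re
from typing import Dict, Tuple

INCLUDE_RE = re.compile(r"^include::([^\[\s]+)\[[^\]]*\]\s*$")


def expand(files: Dict[str, str], path: str, max_depth: int = 5,
           stack: Tuple[str, ...] = ()) -> str:
    """Expand includes recursively: the top-level file plus at most max_depth nested levels."""
    stack = stack + (path,)
    out = []
    for line in files[path].split("\n"):
        match = INCLUDE_RE.match(line)
        if match is None:
            out.append(line)
        elif len(stack) > max_depth or match.group(1) not in files:
            # Too deep (or missing target): leave a visible note and stop descending.
            out.append(f"[include of {match.group(1)} skipped]")
        else:
            out.append(expand(files, match.group(1), max_depth, stack))
    return "\n".join(out)


# A file that includes itself would recurse forever without the limit.
LOOP = {"loop.adoc": "level\ninclude::loop.adoc[]"}
result = expand(LOOP, "loop.adoc", max_depth=3)
```

With `max_depth=3`, the self-including file is rendered once at the top level and three more times as nested includes, and the next attempt is replaced by the skip note.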

Olivier wrote: AsciiDoc includes can be URIs, URI includes could be used to perform SSRF attacks in order to explore GitHub's infrastructure in ways that shouldn't be allowed.

I would not expect these to work on GitHub. They don't even work on Asciidoctor without a command-line / API flag. I think includes that work within the set of repository files would already be a huge boon for us AsciiDoc writers.

Olivier wrote: Implementing a hard feature for <1% is not worth the resources (or the risk).

If they went with that argument, then I'd say they don't know the AsciiDoc community very well. AsciiDoc users are bonkers for includes, rightfully so. As Denny points out:

It seems to be affecting many people (including authors of popular tools who are representing thousands of people), as you can tell by the various tickets around this issue.

If you want my prediction on this issue, it will be resolved as soon as GitLab implements it. That's not trolling, that's just sound competition in the marketplace. After all, GitLab just implemented stem support (i.e, math expressions) in AsciiDoc files.

mojavelinux avatar Feb 27 '18 11:02 mojavelinux

Btw, here's a link to corresponding issue in the GitLab issue tracker.

https://gitlab.com/gitlab-org/gitlab-ce/issues/18045

Interesting to note that @jirutka has already submitted a full patch for a custom include processor that works with a repository manager like GitHub and GitLab. So the code is there. We just need the will.

mojavelinux avatar Feb 27 '18 11:02 mojavelinux

I am fine with a very restrictive policy here, even restricting includes to the same repository. My issue is that I have documents that I want modularized but I also want to be able to present them on the web as complete documents and right now I can't do that. I have to kick off a separate process to generate the file then post that. Whether I do this manually or through some sort of agent is irrelevant, it is a nuisance and completely unnecessary.

jyutzler avatar Feb 27 '18 19:02 jyutzler

3-5 levels would be perfect imho. Five gives a few levels for document structure and a couple of levels for templates. Three would suffice for most use cases, but five would handle almost all of them.

While I can handle all of the includes before pushing the content to github, effectively that means I'm no longer able to use github as source control (for the same reason our .gitignore files exclude /bin.)

One use case I have uses include to add an svg, and the text fields of the svg are then changed via substitutions. Which really isn't that complex--an svg is just xml after all--but it's not something you can do with the standard image handler.

Count me among those who are bonkers for includes. Includes really are a must have feature for asciidoc.

kavaliro avatar Feb 28 '18 15:02 kavaliro

Heck, I'd settle for even one level-- and a same-repo policy is what I'd expect off the top of my head so that'd be dandy too (though it'd be swell if they follow the standard GH route of an inter-github policy for linking to stuff in other GH user/org repos).

While I can handle all of the includes before pushing the content to github, effectively that means I'm no longer able to use github as source control (for the same reason our .gitignore files exclude /bin.)

:point_up: this!

One of the main reasons I like GH is that slickly rendered README. It is a powerful feature. Without includes, though, it's hobbled in exactly the way you wouldn't want, as outlined above, making it a non-option for folks who have organized docs in their sources ("Don't look at those, look at the generated ones! (but yes, edit those)" yuck).

GH has had includes of various types (at least header/footer) for years now in the wiki section, so it's a problem that was solved for GH in some general form ages ago...

denuno avatar Feb 28 '18 19:02 denuno

I just want to mention that two workarounds are possible for this, for those needing a solution.

  1. There's a jekyll asciidoc template floating around that is set up to use Travis-CI to do the compilation, committing the results to the gh-pages branch of a repo.

  2. If you need to retain tighter control and compile locally, you can use git hooks to kick off scripts to do that, commit to the gh-pages branch, and push it. (I haven't tested that, but I don't see any reason it wouldn't work at first glance.)

kavaliro avatar Mar 11 '18 08:03 kavaliro

Unfortunately, publishing to GitHub Pages, or Netlify, or various other static web hosts, is a different solution entirely. If you use your own static site generator, you open up a whole world of possibilities. But the README and other files in a repository on GitHub are still crippled.

A better workaround is to have a CI job that monitors for changes to AsciiDoc files in the repository and expands the include directives, obviously leaving behind a hint so that the region can later be updated. It's not ideal, but it at least lets you keep your documents DRY.
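
A rough Python sketch of what such a job could do on each run. The `// BEGIN` and `// END` comment markers are an invented convention for the "hint", and `refresh` is a hypothetical helper; it handles a single level of includes only.

```python
import re
from typing import Dict

INCLUDE_RE = re.compile(r"^include::([^\[\s]+)\[[^\]]*\]\s*$")
BEGIN_RE = re.compile(r"^// BEGIN (include::[^\[\s]+\[[^\]]*\])\s*$")


def refresh(text: str, files: Dict[str, str]) -> str:
    """Expand include lines into marked regions and update regions already expanded."""
    out = []
    skip_until = None  # END marker we are waiting for while dropping stale content
    for line in text.split("\n"):
        if skip_until is not None:
            if line == skip_until:
                skip_until = None
            continue
        begin = BEGIN_RE.match(line)
        directive = begin.group(1) if begin else line.rstrip()
        match = INCLUDE_RE.match(directive)
        if match is None or match.group(1) not in files:
            out.append(line)
            continue
        # The comment lines are the hint that lets a later run find the region again.
        out += [f"// BEGIN {directive}", files[match.group(1)], f"// END {directive}"]
        if begin:
            skip_until = f"// END {directive}"
    return "\n".join(out)
```

Running it twice on unchanged sources gives the same output, so the job can run on every push and only commits when an included file actually changed.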

mojavelinux avatar Mar 11 '18 08:03 mojavelinux

Yeah, the closest thing to a work-around would be some type of include that works now, like images, but that's a downer for various reasons.

I can whip up a PR for this project if there's even a whiff of interest from someone in powah, but this is a case where I won't be able to use my own fork, so I'm less than motivated to write it unless there's at least a chance of it getting in... there's been nothing on most these issues, and I'm, let us just say, not optimistic, about [email protected] being able to help.

Maybe @MikeMcQuaid can point us in the right direction?

denuno avatar Mar 11 '18 17:03 denuno

I fully understand that publishing to a static web host is a different matter and I will be sure to take a look at jekyll-asciidoc to see if it is a potential partial solution. It doesn't change the value of rendering AsciiDoc directly through GitHub (as is done with Markdown, GeoJSON, etc.).

jyutzler avatar Mar 11 '18 17:03 jyutzler

I can whip up a PR for this project if there's even a whiff of interest from someone in powah, but this is a case where I won't be able to use my own fork, so I'm less than motivated to write it unless there's at least a chance of it getting in... there's been nothing on most these issues, and I'm, let us just say, not optimistic, about [email protected] being able to help.

Maybe @MikeMcQuaid can point us in the right direction?

@denuno This isn't something I have or do work on at GitHub. Please email [email protected] rather than @mentioning me unless it relates to my open source work. Thanks!

MikeMcQuaid avatar Mar 12 '18 09:03 MikeMcQuaid

Apologies, I grabbed the first person who looked to have some type of power to do something for the project, maybe I should have @'d a more prolific committer-- or it seems like maybe I have no idea how contribution works for github projects on github? :thinking:

It would seem odd, if it weren't par for the course (I'm the same I reckon), that github projects don't use github, and instead have some out-of-band deal going on... seriously though? Hit up a generic 'support' addy for feedback on issues? :roll_eyes:

I sent an email. I shouldn't have bitched before trying it, but regardless-- that there is a disconnect which is worse than missing include functionality! Consider this comment a PR to address it by connecting projects to people (or at least positions beyond 'support'). :stuck_out_tongue:

denuno avatar Mar 12 '18 19:03 denuno

I'd be remiss not to do a PR now. :smiley:

Is anyone else already on it? Is there anything really to do, for that matter? This stuff just calls the external tools anyhow, not sure why any special things needed to be added in the first place?

I thought there was more to it, because of all the talk about security concerns being the main reason includes have not been implemented, but if there are security concerns with this set up, github is exposing that the infrastructure is way more vulnerable than it should be...

Like, does this really get run on some type of master server that has write access to all github repositories or some such? Could I craft an Evil Commit that would Do Something to other repositories? Is servable content + traffic not watched to verify content isn't misbehaving?

There must be a ton of stuff checking for more issues than I can think of off the top of my head... what is the real reason this hasn't been implemented in 5+ years?

If it really is a security concern, please have [email protected] contact me privately, and I'll point out the attack vectors that are exposed by stating such (which are pretty obvious, and have nothing to do with includes, so I'm hoping that idea was just someone's attempt to keep things simple -- too simple, sure -- but a worthy thing to strive for... or maybe it was some misdirection... I could get behind that too :dark_sunglasses:).

:laughing: I meant to just say I'd put my money where my mouth is, versus piling on some more, but I started really wondering mid-type.

denuno avatar Mar 12 '18 21:03 denuno

Just for some closure on this:

I dropped [email protected] a line, and someone got back to me pretty quick!

Includes are on "The List" (for next year), and there's nothing anyone outside GitHub itself can do or contribute to speed that up. It's a coderpower issue. To be on The List even a year out is like "yay!". :stuck_out_tongue_closed_eyes:

For anyone looking for authoritative feedback:

:fire: Contact [email protected] to get information about issues in the GitHub issue system.

We can then person-power the [shadow] issue systems on github by relaying responses, as I'm doing now.

Ha! Just had the awesome idea to write a GitHub support GitHub issues integration. Once you automate a shadow system, it's technically some type of mirror, right? :thinking:

denuno avatar Apr 22 '18 01:04 denuno

Hey folks 👋. I'm a PM at GitHub working with some of our Render folk and I've found this thread via https://twitter.com/matthewmccull/status/1083619220858986497

In truth, we've got some infrastructure work to pay down first before we can take a serious look at this again, but I want to let you know that it does matter to us. We're hoping to make inroads soon into that infra work – and then we can start investigating how best to address this for y'all!

cc/ @mojavelinux @jexp FYI @matthewmccullough @clarkbw @skalnik

lukehefson avatar Jan 11 '19 13:01 lukehefson

That would be amazing, thanks so much @lukehefson

Btw. I love your "small-ux-wins" (papercuts) project, that has already improved my GH experience a lot.

jexp avatar Jan 11 '19 14:01 jexp

As the lead of the Asciidoctor project, I can report that I've been asked about this more than any other feature. It would be a game changer. Even if it's not something that will happen right away, I cannot overstate how much this communication means to us at least. Thank you!

mojavelinux avatar Jan 11 '19 21:01 mojavelinux

Thank you for putting it on the docket.

cyotee avatar May 15 '19 01:05 cyotee

The GitLab issue was implemented for release 12.0 ;-)

And for what it's worth: Limiting the nesting depth to, say, 5 may not be good enough, security wise: If I want to DOS-attack it, I'd write a 10 MB file full of includes on itself... that's roundabout 600 includes. With 5 levels, that would be 600^5 = 77 trillion includes (if my math is correct :-) An alternative would be to keep a list of files already included.
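
The arithmetic can be checked quickly in Python, assuming a naive processor that expands every include line again at every level (the 600 and 5 are the figures from this comment):

```python
includes_per_file = 600   # include lines per file, as estimated in the comment
depth = 5                 # nesting limit under discussion

# Each level multiplies the work by the number of include lines in the file.
total = includes_per_file ** depth
print(f"{total:,}")  # 77,760,000,000,000
```

So the "77 trillion" figure holds under that assumption.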

t1 avatar Aug 17 '19 04:08 t1

The Asciidoctor processor already accounts for this scenario. The depth is not really a depth, but rather a stack size. In your case, it would attempt to include the file 5 times, then stop. The processor does happen to track which files have been included, so the stack is accessible if you need it. But since there are so many clever ways to increase the depth, we decided instead to use the stack size as the measurement to protect against this cleverness.

mojavelinux avatar Aug 17 '19 08:08 mojavelinux

@lukehefson This feature would be amazing to have. Now that it has been implemented by GitLab, do you think GitHub can implement it? Is the original time estimate still accurate?

Referencing code in documentation, without copying, is extremely powerful to keep examples up to date and well-tested. Even a single depth include would be a great thing to have for exactly this.

This is especially meaningful for writing specifications for large projects, where embedding well-tested python code can immensely improve the understanding for readers who would otherwise have to click through a ton of links to read relevant reference code.

protolambda avatar Sep 29 '19 01:09 protolambda

Now that it has been implemented by GitLab, do you think GitHub can implement it? Is the original time estimate still accurate?

Hey @protolambda! We're not able to make a time estimate at the moment for the same reasons I stated in https://github.com/github/markup/issues/1095#issuecomment-453515913. Although I appreciate your bump – this is still definitely something on our minds that we'd like to work towards!

lukehefson avatar Oct 02 '19 08:10 lukehefson