
idea: include files (and csv tables)

Open anton-k opened this issue 13 years ago • 105 comments

As far as I understand, pandoc can process several files in only one way: you have to list them on the command line. There is a way to simulate include files with scripting, which is described in pandoc's official guide.

Markdown is a tiny language. We should keep it small. So here is an idea of how to simulate LaTeX's \input command without extending Markdown syntax. We can overload the image construct. If a file has an image extension, then it's treated as an image, but if it's .txt, it can be treated as Markdown:

![Show me if there is no such a file](subfile.txt)

I came to this idea while thinking about long tables. Imagine that someone is writing a research report. There are long tables produced by an algorithm. The tables are saved in some standard table format, for example CSV. The user can then write

![So it goes](table.csv)
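
For illustration, here is a minimal sketch of what the CSV case could expand to, written as a preprocessing step rather than as a pandoc feature. The function name `csv_to_pipe_table` is hypothetical, and the sketch assumes the first CSV row is the header and that every row has the same number of cells.

```python
import csv
import io


def csv_to_pipe_table(csv_text, caption=""):
    """Render CSV text as a Markdown pipe table; the first row is the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]

    def fmt(cells):
        # Escape pipes so cell text cannot break the table layout.
        return "| " + " | ".join(c.strip().replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "|" + "|".join(" --- " for _ in header) + "|"]
    lines += [fmt(row) for row in body]
    if caption:
        # Pandoc reads a paragraph starting with ':' as a table caption.
        lines += ["", ": " + caption]
    return "\n".join(lines) + "\n"
```

The image's alt text ("So it goes") would naturally become the table caption, which is why the sketch takes it as an optional argument.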

anton-k avatar Jun 29 '12 18:06 anton-k

Not sure if this would fit in Pandoc's goals as being a "universal document converter", but you can do this easily with some wrapper around Pandoc. This would of course require some technical skills from you, but you would gain much more than the features suggested above.

There are a bunch of programming tools in Pandoc Extras, of which I know (and develop) pander. You could easily write a simple brew file that compiles a list of all images in a directory and creates a link for each; reading and printing CSV files would not be a problem either (e.g. by putting a simple read.table(foo) in a brew chunk between <%=...%> tags).

I hope you would find this useful.

daroczig avatar Jun 29 '12 20:06 daroczig

anton-k, I like the idea; had something similar in mind, when writing a technical report recently. File-extension-dependent inclusion would be a nice pandoc-extension for markdown.

Another idea related to that:

It would be great to have more general support for literate programming.

Currently I use the R-knitr-package for mixing programming languages in technical reports; as an example see https://github.com/yihui/knitr/blob/master/inst/examples/knitr-lang.Rmd.

Using pandoc directly with a file format, say lmd -- for literate markdown, would facilitate the workflow considerably.

In that sense knitr works pretty well: You could include different languages eg

``` {r test-r, engine='R'}
set.seed(123)
rnorm(5)
```

Unfortunately, Haskell is currently not supported, e.g.

``` {engine='ghc'}
[x^2|x <- [1..10], x > 3 ]
```

With that in mind writing tutorials with REPLs like ghci, irb, R would be more pleasant.

michelk avatar Jul 05 '12 12:07 michelk

This could be done easily using the techniques described in the scripting documentation.

jgm avatar Nov 04 '12 19:11 jgm

I opened a separate issue, #656, for it.

michelk avatar Nov 07 '12 05:11 michelk

Hi. I'm also looking for this feature :) I found Marked.app has a nice extension:

  <<[Code title](folder/filename)

The same syntax is also supported by the Leanpub system. I think some "include feature" is a must if you write large texts in Markdown. For now I'm using Marked.app with a custom Markdown processor configured for pandoc, so I can include files that include files and so on. Very useful if you are writing a little book with source code samples :). But it is a bit tedious to have to print to PDF from Marked.app. Having this feature in pandoc would allow for command-line automation :)

jcangas avatar Jun 19 '13 18:06 jcangas

@jcangas -> it looks like ThoughtBot has done this before, based on looking at the raw markdown files from their Backbone on Rails book.

@jasonm was the person who worked on the project.

thewatts avatar Jun 25 '13 04:06 thewatts

@thewatts, thanks for the clue. It is very easy to go the "do it yourself" way, of course: I have a bit of Ruby that does the magic. But I see value in having it as a standard feature with a standard syntax and no need for external tools...

jcangas avatar Jun 26 '13 12:06 jcangas

Found what they use - they have a rakefile that will take and parse the <<[Code title](folder/filename) code, and then add it into the main file.

thewatts avatar Jun 26 '13 16:06 thewatts

There is gpp in Pandoc Extras, mentioned by @daroczig, which can be used to include files directly (gpp is a gcc-like preprocessor) and much more. It provides a syntax to preprocess files and execute commands, and file inclusion can be achieved through #include <include ..> or even \include directives, depending on the mode you select. I'm currently working on a Python wrapper that uses gpp to preprocess special commands in a Markdown file before passing it to pandoc (things like file inclusion, code inclusion, color, underline, etc). I will soon put it on GitHub, and if people are interested in such a wrapper I will add some more info about it.

dloureiro avatar Jun 26 '13 17:06 dloureiro

@thewatts I also have a rake file doing the same thing :). Well, mine is recursive also. I copy here so it can help others

# yields every line. Assume root_dir & file are Pathname objects
def merge_mdown_includes(root_dir, file, &block)
  file.each_line do |line|
    if line =~/(.*)<<\[(.*)\]$/
      incl_file = root_dir + $2
      yield $1 if block
      merge_mdown_includes(root_dir, incl_file, &block)
    else
      yield line if block
    end
  end
end

# hint on how to use the previous routine:
merge_mdown_includes(root_dir, file) do |line|
   output_file.puts line
end
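
For readers who prefer Python, the same recursive idea can be sketched as below. This is only an illustration under stated assumptions: it matches the Marked/Leanpub form `<<[title](path)` on a line of its own (not the shorter `<<[path]` form the Ruby regex accepts), the name `expand_includes` is hypothetical, and it adds a guard against circular includes that the Ruby version does not have.

```python
import re
from pathlib import Path

# Matches the Marked/Leanpub-style form <<[title](path) on a line of its own.
INCLUDE = re.compile(r"^\s*<<\[[^\]]*\]\(([^)]+)\)\s*$")


def expand_includes(path, root_dir, seen=()):
    """Yield the lines of `path`, replacing include lines with the named file's lines."""
    path = Path(path).resolve()
    if path in seen:
        raise ValueError(f"circular include: {path}")
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = INCLUDE.match(line)
            if match:
                # Included paths are resolved against root_dir, as in the Ruby version.
                child = Path(root_dir) / match.group(1)
                yield from expand_includes(child, root_dir, seen + (path,))
            else:
                yield line
```

Writing the yielded lines to a file, or piping them into pandoc, gives a single merged document.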

jcangas avatar Jun 26 '13 17:06 jcangas

Instead of adding another preprocessing syntax on top of Pandoc Markdown I use the following syntax to include files:

`filename.md`{.include}

one could also extend this to:

~~~ {.include}
filename.md
~~~

This way the inclusion syntax can act on the abstract syntax tree (AST) of a Pandoc document - one can get the same result from HTML like this (HTML -> Markdown -> Markdown with inclusions -> Target format):

<code class="include">filename</code>

Here is a small hack in the form of a Perl script that I use for now.

while(<>) {
    if (/^`([^`]+)`\{\.include\}\s*$/) {
        if (-e $1 && open my $fh, '<', $1) {
            local $/;
            print <$fh>;
            close $fh;
        } else {
            print STDERR "failed to include file $1\n";
        }
    } else {
        print $_;
    }
}

The final implementation should work on the AST as well to allow inclusion inside other elements, for instance:

* `longlistitem.md`{.include}
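
To make the AST-based variant concrete, here is a rough Python sketch that operates on pandoc's JSON representation. It is an illustration, not a finished filter: it assumes pandoc is on the PATH, the function names are hypothetical, only Para/Plain, BulletList and BlockQuote containers are walked, and there is no protection against circular includes.

```python
import json
import subprocess
import sys


def parse_markdown_file(path):
    """Ask pandoc itself to parse a file and return its list of block elements."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["blocks"]


def is_include(inline):
    """True for an inline Code element carrying the class 'include'."""
    return inline["t"] == "Code" and "include" in inline["c"][0][1]


def expand_blocks(blocks, load=parse_markdown_file):
    """Replace each paragraph holding only `file`{.include} with that file's blocks."""
    out = []
    for block in blocks:
        kind, content = block["t"], block.get("c")
        if kind in ("Para", "Plain") and len(content) == 1 and is_include(content[0]):
            # The Code element's second field is the text between the backticks.
            out.extend(expand_blocks(load(content[0]["c"][1]), load))
        elif kind == "BulletList":
            items = [expand_blocks(item, load) for item in content]
            out.append({"t": "BulletList", "c": items})
        elif kind == "BlockQuote":
            out.append({"t": "BlockQuote", "c": expand_blocks(content, load)})
        else:
            out.append(block)
    return out


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read a JSON document, expand includes, and write the JSON back out."""
    doc = json.load(stream_in)
    doc["blocks"] = expand_blocks(doc["blocks"])
    json.dump(doc, stream_out)
```

With a call to `main()` added at the end of the file, a script like this could be passed to pandoc via `--filter`. Because it works on parsed blocks, the list-item case above falls out naturally.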

nichtich avatar Jul 16 '13 19:07 nichtich

@nichtich Nice idea; converted to python and combined with Makefile:

# Makefile fragment

%.pdf : %.md
	cat $^ | ./include.py | pandoc -o $@

#!/usr/bin/env python

import re
import sys

# Matches `filename`{.include}; group 1 is the file name.
include = re.compile(r"`([^`]+)`\{\.include\}")


def read_file(match):
    # Return the contents of the file named in the include marker.
    with open(match.group(1)) as f:
        return f.read()


for line in sys.stdin:
    # A function replacement keeps backslashes in the included text literal.
    sys.stdout.write(include.sub(read_file, line))

mdengler avatar Mar 02 '14 17:03 mdengler

See also this discussion on the mailing list.

mpickering avatar Dec 07 '14 17:12 mpickering

And here's my take on a Haskell filter that includes CSV's as tables: pandoc-placetable

mb21 avatar Jul 13 '15 19:07 mb21

File extension dependent overloading of the image inclusion is a great idea! Would love to see it implemented!

adius avatar Nov 09 '15 11:11 adius

I've written a basic Pandoc filter in Haskell that could include referenced Markdown files recursively, meaning the nested includes are also included. (Although only 2 levels deep, for now.) Take a look:

https://github.com/steindani/pandoc-include

To include one or multiple files use the following syntax:

```include
chapter1.md
chapter2.md
#dontinclude.md
```

steindani avatar Dec 08 '15 17:12 steindani

Hi, @mpickering, may I ask what the status of this is? Is there any branch with work in progress (to see if there is anything to help with)?

I think there are a few different categories of file extensions that can be included:

  1. those file extensions associated with pandoc readers: this allows including multiple different source formats in the markdown source. e.g. ![](file.docx) would actually use the pandoc docx reader to read it into the AST and include it at that position.
  2. RawInline: some might not want the pandoc readers to read it, though. So e.g. ![](file.tex){RawInline="true"}, ![](file.html){RawInline="true"} would include the raw TeX and raw HTML at that position.
  3. CodeBlock: ![](file.md){CodeBlock="true"}, ![](file.py){CodeBlock="true"} would include the files as a code block.
  4. csv: e.g. pandoc-placetable
  5. media: audio/video files.
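
As a rough sketch of how such a dispatch could look, the categories above map onto a small classification function. Everything here is an assumption for illustration: the name `classify_include`, the extension-to-reader table (which is far from exhaustive), and the list of media extensions are not taken from pandoc.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to pandoc reader name (illustrative only).
READERS = {".md": "markdown", ".docx": "docx", ".html": "html",
           ".tex": "latex", ".rst": "rst"}
# Assumed set of audio/video extensions (illustrative only).
MEDIA = {".mp3", ".mp4", ".ogg", ".wav", ".webm"}


def classify_include(target, attributes=None):
    """Return (kind, detail) describing how ![](target){...} would be handled."""
    attributes = attributes or {}
    ext = PurePosixPath(target).suffix.lower()
    # Explicit attributes win over extension-based guessing.
    if attributes.get("CodeBlock") == "true":
        return ("code_block", ext.lstrip("."))
    if attributes.get("RawInline") == "true":
        return ("raw_inline", READERS.get(ext, ext.lstrip(".")))
    if ext == ".csv":
        return ("csv_table", None)
    if ext in MEDIA:
        return ("media", None)
    if ext in READERS:
        return ("reader", READERS[ext])
    # Anything else keeps the ordinary image behaviour.
    return ("image", None)
```

Checking the explicit attributes first means a `.md` file can still be pulled in verbatim as a code block instead of being parsed.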

ickc avatar Nov 10 '16 09:11 ickc

Is this feature still under development? This would allow a complete replacement of most static site generators.

HaoZeke avatar Nov 09 '17 22:11 HaoZeke

I don't think anybody is working on this. My personal opinion is that this is out of scope, as the increase in complexity seems not worth it.

A solution for CSV exists with pandoc-placetable. If one does not want to install additional binaries, pandoc 2 makes it easy to achieve most of what was suggested here via Lua filters. E.g., the filter below would replace a figure with its Markdown content if the image has class "markdown". This is fully portable and doesn't require extra software other than pandoc.

function Para (elem)
  if #elem.content == 1 and elem.content[1].t == "Image" then
    local img = elem.content[1]
    if img.classes[1] == "markdown" then
      local f = io.open(img.src, 'r')
      local blocks = pandoc.read(f:read('*a')).blocks
      f:close()
      return blocks
    end
  end
end

tarleb avatar Nov 09 '17 23:11 tarleb

Is this feature still under development?

Do you mean include files or table? Apparently 2 different (related) issues are mentioned here.

I think the main reason it's been taking so long is not the difficulty or feasibility of including files, but the question of whether this should be included in pandoc, and how it should behave (e.g. recursively?).

e.g. @jgm has a pandoc-include example in the tutorial on writing pandoc filters, which has been distributed as pandoc-include: Include other Markdown files. And there's also a panflute filter doing so. So does it need to be done in pandoc?

This would allow a complete replacement of most static site generators.

Having a better template system is more important than having a native pandoc-include in this respect. I remember there's an issue about this. Try searching for it and see if you have any comments/suggestions there.

ickc avatar Nov 10 '17 02:11 ickc

pandoc-include is built against pandoc 1.19, so the newer syntax is not parsed correctly, e.g. Div classes via ::::{.class} ::::

Currently my workaround is to use paru-insert.rb but it's really rather slow, pushing my build times up by 10s just to include 3 partials..

HaoZeke avatar Nov 10 '17 03:11 HaoZeke

Try filing an issue over there (or try a pull request).

Do you use other pandoc filters? If you already have filters in Python using panflute, panflute has a way to run all panflute filters in one pass to avoid multiple conversions to and from JSON. Often the reason for a slow filter is this unnecessary conversion.

You can also try the now “native” Lua filter. It’s fast exactly because it avoids this issue.

You can also use a preprocessor to handle includes separately. e.g. this is exactly how MultiMarkdown handles “transclusion” internally. If I remember correctly, there’s a mode for the multimarkdown CLI to only process the transclusion without parsing the Markdown. If so, you can use multimarkdown as a preprocessor and pass its output to pandoc to parse.

I personally set up makefiles to handle stuff like this. So far it is OK. But there are times I’d want to have a template system on top of it to ease some routines. One advantage of using make is that you can use the -j option to build things in parallel (if you write the dependencies correctly). This can dramatically speed up a build. (I once sped up a pandoc project from 2 min+ to ~20 sec. using -j 8 on a 4-core CPU.)

ickc avatar Nov 10 '17 03:11 ickc

+++ Rohit Goswami [Nov 09 17 22:36 ]:

Is this feature still under development? This would allow a complete replacement of most static site generators.

Well, we have implemented the .. csv-table directive for RST.

jgm avatar Nov 10 '17 04:11 jgm

Since he mentioned site generators, I’d imagine he’s talking about the include feature rather than CSV.

By the way, since pandoc 2.0 allows arbitrary raw blocks (of other formats), this can probably be used to put a CSV table inline as a raw rst block.

ickc avatar Nov 10 '17 08:11 ickc

Well, we've also implemented includes in RST!

jgm avatar Nov 11 '17 03:11 jgm

I ended up switching to panflute and replacing all paru filters. This worked out rather well: the build now runs in around 4-5 sec, a 50% speed increase...

HaoZeke avatar Nov 11 '17 06:11 HaoZeke

Would a minimalist include syntax using attributed spans be possible, rather than overloading the image or code block syntax? For example

[some-file.md]{.include}

Then have a lua filter (similar to one on this thread: https://groups.google.com/forum/#!topic/pandoc-discuss/FMmb1mf2lHU) to do the work?

encodis avatar Nov 14 '17 10:11 encodis

Span contents are parsed as Markdown, so paths with backslashes (e.g., \users\steve\example.md) or spaces (My Documents/foo.md) would be more difficult to input correctly. An inline-code-based method could work better, but I'm not sure if it's desirable.

`some-file.md`{.include}

tarleb avatar Nov 14 '17 12:11 tarleb

We're not the first ones thinking about this. At least the following two are already in use:

Multimarkdown File Transclusion:

{{some_other_file.txt}}

iA Writer Content Blocks proposal:

/Lorem Ipsum.txt

The corresponding talk.commonmark thread

mb21 avatar Nov 14 '17 12:11 mb21

I have another idea on the syntax to be used: to use the new raw_attribute extension, and add an optional attribute include to it.

```{=markdown include='path/to.md'}
```

``{=gfm include='path/to.md'}

The reason for this suggestion is that there are a couple of suggestions to overload different elements (excluding custom syntax for the moment), but all of them have some flaw in their expected behavior:

  1. image link: in most output formats, it outputs a reference to the path of the image, rather than inlining the image itself.

  2. code block / inline code: it suggests that what it includes is code (verbatim)

  3. div/span: it suggests you should include the text in a div/span

(Of course the raw_attribute extension uses code block / inline code, though.)

This has the added benefit of being able to specify the format of the included documents.

To push this idea further, the following now has an obvious meaning that coincides with what is expected in (2):

```{include='path/to.md'}
```

``{include='path/to.md'}

They include some text, without a format, hence verbatim.
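
To show how a filter or preprocessor might read this proposed attribute form, here is a small Python sketch. It is purely illustrative: the `include` attribute is only a proposal in this comment, the name `parse_include_attr` is hypothetical, and the regular expression handles just the two shapes shown above rather than pandoc's full attribute syntax.

```python
import re

# Accepts {=FORMAT include='PATH'} and {include='PATH'} with single or double quotes.
ATTR = re.compile(
    r"""^\{\s*(?:=(?P<format>[\w+-]+)\s+)?include=(?P<q>['"])(?P<path>.*?)(?P=q)\s*\}$"""
)


def parse_include_attr(attr):
    """Return (format, path) for an include attribute, or None if it is not one.

    `format` is None when no raw format is given, meaning a verbatim include.
    """
    match = ATTR.match(attr.strip())
    if match is None:
        return None
    return (match.group("format"), match.group("path"))
```

A format of `None` would then correspond to the verbatim case, and any other value would name the reader to use for the included file.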

ickc avatar Nov 14 '17 12:11 ickc