[Doc] Documenting the differences between piping and `--filter`
MWE:
Untitled.ipynb.zip inlined below
# Untitled.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "16914f50-c579-44a2-a9c7-cc7e871d5b0b",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d071ccbf-542c-43f6-b341-27e745830160",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x11e7ca0e0>]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD4CAYAAADlwTGnAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWRUlEQVR4nO3dbYxc1YHm8f+zftEQSORmaLwe28FMZDFYTMb2thzvWooYoZmAdxUTNEggjfEiI2clw8IuOyviL+TbWHkhG6QIyxm8sTWEiAl2sLLeIchBspACpLEbjGlYHCBg02P3BAWjBW1i8uyHuk5qKlVdVV3V3W7O85NKde95uX1OtVVP31u3fGSbiIgoz7+a6QFERMTMSABERBQqARARUagEQEREoRIAERGFmjvTA+jGJZdc4mXLls30MCIiZpXnnnvun20PNpbPqgBYtmwZw8PDMz2MiIhZRdLPm5XnElBERKESABERhUoAREQUKgEQEVGoBEBERKHaBoCkpZKelDQq6ZikO5u0+RNJP5H0/yT9t4a6ayW9Ium4pHvqyi+W9ISkV6vngf5MKSIiOtHJGcBZ4G7bVwJrga2SVjS0eQf4z8DX6gslzQG+BVwHrABurut7D3DQ9nLgYLUfERHTpG0A2B6zfbjafg8YBRY3tDlt+6fArxu6rwGO237N9q+A7wEbqroNwO5qezdw/WQnERER3evqMwBJy4BVwDMddlkMvFW3f4LfhcdC22NQCxng0hY/c4ukYUnD4+Pj3Qw3IiIm0HEASLoIeBS4y/aZTrs1KetqBRrbO20P2R4aHPy9bzJHRMQkdRQAkuZRe/N/yPbeLo5/Alhat78EeLvaPiVpUXX8RcDpLo4bERE96uQuIAEPAqO27+vy+D8Flku6XNJ84CZgf1W3H9hUbW8CHuvy2BER0YNO/jO4dcBG4KikkapsG/BJANs7JP1rYBj4BPAbSXcBK2yfkXQ78DgwB9hl+1h1jO3AI5I2A28CN/ZnShER0Ym2AWD7KZpfy69v80/ULu80qzsAHGhS/gvgms6GGRER/ZZvAkdEFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYXqZEnIpZKelDQq6ZikO5u0kaT7JR2X9IKk1VX5FZJG6h5nqtXCkPRlSSfr6tb3fXYREdFSJ0tCngXutn1Y0seB5yQ9YfulujbXAcurx2eAB4DP2H4FWAkgaQ5wEthX1+8btr/W+zQiIqJbbc8AbI/ZPlxtvweMAosbmm0A9rjmaWCBpEUNba4Bfmb7530Yd0RE9KirzwAkLQNWAc80VC0G3qrbP8Hvh8RNwMMNZbdXl4x2SRpo8TO3SBqWNDw+Pt7NcCMiYgIdB4Cki4BHgbtsn2msbtLFdX3nA58H/qGu/gHgU9QuEY0BX2/2c23vtD1ke2hwcLDT4UZERBsdBYCkedTe/B+yvbdJkxPA0rr9JcDbdfvXAYdtnzpXYPuU7Q9t/wb4NrCm28FHRMTkdXIXkIAHgVHb97Voth+4pbobaC3wru2xuvqbabj80/AZwReAF7saeURE9KSTu4DWARuBo5JGqrJtwCcBbO8ADgDrgePA+8Ct5zpL+hjwF8AXG477FUkrqV0qeqNJfURETKG2AWD7KZpf469vY2Bri7r3gT9sUr6xwzFGRMQUyDeBIyIKlQCIiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQnWyJORSSU9KGpV0TNKdTdpI0v2Sjkt6QdLquro3JB2VNCJpuK78YklPSHq1eh7o37QiIqKdTs4AzgJ3274SWAtslbSioc11wPLqsQV4oKH+z22vtD1UV3YPcND2cuBgtR8REdOkbQ
DYHrN9uNp+DxgFFjc02wDscc3TwIKGRd+b2QDsrrZ3A9d3M/CIiOhNV58BSFoGrAKeaahaDLxVt3+C34WEgR9Jek7Slro2C22PQS1kgEtb/MwtkoYlDY+Pj3cz3IiImEDHASDpIuBR4C7bZxqrm3Rx9bzO9mpql4m2SvpsNwO0vdP2kO2hwcHBbrpGRMQEOgoASfOovfk/ZHtvkyYngKV1+0uAtwFsn3s+DewD1lRtTp27TFQ9n57MBCIiYnI6uQtIwIPAqO37WjTbD9xS3Q20FnjX9pikCyV9vDrOhcBfAi/W9dlUbW8CHuthHhER0aW5HbRZB2wEjkoaqcq2AZ8EsL0DOACsB44D7wO3Vu0WAvtqGcJc4Lu2/7Gq2w48Imkz8CZwY6+TiYiIzrUNANtP0fwaf30bA1ublL8G/FmLPr8ArulsmBER0W/5JnBERKESABERhUoAREQUKgEQEVGoBEBERKESABERhUoAREQUKgEQEVGoBEBERKESABERhUoAREQUKgEQEVGoBEBERKESABERhUoAREQUqpMVwZZKelLSqKRjku5s0kaS7pd0XNILkla36yvpy5JOShqpHuv7O7WIiJhIJyuCnQXutn24Wt7xOUlP2H6prs11wPLq8Rnggeq5Xd9v2P5a32YTEREda3sGYHvM9uFq+z1gFFjc0GwDsMc1TwMLJC3qsG9ERMyArj4DkLQMWAU801C1GHirbv8EDW/0LfreXl0y2iVpoMXP3CJpWNLw+Ph4N8ONiIgJdBwAki4CHgXusn2msbpJF7fp+wDwKWAlMAZ8vdnPtb3T9pDtocHBwU6HGxERbXQUAJLmUXsDf8j23iZNTgBL6/aXAG9P1Nf2Kdsf2v4N8G1gzeSmEBERk9HJXUACHgRGbd/Xotl+4JbqbqC1wLu2xybqK2lR3e4XgBcnNYOIiJiUTu4CWgdsBI5KGqnKtgGfBLC9AzgArAeOA+8Dt07U1/YB4CuSVlK7VPQG8MXephIREd1oGwC2n6L5Nf76Nga2dtPX9sYOxxgREVMg3wSOiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQiUAIiIKlQCIiChUAiAiolAJgIiIQiUAIiIK1cmSkEslPSlpVNIxSXc2aSNJ90s6LukFSavr6q6V9EpVd09d+cWSnpD0avU80L9pRUyPHxw5ybrtP+bye/4X67b/mB8cOTnTQ4roWCdnAGeBu21fCawFtkpa0dDmOmB59dgCPAAgaQ7wrap+BXBzXd97gIO2lwMHq/2IWeMHR07ypb1HOfnLDzBw8pcf8KW9RxMCMWu0DQDbY7YPV9vvAaPA4oZmG4A9rnkaWFAt+r4GOG77Ndu/Ar5XtT3XZ3e1vRu4vtfJREynrz7+Ch/8+sN/UfbBrz/kq4+/MkMjiuhOV58BSFoGrAKeaahaDLxVt3+iKmtVDrDQ9hjUQga4tMXP3CJpWNLw+Ph4N8ONmFJv//KDrsojzjcdB4Cki4BHgbtsn2msbtLFE5R3zPZO20O2hwYHB7vpGjGl/mjBBV2VR5xvOgoASfOovfk/ZHtvkyYngKV1+0uAtycoBzhVXSaiej7d3dAjZtbffO4KLpg351+UXTBvDn/zuStmaEQR3enkLiABDwKjtu9r0Ww/cEt1N9Ba4N3qss5PgeWSLpc0H7ipanuuz6ZqexPwWA/ziJh2169azN/e8KcsXnABAhYvuIC/veFPuX5V40dkEeenuR20WQdsBI5KGqnKtgGfBLC9AzgArAeOA+8Dt1Z1ZyXdDjwOzAF22T5WHWM78IikzcCbwI39mFDEdLp+1eK84ces1TYAbD9F82v59W0MbG1Rd4BaQDSW/wK4prNhRkREv+WbwBERhUoAREQUKgEQEVGoBEBERKESABERhUoAREQUKgEQEVGoBEBERKESABERhUoAREQUKgEQEVGoBEBERKESABERhUoARE
QUKgEQEVGoBEBERKE6WRJyl6TTkl5sUT8gaZ+kFyQ9K+mqqvwKSSN1jzOS7qrqvizpZF3d+r7OKiIi2urkDOA7wLUT1G8DRmx/GrgF+CaA7Vdsr7S9Evg31JaK3FfX7xvn6qtVwyIiYhq1DQDbh4B3JmiyAjhYtX0ZWCZpYUOba4Cf2f75ZAcaERH91Y/PAJ4HbgCQtAa4DFjS0OYm4OGGstury0a7JA20OrikLZKGJQ2Pj4/3YbgREQH9CYDtwICkEeAO4Ahw9lylpPnA54F/qOvzAPApYCUwBny91cFt77Q9ZHtocHCwD8ONiAiAub0ewPYZ4FYASQJerx7nXAcctn2qrs9vtyV9G/hhr+OIiIju9HwGIGlB9Vc+wG3AoSoUzrmZhss/khbV7X4BaHqHUURETJ22ZwCSHgauBi6RdAK4F5gHYHsHcCWwR9KHwEvA5rq+HwP+Avhiw2G/ImklYOCNJvURETHF2gaA7Zvb1P8EWN6i7n3gD5uUb+x0gBERMTXyTeCIiEIlACIiCpUAiIgoVAIgIqJQCYCIiEIlACIiCpUAiIgoVAIgIqJQCYCIiEIlACIiCpUAiIgoVAIgIqJQCYCIiEIlACIiCpUAiIgoVNsAqBZtPy2p6apdkgYk7asWeH9W0lV1dW9IOippRNJwXfnFkp6Q9Gr13HJR+IiImBqdnAF8B7h2gvptwIjtTwO3AN9sqP9z2yttD9WV3QMctL0cOFjtR0TENGobALYPAe9M0GQFtTdxbL8MLJO0sM1hNwC7q+3dwPVtRxoREX3Vj88AngduAJC0BrgMWFLVGfiRpOckbanrs9D2GED1fGmrg0vaImlY0vD4+HgfhhsREdCfANgODEgaAe4AjgBnq7p1tlcD1wFbJX2224Pb3ml7yPbQ4OBgH4YbERHQwaLw7dg+A9wKIEnA69UD229Xz6cl7QPWAIeAU5IW2R6TtAg43es4IiKiOz2fAUhaIGl+tXsbcMj2GUkXSvp41eZC4C+Bc3cS7Qc2VdubgMd6HUdERHSn7RmApIeBq4FLJJ0A7gXmAdjeAVwJ7JH0IfASsLnquhDYVzspYC7wXdv/WNVtBx6RtBl4E7ixXxOKiIjOtA0A2ze3qf8JsLxJ+WvAn7Xo8wvgmg7HGBERUyDfBI6IKFQCICKiUAmAiIhCJQAiIgqVAIiIKFQCICKiUAmAiIhCJQAiIgqVAIiIKFQCICKiUAmAiIhCJQAiIgqVAIiIKFQCICKiUAmAiIhCJQAiIgrVNgAk7ZJ0WtKLLeoHJO2T9IKkZyVdVZUvlfSkpFFJxyTdWdfny5JOShqpHuv7N6WIiOhEJ2cA3wGunaB+GzBi+9PALcA3q/KzwN22rwTWAlslrajr9w3bK6vHge6HHhERvWgbALYPAe9M0GQFcLBq+zKwTNJC22O2D1fl7wGjwOLehxwREf3Qj88AngduAJC0BrgMWFLfQNIyYBXwTF3x7dVlo12SBlodXNIWScOShsfHx/sw3IiIgP4EwHZgQNIIcAdwhNrlHwAkXQQ8Ctxl+0xV/ADwKWAlMAZ8vdXBbe+0PWR7aHBwsA/DjYgIgLm9HqB6U78VQJKA16sHkuZRe/N/yPbeuj6nzm1L+jbww17HERER3en5DEDSAknzq93bgEO2z1Rh8CAwavu+hj6L6na/ADS9wygiIqZO2zMASQ8DVwOXSDoB3AvMA7C9A7gS2CPpQ+AlYHPVdR2wEThaXR4C2Fbd8fMVSSsBA28AX+zPdCIiolNtA8D2zW3qfwIsb1L+FKAWfTZ2OsCIiJga+SZwREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqARAREShEgAREYVKAEREFCoBEBFRqLYBIGmXpNOSmi7bKGlA0j5JL0h6VtJVdXXXSnpF0nFJ99SVXyzpCUmvVs8D/ZlORER0qpMzgO8A105Qvw0Ysf1p4B
bgmwCS5gDfAq4DVgA3S1pR9bkHOGh7OXCw2o+IiGnUNgBsHwLemaDJCmpv4th+GVgmaSGwBjhu+zXbvwK+B2yo+mwAdlfbu4HrJzX6iIiYtH58BvA8cAOApDXAZcASYDHwVl27E1UZwELbYwDV86WtDi5pi6RhScPj4+N9GG5EREB/AmA7MCBpBLgDOAKcpfmC8O724LZ32h6yPTQ4ONjTQCMi4nfm9noA22eAWwEkCXi9enwMWFrXdAnwdrV9StIi22OSFgGnex1HRER0p+czAEkLJM2vdm8DDlWh8FNguaTLq/qbgP1Vu/3Apmp7E/BYr+OIiIjutD0DkPQwcDVwiaQTwL3APADbO4ArgT2SPgReAjZXdWcl3Q48DswBdtk+Vh12O/CIpM3Am8CN/ZxURES0J7vry/IzZmhoyMPDwzM9jIiIWUXSc7aHGsvzTeCIiEIlACIiCpUAiIgoVAIgIqJQs+pDYEnjwM9nehyTcAnwzzM9iGlU2nwhcy7FbJ3zZbZ/75u0syoAZitJw80+gf+oKm2+kDmX4qM251wCiogoVAIgIqJQCYDpsXOmBzDNSpsvZM6l+EjNOZ8BREQUKmcAERGFSgBERBQqAdCDVove19UPSNon6QVJz0q6qq5ugaTvS3pZ0qikfzu9o5+cHuf8XyQdk/SipIcl/cH0jr57knZJOi3pxRb1knR/9Xq8IGl1Xd2Er9X5arJzlrRU0pPVv+djku6c3pFPXi+/56p+jqQjkn44PSPuE9t5TOJB7b+4/hnwx8B8aktjrmho81Xg3mr7T4CDdXW7gduq7fnAgpme01TOmdpyoK8DF1T7jwD/cabn1MGcPwusBl5sUb8e+N/UVsBbCzzT6Wt1vj56mPMiYHW1/XHg/3zU51xX/1+B7wI/nOm5dPPIGcDkTbTo/TkrgIMAtl8GlklaKOkT1P7BPVjV/cr2L6dt5JM36TlXdXOBCyTNpbZi3Nuc52wfAt6ZoMkGYI9rngYWVKvcdfJanZcmO2fbY7YPV8d4Dxjld+uAn9d6+D0jaQnw74G/m/qR9lcCYPImWvT+nOeBGwAkrQEuo7Y05h8D48D/rE4b/07ShVM/5J5Nes62TwJfo7YA0Bjwru0fTfmIp16r16ST12q2ajs3ScuAVcAz0zesKTXRnP8H8N+B30zzmHqWAJi8Tha93w4MSBoB7gCOAGep/SW8GnjA9irg/wKz4RrxpOcsaYDaX1GXA38EXCjpr6dwrNOl1WvSyWs1W004N0kXAY8Cd7m2POxHQdM5S/oPwGnbz033gPqh50XhC3aC1oveA1D9478Vah8iUbsG/jq1yx8nbJ/76+j7zI4A6GXOnwNetz1e1e0F/h3w91M/7CnV6jWZ36L8o6DlvwNJ86i9+T9ke+8MjG2qtJrzXwGfl7Qe+APgE5L+3vas+OMmZwCTN9Gi98Bv7/SZX+3eBhyyfcb2PwFvSbqiqruG2nrK57tJz5napZ+1kj5WBcM11K4Rz3b7gVuqu0TWUru0NUYHr9Us1nTO1e/1QWDU9n0zO8S+azpn21+yvcT2Mmq/4x/Pljd/yBnApLnFoveS/lNVvwO4Etgj6UNqb/Cb6w5xB/BQ9ebwGtVfzeezXuZs+xlJ3wcOU7sMdoRZ8LV6SQ8DVwOXSDoB3AvMg9/O9wC1O0SOA+9T/R5bvVbTPoFJmOycgXXARuBodQkQYJvtA9M2+EnqYc6zWv4riIiIQuUSUEREoRIAERGFSgBERBQqARARUagEQEREoRIAERGFSgBERBTq/wNp1Bb15w63swAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot([1], [2], \"o\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "all310-conda-forge",
"language": "python",
"name": "all310-conda-forge"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Running
pandoc -s -o Untitled.html Untitled.ipynb --embed-resources
has the expected behavior. However,
pandoc -s -o Untitled.native Untitled.ipynb --embed-resources -t native
pandoc -s -o Untitled.native Untitled.ipynb --embed-resources -t native --ipynb-output=all
both result in the image data being stripped out:
Pandoc
Meta
{ unMeta =
fromList
[ ( "jupyter"
, MetaMap
(fromList
[ ( "kernelspec"
, MetaMap
(fromList
[ ( "display_name"
, MetaString "all310-conda-forge"
)
, ( "language" , MetaString "python" )
, ( "name"
, MetaString "all310-conda-forge"
)
])
)
, ( "language_info"
, MetaMap
(fromList
[ ( "codemirror_mode"
, MetaMap
(fromList
[ ( "name" , MetaString "ipython" )
, ( "version" , MetaString "3" )
])
)
, ( "file_extension" , MetaString ".py" )
, ( "mimetype"
, MetaString "text/x-python"
)
, ( "name" , MetaString "python" )
, ( "nbconvert_exporter"
, MetaString "python"
)
, ( "pygments_lexer"
, MetaString "ipython3"
)
, ( "version" , MetaString "3.10.4" )
])
)
, ( "nbformat" , MetaString "4" )
, ( "nbformat_minor" , MetaString "5" )
])
)
]
}
[ Div
( "16914f50-c579-44a2-a9c7-cc7e871d5b0b"
, [ "cell" , "code" ]
, [ ( "execution_count" , "1" ) ]
)
[ CodeBlock
( "" , [ "python" ] , [] ) "import matplotlib.pyplot as plt"
]
, Div
( "d071ccbf-542c-43f6-b341-27e745830160"
, [ "cell" , "code" ]
, [ ( "execution_count" , "2" ) ]
)
[ CodeBlock
( "" , [ "python" ] , [] ) "plt.plot([1], [2], \"o\")"
, Div
( ""
, [ "output" , "execute_result" ]
, [ ( "execution_count" , "2" ) ]
)
[ CodeBlock
( "" , [] , [] )
"[<matplotlib.lines.Line2D at 0x11e7ca0e0>]"
]
, Div
( "" , [ "output" , "display_data" ] , [] )
[ Para
[ Image
( "" , [] , [] )
[]
( "c596dac3521e8bee0a57fad6727a0a6f8635f1b6.png"
, ""
)
]
, CodeBlock
( "" , [] , [] ) "<Figure size 432x288 with 1 Axes>"
]
]
]
Is there a way to fix this? If not, what would you suggest for workflows that have to split the reading and writing steps into 2 processes? In my experience, the following two styles
pandoc -f X -t Y -F F ...
pandoc -f X -t json ... | F | pandoc -f json -t Y ...
often have subtle differences.
The rationale for a workflow like this is that I'm developing a static site generator powered by pandoc. The only relevant part here is that it processes more than 1 file (with more than 1 extension, which the pipeline dispatches on).
So when generating N output files from M input files, it is more efficient to start a single Python process (I imagine the situation is similar in other languages).
In this case, the pandoc -f X -t Y -F F ...-style starts only N pandoc instances, but it also starts N Python instances per pandoc filter.
The pandoc -f X -t json ... | F | pandoc -f json -t Y ...-style, by contrast, starts (M+N) pandoc instances but only 1 Python instance. (And the intermediate AST can be reused for different outputs when M < N.)
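To make the "1 Python instance" idea concrete, here is a minimal sketch of applying one filter to several JSON ASTs inside a single Python process. The names walk, upper_str and filter_many are hypothetical helpers written for this illustration; they are not part of pandoc or panflute, and the sketch assumes each input string is the output of pandoc -t json.

```python
import json


def walk(node, action):
    """Recursively apply `action` to every AST element (a dict with a "t" key).

    `action(element)` returns a replacement element, or None to keep the
    element unchanged. This is a hand-rolled illustration, not a library API.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                node = replaced
        return {key: walk(value, action) for key, value in node.items()}
    # Strings, numbers, booleans and None are leaves.
    return node


def upper_str(elem):
    """Example action: upper-case the text of every Str inline."""
    if elem["t"] == "Str":
        return {"t": "Str", "c": elem["c"].upper()}
    return None


def filter_many(json_docs, action):
    """Apply one filter to many serialized ASTs without spawning new processes."""
    return [json.dumps(walk(json.loads(text), action)) for text in json_docs]
```

Each string returned by filter_many can then be fed to a pandoc -f json -t Y ... writer process, so the number of Python processes stays at 1 regardless of M and N.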
So to recap, the questions are
- For this specific issue, would there be a fix?
- In general, does using a
pandoc -f X -t json ... | F | pandoc -f json -t Y ...-style workflow work against the design philosophy of pandoc as a command-line program? Are there any recommendations for a workflow like this? Or would it be discouraged?
If you intend to improve support for such scenarios, and if there are other subtleties, we can open a separate issue to track them.
Thanks.
I'm re-categorizing this as a feature request, since the documentation states that --embed-resources only works for HTML output.
Another format where this could be helpful is JATS output.
Full resource embedding has to operate on the generated HTML, because it needs to include resources supplied by the template, parse associated CSS for includes, and so on. This is all very HTML-specific, hence it only works for HTML at the moment.
Would it solve your problem to use the media bag functions in a Lua filter? https://pandoc.org/lua-filters.html#module-pandoc.mediabag
In this specific case, with the existing tool I can do
pandoc -s -o Untitled.native Untitled.ipynb --embed-resources -t native --extract-media=temp
pandoc -s -o Untitled.html Untitled.native --embed-resources -f native
where the only difference between this and
pandoc -s -o Untitled.html Untitled.ipynb --embed-resources
is an extra write to dump the media.
I guess pandoc.mediabag can be used to hold it in memory? I'm guessing it is very specific to pandoc Lua filters and would be impossible to implement in something like panflute?
And my broader question is, again, how compatible the following 2 commands are:
pandoc -f X -t Y -F F ...
pandoc -f X -t json ... | F | pandoc -f json -t Y ...
Is embedded media the only exception? Or are there other cases?
Maybe we should reframe this issue as something like "documenting the differences between piping and letting pandoc call a filter"?
Access to the mediabag is only possible from Lua filters, not JSON filters.
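Since a JSON filter cannot reach the mediabag, all it sees of an embedded image is the target string stored in the Image element (such as the hashed .png name in the native output above). The following is a small sketch, under that assumption, of how a Python-side step could list those targets and check which ones do not exist on disk. image_targets and missing_local_files are hypothetical names made up for this example.

```python
from pathlib import Path


def image_targets(node):
    """Return the target path or URL of every Image element in a JSON AST."""
    found = []
    if isinstance(node, list):
        for item in node:
            found.extend(image_targets(item))
    elif isinstance(node, dict):
        if node.get("t") == "Image":
            # Image content is [attr, alt_inlines, [target, title]].
            found.append(node["c"][2][0])
        for value in node.values():
            found.extend(image_targets(value))
    return found


def missing_local_files(doc, base_dir="."):
    """List image targets that look like local paths but are absent on disk.

    Targets that look like URLs or data URIs are skipped. A non-empty result
    suggests the image bytes lived only in the reader's mediabag and were
    lost when the AST was serialized.
    """
    base = Path(base_dir)
    missing = []
    for target in image_targets(doc):
        if "://" in target or target.startswith("data:"):
            continue
        if not (base / target).exists():
            missing.append(target)
    return missing
```

If missing_local_files reports anything, the reading step probably needs something like --extract-media so that the files exist by the time the writing step runs.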
Changing the title to "[Doc] Documenting the differences between piping and --filter", perhaps under the heading https://pandoc.org/MANUAL.html#option--filter.
Another example of a subtle difference between the 2 ways of applying a filter:
In ipynb to html/latex conversion, splitting it into 2 piped steps requires the 1st step to use --ipynb-output=all.
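Putting the two differences mentioned in this thread together, a piped workflow might build its reader and writer command lines roughly as follows. This is only a sketch: pipe_style_commands and run_pipeline are hypothetical helpers, the flags are the ones already used in this thread, and whether this set of flags is sufficient for every format pair is an assumption rather than something the thread establishes.

```python
import json
import subprocess


def pipe_style_commands(src, out, media_dir, src_format="ipynb", out_format="html"):
    """Build argv lists for the reading and writing halves of a piped workflow."""
    reader = ["pandoc", "-f", src_format, "-t", "json"]
    if src_format == "ipynb":
        # Keep every output cell, because the final format is not yet known.
        reader.append("--ipynb-output=all")
    # Write mediabag contents to disk so that the second process can find them.
    reader.append(f"--extract-media={media_dir}")
    reader.append(src)
    writer = ["pandoc", "-f", "json", "-t", out_format, "-s",
              "--embed-resources", "-o", out]
    return reader, writer


def run_pipeline(src, out, media_dir, transform=lambda doc: doc):
    """Run reader, an in-process `transform` on the AST dict, then writer."""
    reader, writer = pipe_style_commands(src, out, media_dir)
    ast_bytes = subprocess.run(reader, check=True, capture_output=True).stdout
    doc = transform(json.loads(ast_bytes))
    subprocess.run(writer, input=json.dumps(doc).encode("utf-8"), check=True)
```

For example, run_pipeline("Untitled.ipynb", "Untitled.html", "temp") would correspond to the two-step command sequence shown earlier, with JSON instead of native as the intermediate format.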