cocalc
extract images from jupyter markdown attachments to avoid bloat and sync issues
Right now, an image "attached" to a markdown cell appears directly in the patches. This can produce very large patches, which in turn can cause crashes and sync problems. Here is a small excerpt from a patch stored in the DB to illustrate this:
{"type":"cell",
"id":"98da53",
"pos":478,
"input":"",
"cell_type":"markdown",
"attachments":{
"image.png":{
"type":"base64",
"value":"iVBORw0KGgoAAAANSUhEUgAAA7QAAAH/... [a lot of data] ...
...
The goal of this ticket is to extract the "value" from the attachments (or something equivalent), similar to how images in output cells are already extracted.
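As a rough illustration of the idea, here is a minimal Python sketch, not CoCalc's actual implementation. It assumes the attachment shape shown in the excerpt above (`"type": "base64"` with a `"value"` field). The function name `extract_attachments`, the in-memory `blob_store` dict, and the replacement `"type": "sha1"` reference are all hypothetical names chosen for this example.

```python
import base64
import copy
import hashlib


def extract_attachments(cell, blob_store):
    """Return a copy of `cell` whose base64 attachment values are replaced
    by a content hash; the decoded bytes go into `blob_store`.

    `blob_store` is a plain dict here (hypothetical stand-in for a blob table).
    """
    slim = copy.deepcopy(cell)
    for name, att in slim.get("attachments", {}).items():
        if att.get("type") != "base64":
            continue  # already a reference, or a kind this sketch does not handle
        raw = base64.b64decode(att["value"])
        digest = hashlib.sha1(raw).hexdigest()
        blob_store[digest] = raw
        # Hypothetical reference format: only the hash stays in the cell.
        slim["attachments"][name] = {"type": "sha1", "value": digest}
    return slim


# Usage with made-up image bytes standing in for a real PNG.
fake_png = b"\x89PNG fake image bytes"
cell = {
    "type": "cell",
    "id": "98da53",
    "cell_type": "markdown",
    "attachments": {
        "image.png": {
            "type": "base64",
            "value": base64.b64encode(fake_png).decode("ascii"),
        }
    },
}
blobs = {}
slim_cell = extract_attachments(cell, blobs)
```

With this shape, a patch only ever carries the 40-character hash, and identical images attached more than once are stored a single time because the key is derived from the content.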
Well, this is more general than just attachments. Right now I'm debugging a case where an "output" is getting too large, e.g.:
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": "\n<iframe srcdoc=\"<!DOCTYPE html>\n<html>\n<head>\n<title></title>\n<meta charset="utf-8">\n<meta name=viewport content="width=device-width, user-scalable=no, minimum-scale=1.0, maximum-scale=1.0">\n<style>\n\n body { margin: 0px; overflow: hidden; }\n\n #menu-container { position: absolute; bottom: 30px; right: 40px; cursor: default; }\n\n #menu-message { position: absolute; bottom: 0px; right: 0px; white-space: nowrap;\n display: none; background-color: #F5F5F5; padding: 10px; }\n\n #menu-content { position: absolute; bottom: 0px; right: 0px;\n display: none; background-color: #F5F5F5; border-....
The total file size is about 40 MB.
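The same approach could be generalized to oversized output data. The following is only a sketch under stated assumptions: the helper `extract_large_outputs`, the `limit` threshold, and the `{"sha1": ..., "size": ...}` placeholder are invented for illustration and are not an existing CoCalc or Jupyter API. It only handles MIME values stored as a single string, as in the excerpt above.

```python
import copy
import hashlib

MAX_INLINE_BYTES = 10_000  # hypothetical threshold; tune to taste


def extract_large_outputs(cell, blob_store, limit=MAX_INLINE_BYTES):
    """Return a copy of a code cell in which any string-valued MIME entry
    larger than `limit` bytes is replaced by a hash reference."""
    slim = copy.deepcopy(cell)
    for output in slim.get("outputs", []):
        data = output.get("data", {})
        for mime, value in data.items():
            if not isinstance(value, str):
                continue  # this sketch skips list-of-lines and JSON values
            raw = value.encode("utf-8")
            if len(raw) <= limit:
                continue  # small enough to stay inline
            digest = hashlib.sha1(raw).hexdigest()
            blob_store[digest] = raw
            # Hypothetical placeholder left in the cell instead of the payload.
            data[mime] = {"sha1": digest, "size": len(raw)}
    return slim


# Usage with a synthetic oversized HTML output.
big_html = "<iframe srcdoc='" + "x" * 50_000 + "'></iframe>"
code_cell = {
    "cell_type": "code",
    "execution_count": 5,
    "metadata": {"collapsed": False},
    "outputs": [{"data": {"text/html": big_html, "text/plain": "ok"}}],
}
output_blobs = {}
slim_code_cell = extract_large_outputs(code_cell, output_blobs)
```

A size threshold keeps small outputs inline, so ordinary text results still travel with the cell and only the rare multi-megabyte payload is moved out of the patch stream.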