Add support for base64 images

Open smor opened this issue 2 years ago • 2 comments

Hello,

Thanks very much for your neat piece of software, which is really helpful for building quizzes! I work in a team delivering online courses to 2700+ students every year. We plan on using the quizzes produced by text2qti inside D2L's BrightSpace LMS, which supports the QTI 1.2 specification.

The issue

We determined that BrightSpace has trouble with file URLs on import, because it rewrites the URLs of the imported content. For instance, it adds a web_resource part to the image path at render time for an unknown reason, which breaks the link to the image.

We would like our quizzes to be as portable as possible, so that they can be imported anywhere without messy handling of (possibly duplicated) files, permissions, 404s, etc. Our expected use case is to drop in a self-contained ZIP file containing a bunch of XML files and no external files.

The proposal

One solution is to use base64 images instead of file URLs, as this removes the need to store external files and maintain working links to them: everything is included in the XML file. We verified that BrightSpace can handle base64 images properly.

How to do it

In this comment https://github.com/gpoore/text2qti/issues/42#issuecomment-802894172 @gpoore says that "it would probably be easy to add an option that converts all images into inline base64 encoded img". That's great news, and I'm willing to tackle this issue!

I tried naively to pass --self-contained to the Pandoc commands in cmdline.py, but it didn't work. 🤷‍♂️

I'm a competent though rusty Python developer, and I would be very thankful for any guidance to get me started. Here is the information I have gathered so far:

  • Questions in text2qti are processed through Pandoc in export.py https://github.com/gpoore/text2qti/issues/52#issuecomment-1116358284
  • Pandoc can export data: URIs using the --self-contained command-line option
  • My guess is that the image conversion happens around here? https://github.com/gpoore/text2qti/blob/de0c805df5b9081db0a2e0fb5c02a427fc1a8f25/text2qti/markdown.py#L57

I would probably add a --pandoc-self-contained boolean command-line option which would make it so that:

  1. images' src attributes are replaced by data: URIs;
  2. files are not copied in the ZIP file.
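
To make step 1 concrete, here is a minimal sketch of turning an image file into a data: URI using only the Python standard library. The function name image_to_data_uri is hypothetical and is not part of text2qti; the MIME type is guessed from the file extension, which is an assumption that may not hold for unusual file names.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path):
    """Return a base64 data: URI for the image file at `path`.

    Hypothetical helper for illustration; not part of text2qti.
    """
    path = Path(path)
    # Guess the MIME type from the extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    # base64 output is pure ASCII, so decoding as ASCII is safe.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With something like this in place, step 2 would follow naturally: an image whose src has been inlined no longer needs to be copied into the ZIP file.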

What do you think?

smor avatar Jun 14 '22 09:06 smor

Everything related to Pandoc is for exporting solutions in PDF or HTML or other formats, so initially you can ignore that when adding new features. Once you have implemented a new feature, you need to make sure that it works with Pandoc export as one of the final steps. In this case, I don't expect that any Pandoc changes will be needed.

You will want to focus on markdown.py. This is where Python-Markdown is used to create the HTML that is included in the quiz. Currently, Text2qtiImagePattern is used to customize the way that images are processed, basically by rewriting the image location from what is in the Markdown source to where it will be located in the quiz file. You would want to create a new class that replaces a file-based image with a base64 image. My guess is that you will need a handleMatch() that gets the image node provided by default Python-Markdown, but then returns some sort of raw HTML node that contains <img ...>. You would have to go through the Python-Markdown documentation to sort out exactly what type of node you want to return.
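
As a rough sketch of that idea: Python-Markdown's image inline processors build an ElementTree img element, so a custom handleMatch() could probably call the parent implementation and then rewrite the src attribute of the node it gets back, rather than constructing raw HTML. The helper below shows only that rewriting step, using the standard library. The name inline_image_src and the base_dir parameter are hypothetical, and how it would be wired into text2qti's image pattern class is left as an assumption to check against markdown.py.

```python
import base64
import mimetypes
import xml.etree.ElementTree as etree
from pathlib import Path


def inline_image_src(img_node, base_dir="."):
    """Rewrite the src of an <img> element to a base64 data: URI, in place.

    Hypothetical helper; a custom handleMatch() could call it on the node
    returned by the parent class before returning that node.
    """
    src = img_node.get("src", "")
    # Leave remote and already-inlined images untouched.
    if src.startswith(("http://", "https://", "data:")):
        return img_node
    path = Path(base_dir) / src
    # Guess the MIME type from the file extension.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot determine a MIME type for {path}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    img_node.set("src", f"data:{mime_type};base64,{encoded}")
    return img_node
```

Other attributes on the node, such as alt, are left as they are, so the rest of the image handling should not need to change.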

I would suggest a new command-line option like --image-base64. It may also be worth thinking about an option image-base: <bool> that can be set within quiz files.
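
For illustration, a boolean flag like this could be declared with argparse as sketched below. This is a standalone example and does not reproduce the actual structure of text2qti's cmdline.py; the parser name, the positional file argument, and the help text are assumptions.

```python
import argparse


def build_parser():
    """Build a minimal parser with a boolean --image-base64 flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="text2qti-sketch")
    parser.add_argument("file", help="path to the quiz file")
    # store_true makes the option False by default and True when the flag is given.
    parser.add_argument(
        "--image-base64",
        action="store_true",
        help="embed images as base64 data: URIs instead of copying files",
    )
    return parser
```

argparse exposes the value as args.image_base64, which could then be passed down to the code that builds the quiz HTML.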

gpoore avatar Jun 19 '22 21:06 gpoore

Hello, thanks for the pointers. I have started implementing this, and hope to propose a pull request sometime next week. I'm focusing on the CLI toggle for now, as I'm still unfamiliar with the frontmatter options.

Best regards

smor avatar Jun 28 '22 15:06 smor