polymath Google Docs and Google Slides importer

Ideally it would be possible to list some Google Docs and Google Slides you own and have the importer pull in their content.

I'd love to have for example https://komoroske.com/gardening-platforms and https://komoroske.com/slime-mold in it.

For Slides, it can just grab all the text runs plus the speaker notes.

For docs it should be straightforward.

Jan 10 '23 02:01 jkomoros

I asked GPT what the code should be:

function extractTextFromSlides() {
  var presentation = SlidesApp.getActivePresentation();
  var slides = presentation.getSlides();
  var text = "";

  for (var i = 0; i < slides.length; i++) {
    var slide = slides[i];
    var elements = slide.getPageElements();

    for (var j = 0; j < elements.length; j++) {
      var element = elements[j];

      if (element.getPageElementType() == SlidesApp.PageElementType.SHAPE) {
        var shape = element.asShape();
        var shapeText = shape.getText();

        if (shapeText) {
          var paragraphs = shapeText.getParagraphs();

          for (var k = 0; k < paragraphs.length; k++) {
            var paragraph = paragraphs[k];
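            // Paragraph has no getRuns(); go through its TextRange instead.
            var runs = paragraph.getRange().getRuns();

            for (var l = 0; l < runs.length; l++) {
              var run = runs[l];
              // Each run is a TextRange, which exposes asString().
              text += run.asString();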
            }
          }
        }
      } else if (element.getPageElementType() == SlidesApp.PageElementType.TABLE) {
        var table = element.asTable();
        // Table and TableRow are index-based: there is no getRows()/getCells().
        var numRows = table.getNumRows();

        for (var k = 0; k < numRows; k++) {
          var row = table.getRow(k);
          var numCells = row.getNumCells();

          for (var l = 0; l < numCells; l++) {
            var cell = row.getCell(l);
            var cellText = cell.getText();

            if (cellText) {
              var paragraphs = cellText.getParagraphs();

              for (var m = 0; m < paragraphs.length; m++) {
                var paragraph = paragraphs[m];
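                // Paragraph has no getRuns(); go through its TextRange instead.
                var runs = paragraph.getRange().getRuns();

                for (var n = 0; n < runs.length; n++) {
                  var run = runs[n];
                  // Each run is a TextRange, which exposes asString().
                  text += run.asString();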
                }
              }
            }
          }
        }
      }
    }
    // getText() on the notes shape returns a TextRange; asString() yields the raw text.
    var speakerNotes = slide.getNotesPage().getSpeakerNotesShape().getText();
    if (speakerNotes) {
      text += speakerNotes.asString();
    }
  }
  Logger.log(text);
}

Jan 11 '23 01:01 jkomoros

Ain't bad.

Jan 11 '23 01:01 dglazkov

I wonder if this might be a good approach: https://developers.google.com/docs/api/samples/extract-text#python
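
For reference, here is a rough sketch of what that kind of extraction might look like in JavaScript, to match the snippet above. It walks the JSON that the Docs API `documents.get` call returns and concatenates every text run, recursing into tables and tables of contents. The function names and the tiny `sampleDoc` are made up for illustration; only the JSON field names are meant to follow the API's document structure, and this has not been run against a real document.

```javascript
// Sketch only: pull plain text out of a Docs API `documents.get` response.
// Function names are hypothetical; field names (body.content, paragraph,
// textRun.content, table.tableRows, tableCells, tableOfContents) are assumed
// to match the Docs API document structure.

// Return the text of one paragraph element, or '' if it is not a text run.
function readParagraphElement(element) {
  const textRun = element.textRun;
  return textRun && textRun.content ? textRun.content : '';
}

// Recurse through structural elements, since text can be nested inside
// table cells and tables of contents.
function readStructuralElements(elements) {
  let text = '';
  for (const value of elements || []) {
    if (value.paragraph) {
      for (const elem of value.paragraph.elements || []) {
        text += readParagraphElement(elem);
      }
    } else if (value.table) {
      for (const row of value.table.tableRows || []) {
        for (const cell of row.tableCells || []) {
          text += readStructuralElements(cell.content);
        }
      }
    } else if (value.tableOfContents) {
      text += readStructuralElements(value.tableOfContents.content);
    }
  }
  return text;
}

// Hand-written stand-in for an API response, just to exercise the code.
const sampleDoc = {
  body: {
    content: [
      { paragraph: { elements: [
        { textRun: { content: 'Hello ' } },
        { textRun: { content: 'world\n' } },
      ] } },
      { table: { tableRows: [{ tableCells: [
        { content: [{ paragraph: { elements: [{ textRun: { content: 'cell\n' } }] } }] },
      ] }] } },
    ],
  },
};

const extracted = readStructuralElements(sampleDoc.body.content);
console.log(JSON.stringify(extracted));
```

Working from the API JSON rather than Apps Script means the same walk could run anywhere the importer can fetch the document, at the cost of handling auth separately.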

Jan 24 '23 04:01 dglazkov

[x] Document the importer in README
[ ] Add a Google Slides importer
[ ] Support importing all files within a Drive folder
[ ] Add an ability to only output docs that are viewable by anyone with the link, vs. requiring an ACL. (Similar to the Medium importer's --medium-include={all, draft, published}.) In the future, maybe allow specifying precisely which userID must have an ACL for a doc to be output?
[ ] Chunk doc content based on headings. Use the deep-link URL for each heading, so deep links more effectively link to the content being used
[ ] Is there a way to accept developer information from CLI to create credentials.SECRET.json on first run?
[ ] Move credentials/tokens.SECRET.json to a name specific to Google

Jan 25 '23 13:01 jkomoros
