Not working anymore...

Open chaoscreater opened this issue 5 years ago • 8 comments

Not sure if this is due to a recent Google Script API change or something, but I'm facing this issue where the script is saving attachments from already processed emails into my Google Drive.

For example, here's an email that has just come through. I should only expect 1 new attachment in my Google Drive:

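[image: https://user-images.githubusercontent.com/18227319/61200481-a97a9880-a735-11e9-9722-b5f7464efd85.png]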

However, I'm getting multiple attachments:

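[image: https://user-images.githubusercontent.com/18227319/61200490-b4352d80-a735-11e9-93e2-4f0a2402f607.png]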

I haven't touched the script since I set it up and got it working. I've updated the script to match the latest version on your GitHub, but I'm still facing the same issue. My config is fine and labels are being applied correctly.

The thing is, if all emails have been processed and have the "to-gdrive/processed" label assigned, the script won't create any new attachments. So it seems that the script only goes through all previously processed emails again when a new email arrives. When no new emails arrive, all previously processed emails are ignored, which is the correct behaviour.

chaoscreater, Jul 15 '19

Yesterday's trigger worked as expected, but today I am also getting duplicate attachments saved to Drive.

zrichmond, Jul 15 '19

Still broken for me. I've tested some other Gmail to Google Drive sync scripts I found online and they all seem to be producing the same result.

Not sure if it's a broken/changed API or something; I'm not technical enough to troubleshoot this...

chaoscreater, Jul 16 '19

Anyone able to fix this?

Update: It looks like the issue is caused by the processed emails all having the same subject. Gmail groups them into a single thread, so the script treats them as one and copies the attachments from all of them into Google Drive.

The processed emails need to end up in different threads. But this can't really be done in certain situations, especially if you're using e.g. an IP cam that is hardcoded to use a specific naming convention for the emails it sends. Unfortunately, my IP cam sends out all emails with the same subject.

This must be a recent change by Google though, as it was never a problem for me in the past...
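Because Gmail groups same-subject emails into one thread, and Gmail2GDrive loops over every message of each matching thread (`thread.getMessages()` in `Gmail2GDrive`), a new camera email makes all older messages in that thread get processed again. A minimal sketch of the per-message alternative, assuming plain objects in place of `GmailThread`/`GmailMessage`; `newAttachmentsInThread` is a hypothetical helper, not part of Gmail2GDrive or the Apps Script API:

```javascript
// Sketch only: plain objects stand in for GmailThread/GmailMessage, and
// newAttachmentsInThread is a hypothetical helper, not part of Gmail2GDrive.
// thread = { messages: [{ id: "...", attachments: [...] }, ...] }
function newAttachmentsInThread(thread, processedMessageIds) {
  var result = [];
  for (var i = 0; i < thread.messages.length; i++) {
    var message = thread.messages[i];
    // Skip messages whose attachments were saved in an earlier run,
    // even though their thread matched the search again.
    if (processedMessageIds.has(message.id)) {
      continue;
    }
    for (var j = 0; j < message.attachments.length; j++) {
      result.push(message.attachments[j]);
    }
    processedMessageIds.add(message.id);
  }
  return result;
}

// Usage: a second camera email lands in the same thread.
var seen = new Set();
var thread = { messages: [{ id: "m1", attachments: ["a.jpg"] }] };
newAttachmentsInThread(thread, seen); // ["a.jpg"]
thread.messages.push({ id: "m2", attachments: ["b.jpg"] });
newAttachmentsInThread(thread, seen); // ["b.jpg"], not ["a.jpg", "b.jpg"]
```

Between trigger runs the set of IDs would have to be persisted somewhere (for example with Apps Script's `PropertiesService`); that part is left out of the sketch.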

chaoscreater, Jul 24 '19

I also started getting the same problem; however, my file and subject names are not the same. The function works as expected apart from the duplicated attachments. Is there a quick fix for this?

msyed16, Aug 08 '19

See here: https://github.com/ahochsteger/gmail2gdrive/issues/19

chaoscreater, Aug 09 '19

@chaoscreater, @zrichmond, @msyed16 do you still see this behavior? If not, how did you solve it? If you still observe it, can you check whether the related emails are grouped into the same thread by Gmail? Posting your config and log output here might help to resolve the issue. FYI: I'm about to consolidate open issues in preparation for v2, which should provide more flexibility and stability (due to automated testing).

ahochsteger, Jun 18 '22

@ahochsteger

I stopped using the script about 2 years ago and don't really need it anymore. I haven't looked at your new script yet, but I'm pretty sure the issue is still there in the old script. I believe I got around it by modifying the script (with the help of Stack Overflow).

Just to recap, the issue is specific to emails that are in the same conversation thread, i.e. emails that have the same subject. The solution is to break the threaded conversation into multiple individual emails, each with a unique subject. Whenever the script processes a thread, it takes the current timestamp (down to the second and millisecond) and adds it to the email subject, thus making it unique.
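The timestamped subject is what keeps Gmail from threading the new emails together. A minimal sketch of building such a subject in plain JavaScript; `uniqueSubject` is a hypothetical helper, and it formats in UTC for reproducibility, whereas the script in this thread uses `Utilities.formatDate` with `"Pacific/Auckland"`:

```javascript
// Sketch only: uniqueSubject is a hypothetical helper. It uses UTC so the
// output does not depend on the machine's timezone; the script in this thread
// formats the date with Utilities.formatDate and "Pacific/Auckland" instead.
function uniqueSubject(prefix, date) {
  function pad(n, width) {
    return String(n).padStart(width, "0");
  }
  var stamp =
    pad(date.getUTCDate(), 2) + "-" +
    pad(date.getUTCMonth() + 1, 2) + "-" + // months are 0-based
    date.getUTCFullYear() + " " +
    pad(date.getUTCHours(), 2) + ":" +
    pad(date.getUTCMinutes(), 2) + ":" +
    pad(date.getUTCSeconds(), 2) + "." +
    pad(date.getUTCMilliseconds(), 3);
  return prefix + " - " + stamp;
}

// Usage:
uniqueSubject("[tinyCam] motion detected", new Date(Date.UTC(2019, 6, 15, 7, 7, 5, 123)));
// "[tinyCam] motion detected - 15-07-2019 07:07:05.123"
```

The subject is only as unique as the clock: two emails generated within the same millisecond would still collide, so appending the original message ID would be a stricter alternative.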

The full logic goes something like this:

  1. The script looks for threaded conversation emails (based on the Gmail label 'gmail2drive-wyzecam').
  2. The script breaks each message in the thread out into a new individual email, each with a unique subject based on the current timestamp.
  3. When these individual emails are created, Gmail automatically applies the same label to them. You'll need to create a Gmail filter to do this, of course. The new individual emails all share the same subject prefix, which is "[tinyCam] motion detected......" in my case.
  4. The rest is the same as your script: it processes each email and extracts the attachment to Google Drive.
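Step 2 is the core of the workaround. A sketch of that loop outside Apps Script, assuming plain objects in place of `GmailMessage` and a `send` callback in place of `GmailApp.sendEmail`; `splitThread` is hypothetical, and unlike the script it uses the original message ID rather than a timestamp as the unique part of the subject:

```javascript
// Sketch only: thread.messages mimics GmailThread.getMessages(), and `send`
// replaces GmailApp.sendEmail so the logic can run outside Apps Script.
function splitThread(thread, prefix, send) {
  var sent = 0;
  for (var i = 0; i < thread.messages.length; i++) {
    var message = thread.messages[i];
    // Messages without attachments are not split out.
    if (message.attachments.length === 0) {
      continue;
    }
    // A distinct subject per email keeps Gmail from threading them together.
    var subject = prefix + " - " + message.id;
    // Like OrganiseEmails, forward only the first attachment.
    send(message.to, subject, { attachments: [message.attachments[0]] });
    sent++;
  }
  return sent;
}

// Usage with a logging stand-in for sendEmail:
splitThread(
  { messages: [{ id: "m1", to: "me@example.com", attachments: ["a.jpg"] }] },
  "[tinyCam] motion detected",
  function (to, subject, options) { console.log(to, subject, options.attachments); }
);
```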

In the modified script below, there is a new function called "OrganiseEmails". It runs before the part of the code that does the actual attachment extraction to Google Drive. It excludes any "individual" emails (i.e. the ones with the subject "[tinyCam]...."), so it doesn't matter if both the original threaded email and the new individual emails have the same Gmail label applied. I've just tested the script and it still works.
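The exclusion lives entirely in the Gmail search string used by OrganiseEmails. A small sketch that assembles the same string from its parts; `buildOrganiseQuery` and `labelToSearchToken` are hypothetical helpers. The script's query refers to the "to-gdrive/processed" label as `to-gdrive-processed`; turning "/" and spaces into "-" generalizes that and is an assumption about Gmail's search syntax, not something stated in this thread:

```javascript
// Assumption: Gmail search writes "/" (nested labels) and spaces as "-".
function labelToSearchToken(labelName) {
  return labelName.replace(/[\/\s]/g, "-");
}

// Hypothetical helper that rebuilds the OrganiseEmails search query.
function buildOrganiseQuery(sourceLabel, processedLabel, subjectPrefix) {
  return [
    "label:" + labelToSearchToken(sourceLabel),
    // Skip the individual emails that OrganiseEmails itself created.
    "subject:(-" + subjectPrefix + ")",
    // Skip threads that Gmail2GDrive already marked as processed.
    "-label:" + labelToSearchToken(processedLabel),
    "-in:trash -in:drafts -in:spam"
  ].join(" ");
}

console.log(buildOrganiseQuery("gmail2drive-wyzecam", "to-gdrive/processed", "[tinyCam]"));
```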

Of course, you could also create a separate script purely for breaking the threaded emails into individual ones, and then have a second script (i.e. your script) handle only the extraction of attachments to Google Drive. I just have both in the same script.

Here, you can see an example of what that looks like. The attachment from the thread is extracted and we create a new email with a new unique subject per email thread:

[image]

[image]

Also, I'm not a programmer by any means; I barely know JavaScript, so assume I'm a complete noob (which I am).

Here's the script, hope it helps.

Code.gs:

// Gmail2GDrive
// https://github.com/ahochsteger/gmail2gdrive

/**
 * Returns the label with the given name or creates it if not existing.
 */
function getOrCreateLabel(labelName) {
  var label = GmailApp.getUserLabelByName(labelName);
  if (label == null) {
    label = GmailApp.createLabel(labelName);
  }
  return label;
}

/**
 * Recursive function to create and return a complete folder path.
 */
function getOrCreateSubFolder(baseFolder,folderArray) {
  if (folderArray.length == 0) {
    return baseFolder;
  }
  var nextFolderName = folderArray.shift();
  var nextFolder = null;
  var folders = baseFolder.getFolders();
  while (folders.hasNext()) {
    var folder = folders.next();
    if (folder.getName() == nextFolderName) {
      nextFolder = folder;
      break;
    }
  }
  if (nextFolder == null) {
    // Folder does not exist - create it.
    nextFolder = baseFolder.createFolder(nextFolderName);
  }
  return getOrCreateSubFolder(nextFolder,folderArray);
}

/**
 * Returns the GDrive folder with the given path.
 */
function getFolderByPath(path) {
  var parts = path.split("/");

  if (parts[0] == '') parts.shift(); // Did path start at root, '/'?

  var folder = DriveApp.getRootFolder();
  for (var i = 0; i < parts.length; i++) {
    var result = folder.getFoldersByName(parts[i]);
    if (result.hasNext()) {
      folder = result.next();
    } else {
      throw new Error( "folder not found." );
    }
  }
  return folder;
}

/**
 * Returns the GDrive folder with the given name or creates it if not existing.
 */
function getOrCreateFolder(folderName) {
  var folder;
  try {
    folder = getFolderByPath(folderName);
  } catch(e) {
    var folderArray = folderName.split("/");
    folder = getOrCreateSubFolder(DriveApp.getRootFolder(), folderArray);
  }
  return folder;
}




/**
 * Processes a message
 */
function processMessage(message, rule, config) {
  Logger.log("INFO:       Processing message: "+message.getSubject() + " (" + message.getId() + ")");
  var messageDate = message.getDate();
  var attachments = message.getAttachments();
  
  
  var body = message.getBody();
  var rawc = message.getRawContent();
  var inlineImages = {};
  var imgTags = body.match(/<img[^>]+>/g) || []; // all image tags, embedded or by url
  
  for(var i = 0; i < imgTags.length; i++) {
    var realattid = imgTags[i].match(/realattid=(.*?)&/i); // extract the image cid if embedded
    if (realattid) { // image is inline and embedded
      var cid = realattid[1];
      var imgTagNew = imgTags[i].replace(/src="[^\"]+\"/,"src=\"cid:"+cid+"\""); // replace the long-source with just the cid
      body = body.replace(imgTags[i], imgTagNew); // update embedded image tag in message body
      var b64c1 = rawc.lastIndexOf(cid) + cid.length + 3; // first character in image base64
      var b64cn = rawc.substr(b64c1).indexOf("--") - 3; // last character in image base64
      var imgb64 = rawc.substring(b64c1, b64c1 + b64cn + 1); // is this fragile or safe enough?
      var imgblob = Utilities.newBlob(Utilities.base64Decode(imgb64), "image/jpeg", cid); // decode and blob
      inlineImages[cid] = imgblob;
    }
  }
  
  

  for (var attIdx=0; attIdx<attachments.length; attIdx++) {
    var attachment = attachments[attIdx];
    var attachmentName = attachment.getName();

    Logger.log("INFO:         Processing attachment: "+attachment.getName());
    var match = true;
    if (rule.filenameFromRegexp) {
      var re = new RegExp(rule.filenameFromRegexp);
      match = (attachment.getName()).match(re);
    }
    if (!match) {
      Logger.log("INFO:           Rejecting file '" + attachment.getName() + "' not matching '" + rule.filenameFromRegexp + "'");
      continue;
    }
    try {
      var folder = getOrCreateFolder(Utilities.formatDate(messageDate, config.timezone, rule.folder));


     /////////////////////////////////////////////////////////////////////////////////////////////

      // var file = folder.removeFile(attachment);
      // file.setContent(attachment);


      var fileName = attachment.getName();
      var f = folder.getFilesByName(fileName);
      var file = f.hasNext() ? f.next() : folder.createFile(attachment);

      // file.setContent(attachment);

      /////////////////////////////////////////////////////////////////////////////////////////////



      if (rule.filenameFrom && rule.filenameTo && rule.filenameFrom == file.getName()) {
        var newFilename = Utilities.formatDate(messageDate, config.timezone, rule.filenameTo.replace('%s',message.getSubject()));
        Logger.log("INFO:           Renaming matched file '" + file.getName() + "' -> '" + newFilename + "'");
        file.setName(newFilename);
      }
      else if (rule.filenameTo) {
        var newFilename = Utilities.formatDate(messageDate, config.timezone, rule.filenameTo.replace('%s',message.getSubject()));
        Logger.log("INFO:           Renaming '" + file.getName() + "' -> '" + newFilename + "'");
        file.setName(newFilename);
      }
      file.setDescription("Mail title: " + message.getSubject() + "\nMail date: " + message.getDate() + "\nMail link: https://mail.google.com/mail/u/0/#inbox/" + message.getId());
      Utilities.sleep(config.sleepTime);
    } catch (e) {
      Logger.log(e);
    }
  }

  // Save the inline (embedded) images collected above, once per message.
  // inlineImages maps each cid to a Blob; it is a plain object without a
  // getName() method, so the images have to be read out by key.
  for (var imgCid in inlineImages) {
    try {
      var folder = getOrCreateFolder(Utilities.formatDate(messageDate, config.timezone, rule.folder));

     /////////////////////////////////////////////////////////////////////////////////////////////

      var imgBlob = inlineImages[imgCid];
      var fileName = imgBlob.getName();
      var f = folder.getFilesByName(fileName);
      var file = f.hasNext() ? f.next() : folder.createFile(imgBlob);

      /////////////////////////////////////////////////////////////////////////////////////////////

      if (rule.filenameFrom && rule.filenameTo && rule.filenameFrom == file.getName()) {
        var newFilename = Utilities.formatDate(messageDate, config.timezone, rule.filenameTo.replace('%s',message.getSubject()));
        Logger.log("INFO:           Renaming matched file '" + file.getName() + "' -> '" + newFilename + "'");
        file.setName(newFilename);
      }
      else if (rule.filenameTo) {
        var newFilename = Utilities.formatDate(messageDate, config.timezone, rule.filenameTo.replace('%s',message.getSubject()));
        Logger.log("INFO:           Renaming '" + file.getName() + "' -> '" + newFilename + "'");
        file.setName(newFilename);
      }
      file.setDescription("Mail title: " + message.getSubject() + "\nMail date: " + message.getDate() + "\nMail link: https://mail.google.com/mail/u/0/#inbox/" + message.getId());
      Utilities.sleep(config.sleepTime);
    } catch (e) {
      Logger.log(e);
    }
  }
}

/**
 * Generate HTML code for all messages of a thread.
 */
function processThreadToHtml(thread) {
  Logger.log("INFO:   Generating HTML code of thread '" + thread.getFirstMessageSubject() + "'");
  var messages = thread.getMessages();
  var html = "";
  for (var msgIdx=0; msgIdx<messages.length; msgIdx++) {
    var message = messages[msgIdx];
    html += "From: " + message.getFrom() + "<br />\n";
    html += "To: " + message.getTo() + "<br />\n";
    html += "Date: " + message.getDate() + "<br />\n";
    html += "Subject: " + message.getSubject() + "<br />\n";
    html += "<hr />\n";
    html += message.getBody() + "\n";
    html += "<hr />\n";
  }
  return html;
}

/**
 * Generate a PDF document for the whole thread using the HTML from processThreadToHtml().
 */
function processThreadToPdf(thread, rule) {
  Logger.log("INFO: Saving PDF copy of thread '" + thread.getFirstMessageSubject() + "'");
  var folder = getOrCreateFolder(rule.folder);
  var html = processThreadToHtml(thread);
  var blob = Utilities.newBlob(html, 'text/html');
  var pdf = folder.createFile(blob.getAs('application/pdf')).setName(thread.getFirstMessageSubject() + ".pdf");
  return pdf;
}

/**
 * Main function that processes Gmail attachments and stores them in Google Drive.
 * Use this as trigger function for periodic execution.
 */
function Gmail2GDrive() {
  if (!GmailApp) return; // Skip script execution if GMail is currently not available (yes this happens from time to time and triggers spam emails!)

  // Split threaded camera emails into individual emails before extracting attachments.
  OrganiseEmails();

  var config = getGmail2GDriveConfig();
  var label = getOrCreateLabel(config.processedLabel); 
  
  var end, start;
  start = new Date(); // Start timer

  Logger.log("INFO: Starting mail attachment processing.");
  if (config.globalFilter===undefined) {
    config.globalFilter = "has:attachment -in:trash -in:drafts -in:spam";
  }

  // Iterate over all rules:
  for (var ruleIdx=0; ruleIdx<config.rules.length; ruleIdx++) {
    var rule = config.rules[ruleIdx];
    var gSearchExp  = config.globalFilter + " " + rule.filter;
   
    if (config.newerThan) { // skip the restriction when newerThan is empty or missing
      gSearchExp += " newer_than:" + config.newerThan;
    }
    var doArchive = rule.archive == true;
    var doPDF = rule.saveThreadPDF == true;

    // Process all threads matching the search expression:
    var threads = GmailApp.search(gSearchExp);
    Logger.log("INFO:   Processing rule: "+gSearchExp);
    for (var threadIdx=0; threadIdx<threads.length; threadIdx++) {
      var thread = threads[threadIdx];
      end = new Date();
      var runTime = (end.getTime() - start.getTime())/1000;
      Logger.log("INFO:     Processing thread: "+thread.getFirstMessageSubject() + " (runtime: " + runTime + "s/" + config.maxRuntime + "s)");
      if (runTime >= config.maxRuntime) {
        Logger.log("WARNING: Self terminating script after " + runTime + "s");
        return;
      }

      // Process all messages of a thread:
      var messages = thread.getMessages();
      for (var msgIdx=0; msgIdx<messages.length; msgIdx++) {
        var message = messages[msgIdx];
        processMessage(message, rule, config);
      }
      if (doPDF) { // Generate a PDF document of a thread:
        processThreadToPdf(thread, rule);
      }
      

      
      
      // Mark a thread as processed:
      //var rem_label = GmailApp.getUserLabelByName("gmail2drive"); 
      //rem_label.removeFromThread(threads[i]);
      thread.addLabel(label);
      
      thread.moveToTrash();
      
      if (doArchive) { // Archive a thread if required
        Logger.log("INFO:     Archiving thread '" + thread.getFirstMessageSubject() + "' ...");
        thread.moveToArchive();
      }
    }
  }
  end = new Date(); // Stop timer
  var runTime = (end.getTime() - start.getTime())/1000;
  Logger.log("INFO: Finished mail attachment processing after " + runTime + "s");
}





/**
 * Splits threaded camera emails into individual emails, each with a unique,
 * timestamped subject, so that Gmail puts every one of them into its own thread.
 * Gmail.Users.Threads.remove() belongs to the Advanced Gmail service, which has to
 * be enabled for this project; it permanently deletes the original thread.
 */
function OrganiseEmails() {

  //var threads = GmailApp.search("-in:trash -in:drafts -in:spam label:gmail2drive-wyzecam -label:to-gdrive-processed");
  var threads = GmailApp.search("label:gmail2drive-wyzecam subject:(-[tinyCam]) -label:to-gdrive-processed -in:trash -in:drafts -in:spam");

  for (var i = 0; i < threads.length; i++) {
    var messages = threads[i].getMessages();
    var threadid = threads[i].getId();

    for (var j = 0; j < messages.length; j++) {
      var attachments = messages[j].getAttachments();
      if (attachments.length > 0) {
        var to = messages[j].getTo();

        /*
        https://sites.google.com/site/scriptsexamples/available-web-apps/form-publisher/documentation/form-and-template-edition/date-and-time-settings#TOC-Hours-and-Minutes
        */

        // var date = Utilities.formatDate(new Date(), "GMT","dd-MM-yyyy' at 'hh:mm:ss:SS' '");
        // var date = Utilities.formatDate(new Date(), "GMT+12","dd-MM-yyyy' at 'hh:mm a' seconds - 'ss:SS' timezone - 'z' '");
        // var date = Utilities.formatDate(new Date(), "Pacific/Auckland","dd-MM-yyyy' ----------- 'EEEE' ----------- 'hh:mm a' ----------- 'ss:SS' seconds ----------- 'Z' '");
        var date = Utilities.formatDate(new Date(), "Pacific/Auckland","dd-MM-yyyy' ----------- 'EEEE' ----------- 'hh:mm a' ----------- 'ss:SS' seconds '");

        var subject = "[tinyCam] motion detected - " + date;
        var body = "WyzeCam - Main House - Google App Script!";

        // Only the first attachment of each message is forwarded.
        // The attachments option expects an array of blobs.
        var options = {
          attachments: [attachments[0]]
        };
        GmailApp.sendEmail(to, subject, body, options);
      }
    }
    //var rem_label = GmailApp.getUserLabelByName("gmail2drive-wyzecam");
    //rem_label.removeFromThread(threads[i]);

    // Permanently delete the original thread (requires the Advanced Gmail service).
    Gmail.Users.Threads.remove("me", threadid);
    // threads[i].moveToTrash();
  }
}
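`processMessage` avoids saving the same file twice by looking it up by name first (`folder.getFilesByName(fileName)` before `folder.createFile(attachment)`). That only works while attachment names are unique; if a camera always used the same file name, say `snapshot.jpg`, later images would be silently skipped. A sketch of keying the check on message ID plus attachment name instead, assuming a `Map` as a stand-in for a Drive folder; `fileKey` and `saveOnce` are hypothetical helpers:

```javascript
// Hypothetical helper: a key that stays unique even when two messages carry
// attachments with the same file name.
function fileKey(messageId, attachmentName) {
  return messageId + "_" + attachmentName;
}

// `folder` is a Map standing in for a Drive folder (file name -> content).
// Returns true if a new file was created, false if it already existed.
function saveOnce(folder, messageId, attachment) {
  var key = fileKey(messageId, attachment.name);
  if (folder.has(key)) {
    return false; // Already saved by an earlier run: nothing to do.
  }
  folder.set(key, attachment.data);
  return true;
}

// Usage: the same name from two different messages yields two files.
var driveFolder = new Map();
saveOnce(driveFolder, "m1", { name: "snapshot.jpg", data: "A" }); // true
saveOnce(driveFolder, "m2", { name: "snapshot.jpg", data: "B" }); // true
saveOnce(driveFolder, "m1", { name: "snapshot.jpg", data: "A" }); // false
```

In Drive, the combined key would presumably serve as the file name, so the existing `getFilesByName` lookup keeps working unchanged.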

Config.gs:

/**
 * Configuration for Gmail2GDrive
 * See https://github.com/ahochsteger/gmail2gdrive/blob/master/README.md for a config reference
 */
function getGmail2GDriveConfig() {
  return {
    // Global filter
    "globalFilter": "-in:trash -in:drafts -in:spam in:inbox",
    // Gmail label for processed threads (will be created, if not existing):
    "processedLabel": "to-gdrive/processed",
    // Sleep time in milliseconds between processed messages:
    "sleepTime": 100,
    // Maximum script runtime in seconds (Apps Script executions are stopped after 6 minutes):
    "maxRuntime": 45,
    // Only process messages newer than (leave empty for no restriction; use d, m and y for day, month and year):
    "newerThan": "3d",
    // Timezone for date/time operations:
    "timezone": "GMT+12",

    // Processing rules:
    "rules": [
      /* { // Store all attachments sent to [email protected] to the folder "Scans"
        "filter": "has:attachment to:[email protected]",
        "folder": "'Scans'-yyyy-MM-dd"
      },
      { // Store all attachments from [email protected] to the folder "Examples/example1"
        "filter": "has:attachment from:[email protected]",
        "folder": "'Examples/example1'"
      }, */


      { // Store all .jpg attachments from threads matching the label filter below to the folder "Swann"
        "filter": "label:important-stuffs-ipcam,-wemo-wyzecam-v2",
        "folder": "'Swann'",
        "filenameFromRegexp": ".*\\.jpg$",
        "archive": true
      },


      // { // Store all attachments from [email protected] OR from:[email protected]
        // to the folder "Examples/example3ab" while renaming all attachments to the pattern
        // defined in 'filenameTo' and archive the thread.
        // "filter": "has:attachment (from:[email protected] OR from:[email protected])",
        // "folder": "'Examples/example3ab'",
        // "filenameTo": "'file-'yyyy-MM-dd-'%s.txt'",
        // "archive": true
      // },

      /* {
        // Store threads marked with label "PDF" in the folder "PDF Emails" as a PDF document.
        "filter": "label:PDF",
        "saveThreadPDF": true,
        "folder": "PDF Emails"
      },
      { // Store all attachments named "file.txt" from [email protected] to the
        // folder "Examples/example4" and rename the attachment to the pattern
        // defined in 'filenameTo' and archive the thread.
        "filter": "has:attachment from:[email protected]",
        "folder": "'Examples/example4'",
        "filenameFrom": "file.txt",
        "filenameTo": "'file-'yyyy-MM-dd-'%s.txt'"
      } */

    ]
  };
}
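Config values are JavaScript string literals, so a backslash meant for the regular expression has to be doubled. In `".*\.jpg$"` the `\.` collapses to a plain `.`, so the pattern also matches names without a dot before `jpg`. A quick check in plain JavaScript (the file names are made up):

```javascript
// "\." inside an ordinary string literal is just "."; the regex sees ".*.jpg$".
var loose = new RegExp(".*\.jpg$");
// "\\." keeps the backslash, so the regex sees ".*\.jpg$" (a literal dot).
var strict = new RegExp(".*\\.jpg$");

console.log(loose.test("photojpg"));   // true: any character stands in for the dot
console.log(strict.test("photojpg"));  // false
console.log(strict.test("photo.jpg")); // true
```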

chaoscreater, Jun 18 '22

@chaoscreater I really appreciate you taking the time to describe the issue and your solution in detail, even though you don't use it anymore. It helps me a lot in tracking down the problem and finding solutions, so others can benefit from it!

ahochsteger, Jun 18 '22

I'm closing this issue since GMail2GDrive is discontinued and has been superseded by the much improved Gmail Processor. It provides a way to process threads with multiple emails correctly and solves this issue. See the Getting Started Guide as well as the Reference Docs for more information.

ahochsteger, Sep 20 '23