notmuch-rs

File descriptor lifetime is bound to the query, not to the message

Open andir opened this issue 6 years ago • 3 comments

I am currently trying to iterate through all new mail that matches tag:new, and for each of those messages I apply a set of processing rules to add or remove tags.

When I first did that with a lot of email, I ran into trouble because the program ran out of file descriptors.

It seems like the notmuch::Query struct dictates the lifetime of those. That isn't ideal if you want to iterate through a million mails.

The following code illustrates when it happens:

{
  let query = db.create_query("tag:new").unwrap();
  for message in query.search_messages().unwrap() {
      // No file descriptor exists yet; one is only allocated once you access the headers (and potentially other fields).

      let list_id = message.header("List-Id"); // allocates an FD and fails once the FD limit has been reached

  } // the FD should be freed here, after each iteration of the body, whenever `message` goes out of scope
}
// instead, the FDs are only freed when `query` goes out of scope

andir • Feb 19 '20 22:02

Could you provide an example of how this would be remedied in C?

vhdirk • Feb 29 '20 09:02

I just wrote the following short C program. It opens the file, reads the List-Id header, and closes the file. It also counts the mails found and how many List-Ids it was able to read. I've successfully run this against 100k mails, ~99k of which had a List-Id extracted. ulimit -n shows 1024.

#include <stdio.h>
#include <notmuch.h>

int main(int argc, char* argv[]) {
        notmuch_database_t* database;

        if (argc < 2) {
                printf("usage:\n\t%s <search-term>\n", argc > 0 ? argv[0] : "<bin>");
                return 1;
        }

        notmuch_status_t rc = notmuch_database_open("/home/andi/Maildir", NOTMUCH_DATABASE_MODE_READ_ONLY, &database);

        if (rc != NOTMUCH_STATUS_SUCCESS) {
                return 1;
        }

        notmuch_query_t * query = notmuch_query_create(database, argv[1]);
        notmuch_messages_t* messages;
        notmuch_message_t* message;


        unsigned int count_total = 0;
        unsigned int count_list_id = 0;

        for (rc = notmuch_query_search_messages(query, &messages);
             rc == NOTMUCH_STATUS_SUCCESS &&
             notmuch_messages_valid(messages);
             notmuch_messages_move_to_next(messages))
        {
                message = notmuch_messages_get(messages);

                if (message == NULL)
                        break; // OOM

                count_total++;


                const char* header = notmuch_message_get_header(message, "List-Id");
                if (header != NULL && header[0] != '\0') {
                        printf("list-id: %s\n", header);
                        count_list_id++;
                } else if (header == NULL) {
                        printf("failed to get header\n");
                }

                notmuch_message_destroy(message);
        }

        printf("total: %u\n", count_total);
        printf("list_id: %u\n", count_list_id);


        notmuch_query_destroy(query);

        notmuch_database_close(database);

        return 0;
}

andir • Mar 02 '20 12:03

Thanks, I'll look into this soon. Probably early next week.

vhdirk • Mar 13 '20 18:03