File descriptor lifetime is bound to the query, not to the message
I am currently iterating through all new mail that matches `tag:new`, and for each of those messages I apply a set of processing rules that add/remove tags.
When I first ran this against a large amount of email, I ran into trouble because the program ran out of file descriptors.
It seems that the `notmuch::Query` struct dictates the lifetime of those descriptors. That isn't ideal if you want to iterate through a million messages.
The following code illustrates where it happens:
```rust
{
    let query = db.create_query("tag:new").unwrap();
    for message in query.search_messages().unwrap() {
        // So far no file descriptor has been opened; one is only allocated once you
        // access the headers (and potentially other fields).
        let list_id = message.header("List-Id"); // opens an FD; fails once the FD limit has been reached
    } // the FD should be freed here: after each iteration of the body, whenever `message` goes out of scope
}
// ...but the FDs are only freed when `query` goes out of scope
```
Could you provide an example of how this would be remedied in C?
I just wrote the following short C program. It opens each message file, reads the List-Id header and closes the file again. It also counts the messages found and how many List-Id headers it was able to read. I've successfully run it against 100k messages, ~99k of which had a List-Id extracted, while `ulimit -n` shows 1024.
```c
#include <stdio.h>
#include <notmuch.h>

int main(int argc, char* argv[]) {
    notmuch_database_t* database;

    if (argc < 2) {
        printf("usage:\n\t%s <search-term>\n", argc > 0 ? argv[0] : "<bin>");
        return 1;
    }

    notmuch_status_t rc = notmuch_database_open("/home/andi/Maildir", NOTMUCH_DATABASE_MODE_READ_ONLY, &database);
    if (rc != NOTMUCH_STATUS_SUCCESS) {
        return 1;
    }

    notmuch_query_t* query = notmuch_query_create(database, argv[1]);
    if (query == NULL) { // OOM
        notmuch_database_destroy(database);
        return 1;
    }

    notmuch_messages_t* messages;
    notmuch_message_t* message;
    unsigned int count_total = 0;
    unsigned int count_list_id = 0;

    for (rc = notmuch_query_search_messages(query, &messages);
         rc == NOTMUCH_STATUS_SUCCESS && notmuch_messages_valid(messages);
         notmuch_messages_move_to_next(messages))
    {
        message = notmuch_messages_get(messages);
        if (message == NULL)
            break; // OOM
        count_total++;

        // Reading a header opens the message file on demand.
        // NULL signals an error; "" means the header is absent.
        const char* header = notmuch_message_get_header(message, "List-Id");
        if (header != NULL && header[0] != '\0') {
            printf("list-id: %s\n", header);
            count_list_id++;
        } else if (header == NULL) {
            printf("failed to get header\n");
        }

        // Free the message now rather than when the query is destroyed,
        // so its resources do not pile up across iterations.
        notmuch_message_destroy(message);
    }

    printf("total: %u\n", count_total);
    printf("list_id: %u\n", count_list_id);

    notmuch_query_destroy(query);
    notmuch_database_destroy(database); // closes and frees the database
    return 0;
}
```
Thanks, I'll look into this soon, probably early next week.