imessage-exporter
Organize messages and attachments by date
Copying all messages to single files in an output directory and all attachments to an output subdirectory can leave a very large number of files in just two directories. This can slow down directory listings, backups, and file loading.
Instead, messages could be broken up by date. Maybe something like: /YYYY/mm/dd/contact/YYYY-mm-dd_contact.format.
Attachments can be referenced by multiple chats, so they should probably keep the same directory format that they start with (/Attachments/xx/xx/uuid-ish/name.format), but they can be aliased or symlinked into the same directory as the chat file, maybe named according to the timestamp that they were sent (/YYYY/mm/dd/contact/attachment/YYYY-mm-dd_HHMMSS.format).
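As a rough illustration of the layout proposed above, here is a minimal Python sketch. It is not how imessage-exporter works; the function names `dated_paths` and `link_attachment`, the `chat_ext` and `ext` parameters, and the use of relative symlinks are all assumptions made for the example.

```python
import os
from datetime import datetime
from pathlib import Path, PurePosixPath


def dated_paths(contact: str, sent_at: datetime, chat_ext: str, ext: str):
    """Hypothetical helper: build (chat_file, attachment_alias) relative to the export root."""
    # /YYYY/mm/dd/contact/
    day_dir = PurePosixPath(sent_at.strftime("%Y/%m/%d")) / contact
    # YYYY-mm-dd_contact.format
    day = sent_at.strftime("%Y-%m-%d")
    chat_file = day_dir / f"{day}_{contact}.{chat_ext}"
    # attachment/YYYY-mm-dd_HHMMSS.format
    stamp = sent_at.strftime("%Y-%m-%d_%H%M%S")
    alias = day_dir / "attachment" / f"{stamp}.{ext}"
    return chat_file, alias


def link_attachment(root: Path, original: Path, alias: PurePosixPath) -> Path:
    """Hypothetical helper: symlink `alias` (under `root`) to the attachment's original location."""
    link = root / alias
    link.parent.mkdir(parents=True, exist_ok=True)
    # A relative target keeps the link valid if the whole export directory is moved.
    link.symlink_to(os.path.relpath(original, start=link.parent))
    return link


chat_file, alias = dated_paths("alice", datetime(2023, 1, 5, 14, 30, 9), "txt", "jpg")
print(chat_file)  # 2023/01/05/alice/2023-01-05_alice.txt
print(alias)      # 2023/01/05/alice/attachment/2023-01-05_143009.jpg
```

The attachment itself stays in its original /Attachments/... location; only the link lives next to the chat file. Two attachments sent in the same second would collide under this naming, so a real implementation would need a tie-breaker.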
I don't like tucking away the content like this; I like how right now the conversations show up in a nice list of files. I don't think that is going to change.
I don't think that attachments are duplicated or reused in the table. Even if they are, every time imessage-exporter sees a message with an attachment, it makes a copy (if enabled, of course). I don't think that is worth changing either, due to #61.
As you saw in the other tickets, I am already working on building a directory structure to shrink the size of the Attachments root.
I found a handful of legitimately deduplicated attachments.
$ grep -r 'Library/Messages/Attachments' txt | sed 's/.*\.txt://' | wc -l
18480
$ grep -r 'Library/Messages/Attachments' txt | sed 's/.*\.txt://' | sort -u | wc -l
18457
That is only 23 repeated references out of 18,480, so it probably doesn't amount to much, but it's at least confirmation that it does happen.
Yeah, it does look like it can happen:
SELECT attachment_id, COUNT(*) FROM message_attachment_join GROUP BY attachment_id HAVING COUNT(*) > 1

SELECT * FROM message_attachment_join WHERE attachment_id = 60638
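To try the first query without touching a real Messages database, here is a small Python sketch against an in-memory SQLite table. The two-column table is a simplified stand-in for message_attachment_join, the rows are made-up sample data, and `shared_attachments` is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in for the join table; sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_attachment_join (message_id INTEGER, attachment_id INTEGER)"
)
conn.executemany(
    "INSERT INTO message_attachment_join VALUES (?, ?)",
    [(1, 100), (2, 101), (3, 101), (4, 102)],  # attachment 101 is used by two messages
)


def shared_attachments(conn):
    """Return (attachment_id, reference_count) for attachments used by more than one message."""
    return conn.execute(
        "SELECT attachment_id, COUNT(*) FROM message_attachment_join "
        "GROUP BY attachment_id HAVING COUNT(*) > 1 "
        "ORDER BY attachment_id"
    ).fetchall()


shared = shared_attachments(conn)
print(shared)  # [(101, 2)]
```

Each row returned is an attachment referenced by more than one message, which is the situation described in this thread.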

I still don't think that this matters for imessage-exporter since we create a copy whenever we see an attachment. This is a cool finding though and definitely something to be aware of when using the messages database.
I only have those two samples to go by, but this appears to happen if you copy and paste from one conversation into the other.