astroid [WIP] Set content id of attachments

The Content-ID is set to the file name.

In this way, inline images can be included when composing an HTML message. E.g., when editing markup,

![This is an inline image](cid:image.jpg)

will include image.jpg in the text. AFAIK there was no easy way to inline attached images before.

As this happens at send time, the preview will not display the image, though.
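To make the mechanism concrete, here is a minimal sketch of a message whose image part carries the file name as its Content-ID, so that `cid:image.jpg` in the HTML body resolves to it. It uses Python's standard `email` package purely for illustration; astroid itself is C++ on top of GMime, and the helper name `build_message` is hypothetical.

```python
from email.message import EmailMessage


def build_message(html_body: str, filename: str, image_bytes: bytes) -> EmailMessage:
    """Build a message whose inline image uses the file name as Content-ID."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image example"
    msg.set_content("Plain-text fallback")
    msg.add_alternative(html_body, subtype="html")

    # The HTML part becomes multipart/related once an image is attached to it.
    html_part = msg.get_payload()[1]
    html_part.add_related(
        image_bytes,
        maintype="image",
        subtype="jpeg",
        cid=f"<{filename}>",      # Content-ID header: the bare file name
        filename=filename,
        disposition="inline",
    )
    return msg


if __name__ == "__main__":
    message = build_message('<img src="cid:image.jpg">', "image.jpg", b"\xff\xd8\xff\xe0")
    print(message.get_payload()[1].get_payload()[1]["Content-ID"])
```

Note that a bare file name like this is exactly the shortcut questioned later in the thread, since it is not globally unique.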

Nov 14 '18 09:11 davvil

David Vilar writes on November 14, 2018 10:45:

The Content-ID is set to the file name.

In this way, inline images can be included when composing an HTML message. E.g., when editing markup,
![This is an inline image](cid:image.jpg)
will include image.jpg in the text. AFAIK there was no easy way to inline attached images before.

As this happens at send time, the preview will not display the image, though.

Is there any reason we cannot make this work in the preview? This would be a great feature, but I am a bit reluctant to make the final message different from the previewed one.

Also, what is the spec for content_id? The file name might not fit into it always. Then we need some way to communicate the sanitized file name to the user.

Nov 15 '18 18:11 gauteh

I was actually surprised that it should be so easy...

I checked the RFC (https://tools.ietf.org/html/rfc2392), and the cid should actually be "globally unique". This could be accomplished, e.g., by appending the message id to the file name, with an @-symbol as separator. But then the simple inlining will no longer work unless the reference is rewritten to use the new cid. That being said, this rule is apparently not strictly followed: I took the "cid is filename" convention from some mails that I have received. You can also look at this thread on Stack Overflow (https://stackoverflow.com/questions/39577386/the-precise-format-of-content-id-header). But I think we should follow the standard.
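A rough sketch of the "file name plus message id" idea, in Python for illustration only. The exact format of the suffix is an assumption: since an addr-spec has a single @, this sketch folds the message id's own @ into a dot. The function name `make_unique_cid` is hypothetical.

```python
def make_unique_cid(filename: str, message_id: str) -> str:
    """Derive a per-message Content-ID from a file name and a Message-ID.

    Assumption: the Message-ID's own '@' is replaced by '.' so that the
    result contains exactly one '@' separating file name and suffix.
    """
    mid = message_id.strip().strip("<>")
    domain = mid.replace("@", ".")
    return f"{filename}@{domain}"


if __name__ == "__main__":
    print(make_unique_cid("image.jpg", "<1234.abcd@astroid>"))
```

With this scheme, two messages that both attach `image.jpg` end up with different Content-IDs, at the cost that `cid:image.jpg` in the body no longer matches without a rewrite.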

One thing that could ease the implementation of the name substitution, without having to parse the full markdown or html, is to define our own convention for indicating an inline image and do a simple substitution each time we encounter the prefix, e.g. specify ![caption](#@inline:image.jpg), but of course this could produce undesired substitutions in some edge cases. Is it possible with webkit to get a list of all the references in a document?
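The prefix convention could be sketched as a plain text substitution on the markdown source, as below (Python, illustration only). The `#@inline:` prefix is the one proposed above; the function name `expand_inline_refs` and the `cid_suffix` parameter are hypothetical.

```python
import re

INLINE_PREFIX = "#@inline:"

# Matches "(#@inline:NAME)" as it appears in a markdown link or image target.
_INLINE_RE = re.compile(r"\(" + re.escape(INLINE_PREFIX) + r"([^)\s]+)\)")


def expand_inline_refs(markdown: str, cid_suffix: str) -> str:
    """Replace every '(#@inline:NAME)' target with '(cid:NAME@cid_suffix)'."""
    return _INLINE_RE.sub(
        lambda m: f"(cid:{m.group(1)}@{cid_suffix})",
        markdown,
    )


if __name__ == "__main__":
    print(expand_inline_refs("![caption](#@inline:image.jpg)", "xxxx.astroid"))
```

Because this never parses the markdown, it also rewrites the pattern when it appears in ordinary prose or inside a code span, which is the edge case mentioned above.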

As for displaying the image in the preview, I haven't looked at the code yet. I'll try to get it working.

Nov 16 '18 08:11 davvil

BTW. What is your preference for PRs? I marked this one as [WIP] as it is clearly not ready for merging. Is this OK or do you prefer that I close it and create a new one when it is more mature?

Nov 16 '18 08:11 davvil

On Fri, Nov 16, 2018 at 9:50 AM David Vilar [email protected] wrote:

I was actually surprised that it should be so easy...

I checked the RFC (https://tools.ietf.org/html/rfc2392), and the cid should actually be "globally unique". This could be accomplished, e.g., by appending the message id to the file name, with an @-symbol as separator. But then the simple inlining will no longer work unless the reference is rewritten to use the new cid. That being said, this rule is apparently not strictly followed: I took the "cid is filename" convention from some mails that I have received. You can also look at this thread on Stack Overflow (https://stackoverflow.com/questions/39577386/the-precise-format-of-content-id-header). But I think we should follow the standard.

Nice, we probably have to do some escaping though, or perhaps this is done already both by GMime at set_content_id and at load when converted to HTML. In which case we might actually be good to go.

One thing that could ease the implementation of the name substitution, without having to parse the full markdown or html, is to define our own convention for indicating an inline image and do a simple substitution each time we encounter the prefix, e.g. specify ![caption](#@inline:image.jpg), but of course this could produce undesired substitutions in some edge cases. Is it possible with webkit to get a list of all the references in a document?

What do you mean? Have a look in tvextension.cc for how I substitute the img src for cids at the moment.

Nov 16 '18 08:11 gauteh

On Fri, Nov 16, 2018 at 9:51 AM David Vilar [email protected] wrote:

BTW. What is your preference for PRs? I marked this one as [WIP] as it is clearly not ready for merging. Is this OK or do you prefer that I close it and create a new one when it is more mature?

That's great, it's good to post it early so that the direction of the implementation can be discussed. There's also a work-in-progress label.

Nov 16 '18 08:11 gauteh

On Fri, Nov 16, 2018 at 9:54 AM Gaute Hope [email protected] wrote:

One thing that could ease the implementation of the name substitution, without having to parse the full markdown or html, is to define our own convention for indicating an inline image and do a simple substitution each time we encounter the prefix, e.g. specify ![caption](#@inline:image.jpg), but of course this could produce undesired substitutions in some edge cases. Is it possible with webkit to get a list of all the references in a document?

What do you mean? Have a look in tvextension.cc for how I substitute the img src for cids at the moment.

That is actually what I was looking for! The idea would then be to go through the document as you do there, detect all cid: and substitute with the new names. I'll try to have a go at it.

Nov 16 '18 09:11 davvil

On Fri, Nov 16, 2018 at 10:04 AM David Vilar [email protected] wrote:

That is actually what I was looking for! The idea would then be to go through the document as you do there, detect all cid: and substitute with the new names. I'll try to have a go at it.

Nice! But if GMime handles escaping and unescaping properly, then it might not be necessary?

Nov 16 '18 09:11 gauteh

On Fri, Nov 16, 2018 at 10:21 AM Gaute Hope [email protected] wrote:

Nice! But if GMime handles escaping and unescaping properly, then it might not be necessary?

The problem is not the escaping (which I hope GMime takes care of, but I will check). The problem is the global cid:

Suppose we want to attach and inline image.jpg. It will get a cid, say image.jpg@xxxx.astroid. Now, the user specified in the markdown a link to cid:image.jpg, which gets transformed into html as '<img src="cid:image.jpg">'. We will then need to change it to '<img src="cid:image.jpg@xxxx.astroid">'. And that's where the "parsing html" problem comes into play.
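For illustration, the rewrite described here could look roughly like the following Python sketch, which substitutes `src="cid:..."` values using a table from short names to full cids. It deliberately uses a regular expression rather than a real HTML parser or the DOM traversal in tvextension.cc, so it is only an approximation; the name `rewrite_cid_refs` is hypothetical.

```python
import re

# Matches src="cid:NAME" or src='cid:NAME' (attribute name case-insensitive).
_SRC_CID_RE = re.compile(r"""(src\s*=\s*["'])cid:([^"']+)(["'])""", re.IGNORECASE)


def rewrite_cid_refs(html: str, cid_map: dict) -> str:
    """Replace short cid references with their full Content-IDs.

    References that are not in cid_map are left untouched.
    """
    def _replace(match):
        new_cid = cid_map.get(match.group(2), match.group(2))
        return f"{match.group(1)}cid:{new_cid}{match.group(3)}"

    return _SRC_CID_RE.sub(_replace, html)


if __name__ == "__main__":
    mapping = {"image.jpg": "image.jpg@xxxx.astroid"}
    print(rewrite_cid_refs('<img src="cid:image.jpg" alt="x">', mapping))
```

A regex like this can misfire on `src=` text that is not inside an actual tag, which is why walking the parsed document is the more robust route.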

Nov 16 '18 10:11 davvil

On Fri, Nov 16, 2018 at 11:01 AM David Vilar [email protected] wrote:

The problem is not the escaping (which I hope GMime takes care of, but I will check). The problem is the global cid:

Suppose we want to attach and inline image.jpg. It will get a cid, say image.jpg@xxxx.astroid. Now, the user specified in the markdown a link to cid:image.jpg, which gets transformed into html as '<img src="cid:image.jpg">'. We will then need to change it to '<img src="cid:image.jpg@xxxx.astroid">'. And that's where the "parsing html" problem comes into play.

OK, perhaps you can do it before the markdown processor step in ComposeMessage. Why do you need to add an @xxxx.astroid part?

Nov 16 '18 10:11 gauteh

That's the pain point. The rfc states that "Both message-id and content-id are required to be globally unique. That is [...] no different body parts will ever have the same Content-ID addr-spec.". That's why I was thinking of adding the message-id to make them unique.

To be honest, I don't really see the point, as I don't think anyone would reference some content-id independently of the message. But I assume we should stick to the rfc (although not all email clients do).

Nov 16 '18 12:11 davvil

On Fri, Nov 16, 2018 at 1:14 PM David Vilar [email protected] wrote:

That's the pain point. The rfc states that "Both message-id and content-id are required to be globally unique. That is [...] no different body parts will ever have the same Content-ID addr-spec.". That's why I was thinking of adding the message-id to make them unique.

To be honest, I don't really see the point, as I don't think anyone would reference some content-id independently of the message. But I assume we should stick to the rfc (although not all email clients do).

Oh, right, I see. Well, we certainly do not rely on it in astroid. Whenever a message is forwarded or replied to, the CIDs would then have to be regenerated (not that it matters much for us atm, since we do not use the HTML content then).

Nov 16 '18 12:11 gauteh

On Fri, Nov 16, 2018 at 1:48 PM Gaute Hope [email protected] wrote:

On Fri, Nov 16, 2018 at 1:14 PM David Vilar [email protected] wrote:

Oh, right, I see. Well, we certainly do not rely on it in astroid. Whenever a message is forwarded or replied to, the CIDs would then have to be regenerated (not that it matters much for us atm, since we do not use the HTML content then).

True, I didn't think of that. But let's go one step at a time :-)

Nov 16 '18 15:11 davvil

Is this one ready for review? Or are you still working on it?

Jan 02 '19 14:01 gauteh

No, it's not ready. I wanted to work on it but didn't find the time yet. In practice "it works", but it does not conform with the specification.

Jan 04 '19 09:01 davvil

Just an idea: Why not generate a new message id for the cid every time a file gets attached? The suffix of the cid would differ from the actual mid only in the timestamp, but sending a mail should result in unique mids at any point in time, so the cid should also be unique with this approach. Replying to a message with attachments could be done the same way: generate a message id when the user hits "reply" and just use that.

Feb 20 '20 04:02 ff2000

I think that's ok, but it should be easy to refer to those cid's in a markdown email. It might be difficult to guess those when auto-generated?
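One possible way to reconcile the two points, sketched in Python for illustration: generate a fresh id per attachment, but keep a table from file name to generated cid, so the author keeps writing the file name and the client looks up the generated id at send time. `make_msgid` is from Python's standard library; the function name `assign_cids` and the default domain are assumptions, not anything astroid does.

```python
from email.utils import make_msgid


def assign_cids(filenames, domain="astroid"):
    """Return a {file name: generated Content-ID} table.

    Each id comes from a fresh make_msgid() call, so two attachments get
    different ids even when added in quick succession.
    """
    return {
        name: make_msgid(domain=domain).strip("<>")
        for name in filenames
    }


if __name__ == "__main__":
    table = assign_cids(["image.jpg", "plot.png"])
    for name, cid in table.items():
        print(f"{name} -> cid:{cid}")
```

The author never has to guess the generated id: a reference such as `cid:image.jpg` is resolved through the table before the message is sent.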

May 11 '20 07:05 gauteh
