Handle image pasted/dropped from various sources (external websites, base64-encoded, etc.)
When copy-pasting a local image into the editor, it will be uploaded as expected.
However, if you copy an image from another website and paste it into the editor, it will use the previous url as the source, rather than uploading it.
Is this the intended behaviour? This can sometimes cause issues with regards to hotlinking, dead links, security, etc.
There are multiple possible scenarios what we may get when someone pastes/drops an image from another website or application. It will depend on how someone copied that content and on the browser. We may get:
- HTML with base64-encoded images,
- HTML with images which
srcattributes point to an external website, - HTML with images which
srcare references to files in the clipboard'sdataTransfer.files, - just images (as
dataTransfer.files), - and I'm sure I missed something.
I haven't checked what browsers really do, so these are all theoretical scenarios. I expect to see most of them in real life :D
Actually, you even described this in the PR:
However, they will still not be triggered when drag + dropping an image from a website. This is because the drag event doesn't include file data, but text/uri-list data, which includes the image url(s). Similarly, images won't be uploaded if selecting and copying+dragging/pasting a bunch of html.
In these instances, due to potential CORS issues, you would need to have a server endpoint that downloads and saves images directly from these external urls.
Additionally, it may also be worthwhile having a config option to validate img src, to only insert them when the url matches certain patterns. This could be used to prevent re-uploading previously uploaded content, and restrict hotlinking from other websites.
So, this gets complex. We need configuration (this needs to be opt-in behaviour and we need src validation) and we'll need an endpoint to retrieve external images. Plus, we need to check what we really get from the browsers.
The answer, therefore, is that we skipped those cases for now. We planned to work on this in some, undefined future.
For now, I'm also unsure whether all these scenarios should be handled by one plugin or whether we should split this. There's a risk that this plugin will grow significantly. We should review which potential input options can be handled in similar way to what we do so far. We also need to define configuration in a way which forsee the other input options which may be implemented later.
Interestingly, for issue https://github.com/ckeditor/ckeditor5-upload/issues/5 , the upload works as expected with current version of ckeditor5 (at least on Windows Chrome/Edge from MSPaint).
For my use case at least, I wish to have all images uploaded because:
- They will need to be synced to other devices and accessible offline
- Images from free hosting providers often go dead, hit usage limits, prevent hotlinking, etc
- Including images from non-secure websites shows warnings
- Data-uri images slow down API calls, and sometimes fail to save if HTML output is too big to fit in database.
Most cases are handled by the current functionality (+ my PR). I will probably filter out any other cases with userland code.
For configuration, maybe it would be useful to let users provide a function, as it could get fairly complex with flags for every use case. For example:
import upload // promise, takes fileData, resolves to upload dest
import uploadFromUrl // promise, takes url resolves to upload dest
import convertDataURI // takes datauri, converts to blob data
// config - upload everything
function handleImage(type, data, url) {
switch (type) {
case 'file':
return upload(data);
case 'data-img':
return data ? upload(data) : upload(convertDataURI(url));
case 'url':
if (url.match(/^https:\/\/mydomain.com/) return url;
else return uploadFromUrl(url);
case 'webkit-fake-image':
throw new Error('Image upload not supported in Safari.')
}
}
// config - no uploads, only include small data-img
function handleImage(type, data, url) {
switch (type) {
case 'file':
return null;
case 'data-img':
if (url.length < 10000) return url;
else return null;
case 'url':
return url;
}
}
cc @pjasiun @wwalc
For configuration, maybe it would be useful to let users provide a function, as it could get fairly complex with flags for every use case. For example:
It'd actually need to be simpler because we can only expose one upload() function and should work with File instances, URLs and base64. That's because all these uploads need to go through our FileRepository.
So, a shouldUpload( type, data, url ) -> {Boolean} function should be enough. However, we may also need to extend UploadAdapter so it has a method for uploading files from external websites.
What worries me still is handling all of these in the clipboardInput's listener which you tried to patch. Without analysing all the possible input scenarios I'm not able to tell now how to handle them and whether we don't need something more.
Since a complete solution to this ticket is a longer-term project, I don't want to block https://github.com/ckeditor/ckeditor5-image/pull/212 with these discussions. However, it still requires some work to be merged.
For sure what we have now is an MVP for the image upload, there is much more to be handled.
4 years ago I did a research for CKEditor 4 and I learn that that paste and drop handling is very complex. You can:
- drop file,
- paste file,
- paste image (for instance from MS Paint),
- drop HTML with image,
- paste HTML with image,
- drop HTML image (only image without anything else),
- paste content from MS Word with an image.
Each browser has (had) some quirks, and you can get a very strange content. It was very often that there were not common patterns, for instance, the same case, on the same browser may work differently depending on the action you did (drop gives you different results than paste), on the OS (different format on Mac and on Windows), etc.
As @Reinmar said, you can get:
- HTML with base64-encoded images,
- HTML with images which src attributes point to an external website,
- file in
dataTransfer.files, - file in
dataTransfer.items, - might be more.
For sure sometimes you get mixed content: <img> in HTML and file in dataTransfer.
I am not sure if it should be split between multiple plugins since cases might be mixed. I am not sure about the PR above, because it covers only a small part of cases and I do not know if it will not break other. I am not sure how extensible API we will be able to provide. What I am sure is that we need to do a deep research to learn all cases we need to handle, and then we will be able to design a solution.
Just a note that adding one particular host/domain to the whitelist is not enough as many times companies have/use multiple domains/hosts.
Anyone heavily interested in this feature: please add 👍 to the first post.
It'd actually need to be simpler because we can only expose one upload() function and should work with File instances, URLs and base64. That's because all these uploads need to go through our FileRepository.
So, a shouldUpload( type, data, url ) -> {Boolean} function should be enough. However, we may also need to extend UploadAdapter so it has a method for uploading files from external websites.
In general to correctly handle the use case mentioned in the original report there would need to be the possibility to use at least two different upload functions depending on the context. The first upload function already handles uploading local images (also base64-encoded etc.).
The second function, to handle <img> elements with src pointing to external URLs, would have to work differently. Such upload function would need to connect to the server side endpoint and ask it to upload the image (providing just the URL of it) and return back the correct url. It has to be done this way due to CORS restrictions.
In theory, uploading images coming from other domains could happen at a different stage, for example when saving the document in the CMS database, together with other post-processing operations like XSS sanitization. This way the system would be more bulletproof as always someone may intercept the POST request and inject an image coming from external URL anyway.
On the other side, the sooner the user receives a feedback that an image could not be uploaded from external URL the better. There may be situations in which the server side script to fetch images will not work, for example in case of copying an image from a website where only logged in users can see resources or from some intranet domain. Taking this into consideration, making an attempt to make "a local copy" of an image as soon as it gets pasted into the editor makes more sense.
Does it make sens for you to make first step with sth simple like bool shouldUpload(type, data, url) and then eventually extend it?
(I just don't trust persistence of any external resources so prefer to have all images on my side)
I am afraid that simple bool shouldUpload(type, data, url) is not enough. To make it works we need to be able to handle upload in at least 3 format:
File,- Base64 data,
- URL to the external image.
We need to recognize and handle all cases properly to make this work, what makes this ticket quite complicated.
Lazy upload image:
- Query all image tags
- Check the "src"
- Upload images if not uploaded
@taocz, I didexactly what you suggested:
(async() => {
let files = [];
for (let img of images) {
let response = await fetch(img.src);
let blob = await response.blob();
files.push(new File([blob], 'image', response));
console.log(response);
}
editor.execute('imageUpload', { file: files } );
})();
Any updates/solutions? Trying to upload a pasted image, searching for any solution
I implemented the lazy upload mentioned by @taocz with Axios, here are the additional logics to go through for a newly created imageElement:
- Check if the image is on server, return if it is
- Download the image from internet
- Upload the image to server
const additionalUploadImage = (imageElement) => {
let url = imageElement.getAttribute('src')
// return if the image is already on server
if (url.includes(process.env.SERVER_ADDRESS)) {
return
}
;(async () => {
let filename = url.split('/').pop().split('#')[0].split('?')[0] // extracts image's filename, see https://stackoverflow.com/a/36756650/15630163
let response = await axios.get(url, {
responseType: 'blob',
})
let blob = response.data
let formData = new FormData()
formData.append('upload', blob, filename) // form name and file name can be customized
let uploadResult = await axios.post(
process.env.SERVER_ADDRESS + '/files/upload', // route can be customized
formData,
)
// replace the image's url if it's successfully uploaded
if (uploadResult.status === 200) {
imageElement._attrs.set('src', uploadResult.data.url)
}
})()
}
So, since these are lazy upload logics to run through, we'll execute these logics in the for loop that iterates through changed image elements in the doc.on( 'change', ... ) listener.
for (const imageElement of getImagesFromChangeItem(editor, item)) {
// Check if the image element still has upload id.
const uploadId = imageElement.getAttribute('uploadId')
// avoid executing our custom upload logic when:
// 1. the picture is being handled by original paste/click-to-upload logic, or
// 2. the image element is in graveyard, aka. just been deleted
if (!uploadId && !isInsertedInGraveyard) {
additionalUploadImage(imageElement)
}
...
Overall view here.
This solves dropping images from websites, however, I think the solution is not perfect at all, for
-
As mentioned by @wwalc above, CORS issues.
The second function, to handle
<img>elements withsrcpointing to external URLs, would have to work differently. Such upload function would need to connect to the server side endpoint and ask it to upload the image (providing just the URL of it) and return back the correct url. It has to be done this way due to CORS restrictions.In fact, I also get HTTP 403 sometimes when trying to request pictures from some websites, which means some images cannot be normally downloaded in the first place.
-
On the other hand, lazy loading here kind of disrupts the normal workflow of uploading images, for the original concept is that uploading of a image are only triggered by paste / drop events so that the uploading process will only be triggered once in a image's life cycle, while listening to changed image elements in the document is not 'image-oriented' and will trigger every time the editor refreshes.
So at the end, I'm moving this task to server side. Basically what I'm going to do is on every editor upload, let the server examine the document's raw HTML content, and
- Find out all
<img>elements - Download the pictures from the internet and replace the
srcof these<img>elements
Any news about this?
Looking into this issue, I believe that all buildings blocks are already there.
The paste handling skips any paste event that contains HTML. In a second step, the pasted HTML is checked for any base64-decoded images (or blobs) and those are uploaded separately.
The secret lies in the second step: If it would also target regular, non-uploaded images, this issue would be solved. The only problem is to recognize images that have not been uploaded already, because the paste source could be the editor itself or another instance that holds already uploaded images.
This may require a config callback function to let the implementer decide if the image should be uploaded but other than that this appears to be quite straightforward to the untrained eye.
Any news? 7 years and no changes on this issue 🥲
Hi, any news about this issue? this should be handled by now i think by ckeditor
Hi, any news on this after 7 years?