mamute icon indicating copy to clipboard operation
mamute copied to clipboard

Uploading files: filename should not be kept

Open MatejBalantic opened this issue 9 years ago • 6 comments

As a general rule with allowing user-uploaded file it is a good thing to never allow user to decide on the filename.

I suggest a random file name is generated (no extension needed anyway) and original filename is completely discarded. This has several great side-effect:

  • no longer need user act on duplicated-filename message, which is a bit funky anyway,
  • users can't upload and share links to files like movies and software directly as the mime-type can be forced to one of the whitelisted mimetypes and extension would be stripped away. Thus, even if user tricks the site to upload a disallowed binary, it would be very hard for him to use this link as a malware/illegal online resource

MatejBalantic avatar May 18 '15 16:05 MatejBalantic

@MatejBalantic, right now we are using the filename as the alt attribute (in case of an image for example), and I think that this is important for SEO of public websites. And I believe that is also important for SEO that different images have different alt values, so it makes sense to validate it.

Now, I didn't understand your concerns about the filename when serving them. I see the importance of validating the mime types that we host, but I see no harm on the filename. I think we should create another issue for this mime type problem, what do you think?

csokol avatar May 23 '15 00:05 csokol

I think the problem is that two users can use the same filename and it would conflict. We could save the filename, but save it with another name at the filesystem(a hash, maybe)

leocwolter avatar May 23 '15 01:05 leocwolter

At the FS I use the attachment id as the filename to avoid conflicts. I think @matej is talking about the filename used when serving the file: https://github.com/caelum/mamute/blob/master/src/main/java/org/mamute/controllers/AttachmentController.java#L85

Is that the problem, Matej?

On Fri, May 22, 2015 at 10:20 PM Leonardo Cesar Wolter < [email protected]> wrote:

I think the problem is that two users can use the same filename and it would conflict. We could save the filename, but save it with another name at the filesystem(a hash, maybe)

— Reply to this email directly or view it on GitHub https://github.com/caelum/mamute/issues/159#issuecomment-104810681.

csokol avatar May 23 '15 02:05 csokol

Hey,

Yes, I am talking about the filename used to serve the file from our servers.

Persisting file name for purposes other than URL generation is not problematic in my opinion. So, keeping the original file name to display it in the alt field seems reasonable.

I do on the other side have doubts about serving the uploaded file under its original file name. For example, for avatars, not forcing the .jpg extension and/or Content-type header would allow a malicious user to upload & hotlink a .exe file containing a virus from your website.

In my experience, mime checking is good but unreliable and doesn't protect you from abuses as it can be easily manipulated.

Solution for this would be a combination of anonymizing the URL to random hash or ID and than forcing the Content-Type header to a white-listed content types. Also some size limitations? For avatars it could also be very easily checked if they are an actual image or not.

p.s. incrementing IDs are easy solution but make it very easy to retrieve all attachments on the system. Not sure if this is a problem in this case at all.

MatejBalantic avatar May 23 '15 12:05 MatejBalantic

I guess this thread collided a bit with https://github.com/caelum/mamute/issues/158, but anyway. What I forgot to mention in the last post is the inconvenience of the user currently if he uses a file name that already exists when uploading an attachment. I guess users' attachments shouldn't be sharing the namespace, or, even better, filename shouldn't matter for reasons explained anyway.

MatejBalantic avatar May 23 '15 13:05 MatejBalantic

So we have three issues here:

  1. Protect mamute against malicious file uploads.
  2. The path for serving files should not use the incremental id as malicious clients could use it to download all the files hosted by the site.
  3. Uploaded file names validation. A filename from one question/answer should not conflict with a filename from other post.

What's your idea for (1)? Should we check the content-type header sent by the user during the upload? I think we can't trust the client, but we should validate during that moment to provide a good experience for normal users. After that, we could validate on the backend after storing the file with File. probeContentType and check a whitelist of allowed mime types. What do you think?

For (2) we could simply store a hash (md5+salt) of the filename concatenating the attachment id to guarantee its uniqueness.

For (3). I think this is not happening. Right now we are only validating if there are filenames repeated at the same question/answer. You can check the implementation in QuestionController and AttachmentsValidator: https://github.com/caelum/mamute/blob/master/src/main/java/org/mamute/controllers/QuestionController.java#L165 https://github.com/caelum/mamute/blob/master/src/main/java/org/mamute/validators/AttachmentsValidator.java#L25

I think we should close this issue and open two new issues for (1) and (2).

csokol avatar May 24 '15 22:05 csokol