
Unzip file online, on the fly with URI

Open • Kreijstal opened this issue 3 years ago • 18 comments

Project description

A web app where you give it the link to a .zip file and it returns the directory listing, not the file itself; you can then click the entries to browse the zip file on the fly, in the browser, without downloading it. For example, if the file is at http://example.com/file.zip, the URL would be something like http://domain/decompress/http://example.com/file.zip, and to view a file inside the zip you would use http://domain/decompress/http://example.com/file.zip/README.txt
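A minimal sketch of how a route could split the proposed URL into the archive URL and the path of the entry inside it. The `/decompress/` prefix comes from the example above; the function name, the `DecompressTarget` shape, and the rule that the archive URL ends at the first `.zip` segment are assumptions for illustration only.

```typescript
// Hypothetical route prefix taken from the example URLs in the issue.
const PREFIX = "/decompress/";

interface DecompressTarget {
  archiveUrl: string;        // e.g. http://example.com/file.zip
  innerPath: string | null;  // e.g. README.txt, or null for the directory listing
}

/**
 * Split /decompress/http://example.com/file.zip/README.txt into the archive URL
 * and the entry path. Assumes the archive URL ends at the first ".zip" segment.
 */
function parseDecompressPath(path: string): DecompressTarget | null {
  if (!path.startsWith(PREFIX)) return null;
  // Some servers collapse "//" in paths, turning "http://" into "http:/"; undo that.
  const rest = path.slice(PREFIX.length).replace(/^(https?:)\/(?!\/)/i, "$1//");
  const match = /^(.*?\.zip)(?:\/(.*))?$/i.exec(rest);
  if (!match) return null;
  const inner = match[2];
  return {
    archiveUrl: match[1],
    innerPath: inner ? decodeURIComponent(inner) : null,
  };
}
```

Splitting on the first `.zip` segment is ambiguous for hosts such as `example.zip`; passing the archive URL as a query parameter would avoid that, at the cost of a less readable URL.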

An alternative would be a JavaScript-only solution that does the same thing, but then the burden would be on the client. Edit: I also just found this Stack Exchange question: https://superuser.com/questions/1286364/extract-files-from-online-tar-archive-using-only-its-url

Relevant Technology

For example, here is a proxy I forked on glitch.com: https://glitch.com/~sofetchpost. It generates directory listings for FTP servers; see http://sofetchpost.glitch.me/ftp://ftp.usf.edu/pub/

Complexity and required time

Complexity

  • [x] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project

Required time (ETA)

  • [x] Little work - A couple of days

Categories

  • [x] Web app

Kreijstal • Jan 24 '21 15:01

The problem is that you want to do all this work in the browser, to avoid a DoS caused by the expensive task of unzipping large files. I don't know of any maintained libraries that can unzip inside the browser, and even then there could be client-side security concerns if users submit a URL rather than upload a file into the DOM.

TheOtterlord • Jan 24 '21 16:01

Here's your solution: https://stuk.github.io/jszip/ (https://github.com/Stuk/jszip). What a coincidence, I just started a project that uses this feature.

KaKi87 • Jan 24 '21 16:01

Hmm, this solves the browser-side problem, and I guess if you make sure to display the appropriate warnings to the user, it would be a valid route to take for an application. Thanks for sharing :smile:. @Kreijstal, just making sure: this isn't supposed to be 100% secure and safe? Just a way to preview a zip file from a trusted source :thinking:?

TheOtterlord • Jan 24 '21 16:01

> Hmm, this solves the browser-side problem, and I guess if you make sure to display the appropriate warnings to the user, it would be a valid route to take for an application. Thanks for sharing. @Kreijstal, just making sure: this isn't supposed to be 100% secure and safe? Just a way to preview a zip file from a trusted source?

It's a convenience tool; sometimes you just want to preview the contents of a zip file without downloading anything.

> Here's your solution: https://stuk.github.io/jszip/ (https://github.com/Stuk/jszip). What a coincidence, I just started a project that uses this feature.

Hmm, that's perfect; now everything else has to be implemented. Do you know of any file-browser demo? Check this out: https://observablehq.com/@scax/zip-file-to-json (but of course, instead of uploading the file, you would just give the URL to it).

What would be a bit hard is creating a browsable index of the zip file. Say there is a PDF inside the zip and you want to preview it: what would be the best way to do that in the browser? Maybe create a blob URL and just link to the file in the interface?

Suddenly, I think this is way more work than just doing it server-side.

> The problem is that you want to do all this work in the browser, to avoid a DoS caused by the expensive task of unzipping large files. I don't know of any maintained libraries that can unzip inside the browser, and even then there could be client-side security concerns if users submit a URL rather than upload a file into the DOM.

I guess there should be a limit on how big the zip files can be, but it's open source, so people can decide for themselves what their limit is and which sources they want to accept. I think just listing the index of the zip doesn't take many resources.

Kreijstal • Jan 24 '21 17:01

Here is https://github.com/Infocatcher/ArchView

It is an add-on for old, XUL-based versions of Firefox, but it is entirely possible to port it to the WebExtensions API.

KOLANICH • Jan 24 '21 17:01

> It's a convenience tool; sometimes you just want to preview the contents of a zip file without downloading anything.

Yeah, that's fair. Then we can just use the browser approach, although the browser will still download the file, just into local memory.

> What would be a bit hard is creating a browsable index of the zip file. Say there is a PDF inside the zip and you want to preview it: what would be the best way to do that in the browser? Maybe create a blob URL and just link to the file in the interface?

I can't see why not. Makes sense to eliminate storing things server side.

> Suddenly, I think this is way more work than just doing it server-side.

Yes, but the problem is that you have to account for the security of any file stored server-side, which brings in authentication, and you also have to protect against DoS attacks, etc.

> I guess there should be a limit on how big the zip files can be, but it's open source, so people can decide for themselves what their limit is and which sources they want to accept. I think just listing the index of the zip doesn't take many resources.

It's not so much the size that is the problem. Take a zip bomb, for example: a relatively small file that, when unzipped, can take a very long time and use up so much disk space and memory that servers can be brought down. We would need antivirus protection server-side along with the things I mentioned above.
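One cheap defence, assuming the central directory has already been parsed, is to refuse archives whose declared sizes look like a zip bomb before decompressing anything. The `EntryInfo` shape, `checkArchive`, and the limit values are hypothetical. The declared sizes come from the archive itself and can lie, so a real extractor should also stop as soon as it has produced more bytes than an entry declares.

```typescript
// Per-entry sizes as a central-directory parser would report them (assumed shape).
interface EntryInfo {
  name: string;
  compressedSize: number;
  uncompressedSize: number;
}

// Illustrative limits; whoever hosts an instance would pick their own.
interface Limits {
  maxTotalUncompressed: number; // bytes, summed over all entries
  maxRatio: number;             // uncompressed / compressed, per entry
}

/** Returns a reason string if the archive looks unsafe to extract, otherwise null. */
function checkArchive(entries: EntryInfo[], limits: Limits): string | null {
  let total = 0;
  for (const e of entries) {
    total += e.uncompressedSize;
    if (total > limits.maxTotalUncompressed) {
      return `total uncompressed size exceeds ${limits.maxTotalUncompressed} bytes`;
    }
    // Math.max avoids dividing by zero for empty (or lying) compressed sizes.
    const ratio = e.uncompressedSize / Math.max(e.compressedSize, 1);
    if (ratio > limits.maxRatio) {
      return `${e.name}: compression ratio ${ratio.toFixed(0)}:1 exceeds ${limits.maxRatio}:1`;
    }
  }
  return null;
}
```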

Overall, if you are creating an application for use by your team or group only, you will probably be fine as long as you require authentication. Otherwise, you need to add so much protection against DoS and for privacy that it all becomes a big headache.

TheOtterlord • Jan 24 '21 17:01

> It's not so much the size that is the problem. Take a zip bomb, for example: a relatively small file that, when unzipped, can take a very long time and use up so much disk space and memory that servers can be brought down. We would need antivirus protection server-side along with the things I mentioned above.

Wow, that's pretty cool, but in that case size is still the problem, just during decompression. Maybe use a lazy library that can decompress the zip file and be cancelled asynchronously at any time? Well, suddenly this is really tricky, and not for beginners. https://research.swtch.com/zip: oh, this article is amazing. Yeah, I think you need a pretty clever unzipping algorithm if you want to keep security in mind. Maybe make that a separate project? Then this one would just use it as a dependency.

Kreijstal • Jan 24 '21 17:01

> Maybe use a lazy library that can decompress the zip file and be cancelled asynchronously at any time?

Interesting idea, though this definitely takes the project to the next level and requires far more knowledge than using an existing library.

> Yeah, I think you need a pretty clever unzipping algorithm if you want to keep security in mind. Maybe make that a separate project? Then this one would just use it as a dependency.

Yeah, although I still think a client-side unzip would be easier overall and require less maintenance.

> Maybe create a blob URL and just link to the file in the interface?

BTW, I found a nice Stack Overflow answer that deals with this, just for reference if we do go client-side.

TheOtterlord • Jan 24 '21 18:01

Also, yeah it's a good article

TheOtterlord • Jan 24 '21 18:01

@Kreijstal

> What would be a bit hard is creating a browsable index of the zip file. Say there is a PDF inside the zip and you want to preview it: what would be the best way to do that in the browser? Maybe create a blob URL and just link to the file in the interface?

I think it is possible to first determine the MIME type of each file and then give the user dynamic preview options. I stumbled upon https://github.com/gildas-lormeau/zip.js and it works like a charm. I might end up creating a proper browser-based web app; I will update you if I do.
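A sketch of the MIME-type and blob-URL idea: guess a type from the entry's extension, wrap the extracted bytes in a `Blob`, and hand the object URL to a link or `<iframe>`. The extension table and function names are assumptions. HTML and SVG are deliberately left out: a blob URL has the same origin as the page that created it, so rendering untrusted HTML from it would run that file's scripts with the app's privileges.

```typescript
// Hand-picked extensions; HTML and SVG are omitted on purpose because they can run scripts.
const MIME_BY_EXT = new Map<string, string>([
  ["txt", "text/plain"],
  ["md", "text/plain"],
  ["json", "application/json"],
  ["pdf", "application/pdf"],
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
]);

/** Guess a MIME type from the file name inside the archive. */
function guessMime(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1); // ignore dots in folder names
  const dot = base.lastIndexOf(".");
  const ext = dot > 0 ? base.slice(dot + 1).toLowerCase() : "";
  return MIME_BY_EXT.get(ext) ?? "application/octet-stream";
}

/** Wrap extracted bytes in a Blob and return an object URL for a link or iframe. */
function previewUrl(path: string, data: ArrayBuffer): string {
  const blob = new Blob([data], { type: guessMime(path) });
  return URL.createObjectURL(blob); // release it later with URL.revokeObjectURL
}
```

Anything that falls through to `application/octet-stream` will simply be offered as a download, which is a reasonable default for unknown types.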

0xAliRaza • Mar 27 '21 14:03

I'm working on the project and trying to do all the work on the client side, but now I'm running into CORS issues. I think I'll have to create a proxy.

0xAliRaza • Mar 31 '21 06:03

> I'm working on the project and trying to do all the work on the client side, but now I'm running into CORS issues. I think I'll have to create a proxy.

I created http://sofetchpost.glitch.me/ for this use case; you use it like this: http://sofetchpost.glitch.me/https://cdn.discordapp.com/attachments/773243062431514646/822169399376216114/loesung01-merged.pdf
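Using the proxy amounts to prefixing the target URL with the proxy's address. A minimal sketch, assuming the proxy accepts the raw target URL appended to its path exactly as in the example above; `viaProxy` is a hypothetical helper.

```typescript
// Proxy base from the comment above; any proxy with the same URL scheme works identically.
const PROXY_BASE = "http://sofetchpost.glitch.me/";

/** Build the proxied URL by appending the raw target URL to the proxy base. */
function viaProxy(target: string, proxyBase: string = PROXY_BASE): string {
  // Ensure exactly one slash between the proxy base and the target.
  return proxyBase.replace(/\/?$/, "/") + target;
}
```

For example, `fetch(viaProxy("https://example.com/file.zip"))` fetches through the proxy. If the app itself is served over HTTPS, the proxy also has to be reached over HTTPS, or the browser blocks the request as mixed content. Whether this proxy forwards `Range` headers (needed for the partial-download idea later in the thread) would have to be checked against its source.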

Kreijstal • Mar 31 '21 08:03

Alright, I'll make use of it then.

0xAliRaza • Mar 31 '21 13:03

> Alright, I'll make use of it then.

You can view the source at https://glitch.com/edit/#!/sofetchpost

Kreijstal • Apr 01 '21 21:04

It might also be possible to first use a HEAD request to get the file size, and then an HTTP range request to fetch just the zip file index at the end, without downloading the actual contents. File listings of huge zip files would then be viewable much faster than they are now.
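A sketch of the HEAD-plus-Range idea. The End of Central Directory (EOCD) record sits at the very end of a zip file and is 22 bytes plus a comment of at most 65,535 bytes, so the last 65,557 bytes always contain it; the EOCD then gives the offset and size of the central directory, which may need a second range request if it does not fit in that tail. Function names are hypothetical, ZIP64 is ignored, and the server must answer with `206 Partial Content` for this to save anything.

```typescript
// EOCD record: 22 fixed bytes plus a comment of at most 0xffff bytes.
const MAX_EOCD_SEARCH = 22 + 0xffff;

/** Inclusive byte range (as HTTP Range expects) covering the last `tail` bytes. */
function tailRange(fileSize: number, tail: number = MAX_EOCD_SEARCH): { start: number; end: number } {
  return { start: Math.max(0, fileSize - tail), end: fileSize - 1 };
}

async function fetchRange(url: string, start: number, end: number): Promise<Uint8Array> {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  // 206 means the range was honoured; 200 would mean the whole file is coming back.
  if (res.status !== 206) throw new Error(`range not supported (HTTP ${res.status})`);
  return new Uint8Array(await res.arrayBuffer());
}

/** HEAD for the size, then fetch only the tail that must contain the EOCD record. */
async function fetchZipTail(url: string): Promise<{ size: number; tailStart: number; tail: Uint8Array }> {
  const head = await fetch(url, { method: "HEAD" });
  const size = Number(head.headers.get("Content-Length"));
  if (!Number.isFinite(size) || size <= 0) throw new Error("no usable Content-Length");
  const { start, end } = tailRange(size);
  return { size, tailStart: start, tail: await fetchRange(url, start, end) };
}
```

A suffix range such as `Range: bytes=-65557` would skip the HEAD request, since HTTP allows asking for the last N bytes directly; the total size then arrives in the `Content-Range` response header, which a cross-origin page can only read if the server exposes it.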

jjrv • May 24 '21 18:05

> It might also be possible to first use a HEAD request to get the file size, and then an HTTP range request to fetch just the zip file index at the end, without downloading the actual contents. File listings of huge zip files would then be viewable much faster than they are now.

That's quite fascinating, and probably good for server health, but I guess you would have to implement your own zip parsing library or modify an existing one, which might raise the difficulty and time investment quite a bit. But it'd be totally worth it.

Kreijstal • May 24 '21 18:05

Parsing the zip file index is way easier than getting the actual contents, because the index is not compressed. Using a full-blown existing zip library is a waste actually. My first impression is that this task is actually almost trivially easy and would result in a powerful tool with no existing alternatives.

However, I haven't run into a situation where this is needed; otherwise I would have done it. I already wrote a library for creating zip files in the browser with no compression, just to be able to save several files at once. It was very easy to do, and much, much smaller and faster than any existing library, but again it serves a very specific use case that others probably simply haven't run into; otherwise they would have made it first.

My library is here: https://github.com/requirex/codec/blob/master/packages/zip/src/index.ts

In my opinion there is very little code there, and yet it produces valid zip files. General decompressing is way harder, but reading just the index is even easier.
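To illustrate how little is needed to read the index, here is a sketch that locates the EOCD record and walks the central directory using the fixed field offsets from the ZIP specification (PKWARE's APPNOTE). It is not the code from the linked library; it assumes UTF-8 file names, skips ZIP64 and multi-disk archives, and takes a `bufStart` parameter so it also works on a tail fetched with a Range request. All names are hypothetical.

```typescript
interface CentralEntry {
  name: string;
  method: number;            // 0 = stored, 8 = deflate
  crc32: number;
  compressedSize: number;
  uncompressedSize: number;
  localHeaderOffset: number; // archive offset of the entry's local file header
}

const EOCD_SIG = 0x06054b50;
const CDH_SIG = 0x02014b50;

/** Scan backwards for an EOCD signature whose comment length ends exactly at the buffer end. */
function findEocd(buf: Uint8Array): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  for (let i = buf.length - 22; i >= 0; i--) {
    if (view.getUint32(i, true) === EOCD_SIG && i + 22 + view.getUint16(i + 20, true) === buf.length) {
      return i;
    }
  }
  return -1;
}

/** `buf` must end at the end of the archive; `bufStart` is the archive offset of buf[0]. */
function parseCentralDirectory(buf: Uint8Array, bufStart = 0): CentralEntry[] {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const eocd = findEocd(buf);
  if (eocd < 0) throw new Error("EOCD record not found");
  const count = view.getUint16(eocd + 10, true);    // total number of entries
  const cdOffset = view.getUint32(eocd + 16, true); // archive offset of the central directory
  let p = cdOffset - bufStart;
  if (p < 0) throw new Error("central directory starts before the fetched range");
  const decoder = new TextDecoder(); // assumes UTF-8 names; legacy CP437 names may garble
  const entries: CentralEntry[] = [];
  for (let i = 0; i < count; i++) {
    if (view.getUint32(p, true) !== CDH_SIG) throw new Error(`bad central header at ${p + bufStart}`);
    const nameLen = view.getUint16(p + 28, true);
    const extraLen = view.getUint16(p + 30, true);
    const commentLen = view.getUint16(p + 32, true);
    entries.push({
      name: decoder.decode(buf.subarray(p + 46, p + 46 + nameLen)),
      method: view.getUint16(p + 10, true),
      crc32: view.getUint32(p + 16, true),
      compressedSize: view.getUint32(p + 20, true),
      uncompressedSize: view.getUint32(p + 24, true),
      localHeaderOffset: view.getUint32(p + 42, true),
    });
    p += 46 + nameLen + extraLen + commentLen; // advance to the next central header
  }
  return entries;
}
```

Given the last bytes of an archive and the offset they start at, `parseCentralDirectory(tail, tailStart)` returns the full listing without touching any compressed data.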

I think the biggest issue is CORS. Proxying these requests through a web server would cost something and be vulnerable to DoS, and most public zip files probably don't have CORS headers that allow a random web app to request them, which makes a practical service impossible.

I do have a possible solution, though: a bookmarklet. You could navigate to any page on the target domain, click the bookmarklet to open a popup window, paste the zip file URL there, and then view the contents without having to download the file. The bookmarklet can work across browsers, and the request won't be cross-origin.
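The bookmarklet trick relies on a `javascript:` URL running inside whatever page is currently open, so a `fetch` to a URL on that same origin is not subject to CORS. A minimal sketch of packaging a script as a bookmarklet; `toBookmarklet` and the payload are hypothetical, and the payload only reads the file size as a stand-in for real listing code.

```typescript
/** Wrap a script in an IIFE and encode it as a javascript: URL for a bookmark. */
function toBookmarklet(source: string): string {
  // The newline before "}" keeps a trailing // comment in `source` from eating the brace.
  const wrapped = `(()=>{${source}\n})();`;
  return "javascript:" + encodeURIComponent(wrapped);
}

// Hypothetical payload: run on a page of the archive's own site, so the fetch is same-origin.
const payload = `
  const url = prompt("Zip file URL on this site:");
  if (url) fetch(url, { method: "HEAD" }).then(r => alert("Size: " + r.headers.get("Content-Length")));
`;

const bookmarklet = toBookmarklet(payload);
```

Wrapping the payload in an arrow-function IIFE keeps its variables out of the page's globals and makes the whole expression evaluate to `undefined`, so the browser does not replace the current page with the result.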

I'm kind of interested in making this, but still have no use for it. Do you?

I expect the project would take at most 2 days, with an ugly, unstyled file listing. Decoding a single file inside the package to view its contents would involve creating a new single-file zip in memory and feeding it to an existing library. That is also pretty trivial. The fun thing is that you could, for example, view a readme file inside a huge zip file quickly, even over a slow network connection. Again, due to how zip files work, a range request can read a single file inside a large compressed archive, and the file index contains the necessary offset info.
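A sketch of the rewrapping step: read the entry's local header (its name and extra-field lengths can differ from the central directory copy) to find where the compressed bytes start, range-request exactly `compressedSize` bytes from there, and wrap them, still compressed, in a new single-entry zip that any unzip tool or library can open. Fetching the first 30 bytes of the local header is enough for `entryDataOffset`. The `EntryMeta` shape and both function names are assumptions; encrypted and ZIP64 entries are not handled, and the timestamp is fixed at 1980-01-01.

```typescript
interface EntryMeta {
  name: string;
  method: number; // copied unchanged, so the data never needs recompressing
  crc32: number;
  compressedSize: number;
  uncompressedSize: number;
}

/** Archive offset where an entry's data begins, given the first 30 bytes of its local header. */
function entryDataOffset(localHeader: Uint8Array, localHeaderOffset: number): number {
  const v = new DataView(localHeader.buffer, localHeader.byteOffset, localHeader.byteLength);
  if (v.getUint32(0, true) !== 0x04034b50) throw new Error("not a local file header");
  return localHeaderOffset + 30 + v.getUint16(26, true) + v.getUint16(28, true);
}

/** Wrap one entry's raw (still compressed) bytes in a minimal single-entry zip. */
function singleEntryZip(meta: EntryMeta, data: Uint8Array): Uint8Array {
  const name = new TextEncoder().encode(meta.name);
  const localLen = 30 + name.length;
  const centralLen = 46 + name.length;
  const out = new Uint8Array(localLen + data.length + centralLen + 22);
  const v = new DataView(out.buffer);
  const DOS_DATE = 0x21; // 1980-01-01, the earliest date a zip timestamp can hold
  // Local file header
  v.setUint32(0, 0x04034b50, true);
  v.setUint16(4, 20, true);      // version needed: 2.0
  v.setUint16(6, 0x0800, true);  // flag bit 11: name is UTF-8
  v.setUint16(8, meta.method, true);
  v.setUint16(12, DOS_DATE, true);
  v.setUint32(14, meta.crc32, true);
  v.setUint32(18, meta.compressedSize, true);
  v.setUint32(22, meta.uncompressedSize, true);
  v.setUint16(26, name.length, true);
  out.set(name, 30);
  out.set(data, localLen);
  // Central directory header
  const c = localLen + data.length;
  v.setUint32(c, 0x02014b50, true);
  v.setUint16(c + 4, 20, true);  // version made by
  v.setUint16(c + 6, 20, true);  // version needed
  v.setUint16(c + 8, 0x0800, true);
  v.setUint16(c + 10, meta.method, true);
  v.setUint16(c + 14, DOS_DATE, true);
  v.setUint32(c + 16, meta.crc32, true);
  v.setUint32(c + 20, meta.compressedSize, true);
  v.setUint32(c + 24, meta.uncompressedSize, true);
  v.setUint16(c + 28, name.length, true);
  v.setUint32(c + 42, 0, true);  // the local header sits at offset 0
  out.set(name, c + 46);
  // End of central directory record
  const e = c + centralLen;
  v.setUint32(e, 0x06054b50, true);
  v.setUint16(e + 8, 1, true);   // entries on this disk
  v.setUint16(e + 10, 1, true);  // total entries
  v.setUint32(e + 12, centralLen, true);
  v.setUint32(e + 16, c, true);  // central directory offset
  return out;
}
```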

Content Security Policy (CSP) severely restricts adding any new scripts to web pages, so to work on all pages, the bookmarklet would have to contain the entire code of this tool. That probably means it won't have a fancy UI and will be restricted to saving generated single-file zip packages to the user's machine. So it would be possible to view the listing of a huge online zip, locate readme.txt inside, and then download just that file as readme.txt.zip. Any existing tool can then be used to view that file. Alternatively, the user could drag that file to another web app that shows the file contents. Maybe the clipboard could also be used for smaller files.

jjrv • May 24 '21 19:05

Hi everyone, this might be of interest for narrowing the download from the entire zip archive to just the specific file requested:

https://github.com/gtsystem/python-remotezip

atharapos • Jul 01 '21 09:07