
Allow reading FileInfo from a dummy file instead of the file itself

Open Myself5 opened this issue 8 years ago • 9 comments

In our server setup (see #58), the mirrorbits server does not serve any files itself and is only responsible for spreading the load. Effectively, this means it only needs the files to generate checksums and sizes, which wastes a lot of storage.

We use https://github.com/Myself5/mirrorbits_dummycreator to create the dummy files. Each dummy is effectively just a JSON table containing the file information, which saves us a lot of storage (5.8MB vs 217GB).

We have used that setup on our production servers for the past week without issues.

Myself5 avatar Oct 03 '17 15:10 Myself5

OK, so I adjusted the fmt calls and added a fallback to "normal" mode. For reading, I added a basic check for whether the file is larger than 1MB (which no dummy JSON file should ever be, considering it contains only 5 values), and I ran the result through gofmt -w scan/scan.go. I lack the knowledge to write a custom reader (this is my first Go project ever), but I also don't see what else we could check for.
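The size check and fallback described above could look roughly like the following. This is a minimal sketch, not the code from the actual change: the `dummyInfo` field names, the `parseDummy` helper, and the rule that at least one hash must be present are all assumptions made for illustration. The thread only states that the dummy holds five values and is read from the first entry of a JSON array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxDummySize is the 1MB threshold mentioned in the comment above.
const maxDummySize = 1 << 20

// dummyInfo is a hypothetical layout for one dummy entry.
// The real field names used by mirrorbits_dummycreator may differ.
type dummyInfo struct {
	Size    int64  `json:"size"`
	ModTime string `json:"modTime"`
	SHA1    string `json:"sha1"`
	SHA256  string `json:"sha256"`
	MD5     string `json:"md5"`
}

// parseDummy reports ok=false whenever the content should be treated as a
// "normal" file: it is too large, it is not a JSON array, the array is
// empty, or the first entry carries no hash at all (assumed sanity check).
func parseDummy(data []byte) (dummyInfo, bool) {
	if len(data) > maxDummySize {
		return dummyInfo{}, false
	}
	var entries []dummyInfo
	if err := json.Unmarshal(data, &entries); err != nil || len(entries) == 0 {
		return dummyInfo{}, false
	}
	first := entries[0]
	if first.SHA1 == "" && first.SHA256 == "" && first.MD5 == "" {
		return dummyInfo{}, false
	}
	return first, true
}

func main() {
	samples := []string{
		`[{"size":1048576,"sha256":"abc123"}]`, // looks like a dummy
		"not json at all",                      // a regular file
	}
	for _, content := range samples {
		if info, ok := parseDummy([]byte(content)); ok {
			fmt.Printf("dummy mode: size=%d sha256=%s\n", info.Size, info.SHA256)
		} else {
			fmt.Println("normal mode: hash the file itself")
		}
	}
}
```

In a real scanner the caller would check the file size with a stat call first and only read the content once it is known to be under the limit, so that large files are never loaded into memory just to be rejected.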

The only thing we might want to consider is removing the log call on line 259, since on a hybrid repo hitting that path would be "intended", I'd guess.

I haven't run it in production yet either. I'll see if I get some time to do that on the weekend.

Myself5 avatar Oct 17 '17 19:10 Myself5

> Haven't run it in production yet either, will see if I get some time to do that on the weekend.

I have no way to test it in production yet, so if you can do it, that would definitely speed up the merge.

Thanks again for your work :)

etix avatar Oct 19 '17 09:10 etix

All right, every requirement should be fulfilled now. I made sure that hybrid mode is working. The only edge case I can imagine is someone uploading a real JSON file with the exact same structure we use (say, this one: https://paste.myself5.de/d0ULubrZJL.json). In that case, the JSON loading ignores the second entry and just loads the values from the first entry. We might want to add a fallback that switches to "normal" mode when a file like that is detected; up to you.

EDIT: Moved our server to hybrid mode now; I will report back if we get any complaints.

EDIT2: Three days into it, and not a single error entry in the log so far. Seems to be stable.

Myself5 avatar Oct 19 '17 12:10 Myself5

Hello,

I don't understand the following statement:

> The only close edge case I could imagine is that someone uploads a JSON file with the exact same struct we use (let's say this: https://paste.myself5.de/d0ULubrZJL.json). In that case, the JSON load process ignores the second entry and just loads the values from the first entry.

etix avatar Oct 31 '17 10:10 etix

Assuming we upload a file like the one I linked, we currently treat it as a dummy file and read the information we need from the first entry in that JSON array. We should decide whether we want to keep that behaviour, or whether we want to treat the file as a "normal" file when it contains more than one JSON entry.
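The second option could be sketched as below: count the entries in the array and only accept the file as a dummy when there is exactly one. This is an illustration under assumptions, not existing mirrorbits behaviour, and `isDummy` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isDummy reports whether data is a JSON array holding exactly one entry.
// Anything else (invalid JSON, a JSON object, an empty array, or an array
// with several entries) is treated as a "normal" file.
func isDummy(data []byte) bool {
	// json.RawMessage defers decoding of each entry, so this counts the
	// entries without caring about their fields.
	var entries []json.RawMessage
	if err := json.Unmarshal(data, &entries); err != nil {
		return false
	}
	return len(entries) == 1
}

func main() {
	single := []byte(`[{"size":1,"sha256":"abc"}]`)
	double := []byte(`[{"size":1,"sha256":"abc"},{"size":2,"sha256":"def"}]`)

	fmt.Println("single entry treated as dummy:", isDummy(single))
	fmt.Println("two entries treated as dummy:", isDummy(double))
}
```

This keeps the current behaviour for files produced by the dummy creator while letting a genuine multi-entry JSON file be hashed like any other file. It does not help with a genuine single-entry JSON file of the same shape, which remains indistinguishable from a dummy.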

Myself5 avatar Oct 31 '17 10:10 Myself5

Yes, indeed. Another option would be to name these dummy files with a specific extension. For instance, for a file named vlc-2.2.6-win32.exe that we want to distribute, mirrorbits could check whether a file named vlc-2.2.6-win32.exe.mdata exists. That file would of course contain the appropriate JSON. In addition, hybrid repositories would be easier to manage, since you could easily differentiate between complete files and metadata-only files.
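A rough sketch of that lookup follows. The `.mdata` extension comes from the example above; `metadataPath` and `hasMetadataFile` are hypothetical helper names, and nothing here reflects code that exists in mirrorbits.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// metadataExt is the sidecar extension suggested in the comment above.
const metadataExt = ".mdata"

// metadataPath returns the sidecar path for a distributed file.
func metadataPath(path string) string {
	return path + metadataExt
}

// hasMetadataFile reports whether a regular sidecar file exists for path.
func hasMetadataFile(path string) bool {
	st, err := os.Stat(metadataPath(path))
	return err == nil && st.Mode().IsRegular()
}

func main() {
	dir, err := os.MkdirTemp("", "mdata-demo")
	if err != nil {
		fmt.Println("cannot create temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	// Only the sidecar is written; the .exe itself is absent, as it would
	// be on a metadata-only server.
	exe := filepath.Join(dir, "vlc-2.2.6-win32.exe")
	if err := os.WriteFile(metadataPath(exe), []byte(`[{"size":1}]`), 0o644); err != nil {
		fmt.Println("cannot write sidecar:", err)
		return
	}

	fmt.Println("vlc has sidecar:", hasMetadataFile(exe))
	fmt.Println("other has sidecar:", hasMetadataFile(filepath.Join(dir, "other.iso")))
}
```

With this scheme the scanner never has to guess from a file's content whether it is a dummy, which also removes the look-alike JSON edge case discussed earlier in the thread.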

Opinions?

etix avatar Oct 31 '17 12:10 etix

Why not store the metadata for each file in the DB?

darix avatar Jan 13 '20 16:01 darix

There are also file formats to store file metadata. This is commonly used in backup software.

ott avatar Sep 17 '22 02:09 ott

> Why not store the metadata for each file in the DB?

This would also allow O(1) or O(log n) access to the metadata and avoid directory traversals.

ott avatar Sep 17 '22 02:09 ott