mirrorbits
Allow reading FileInfo from a dummy file instead of the file itself
On our server setup (see #58), the mirrorbits server does not serve any files and is only responsible for spreading the load. Effectively, this means it only needs the files for checksum and size generation, which wastes a lot of storage.
We use https://github.com/Myself5/mirrorbits_dummycreator to create the dummies. A dummy is effectively just a JSON table containing the file information, so it saves us a lot of storage (5.8MB vs 217GB).
We have used that setup on our production servers for the past week without issues.
OK, so I adjusted the fmt calls and allowed a fallback to "normal" mode. For reading, I added a basic check for whether the file is > 1MB (which no JSON dummy should ever be, considering it contains only 5 values), and I ran it through gofmt -w scan/scan.go. I lack the knowledge for a custom reader (this is my first Go project ever), but I don't see what else we could check for.
The only thing we might want to consider is removing the log in line 259, since on a hybrid repo that case would be "intended", I'd guess.
Haven't run it in production yet either; I will see if I get some time to do that on the weekend.
Haven't run it in production yet either; I will see if I get some time to do that on the weekend.
I have no way to test it in production yet, so if you can do it, it would definitely speed up the merge.
Thanks again for your work :)
Alright, every requirement should be fulfilled now. I made sure that hybrid is working. The only edge case I can imagine is that someone uploads a JSON file with the exact same struct we use (let's say this: https://paste.myself5.de/d0ULubrZJL.json). In THAT case, the JSON load process ignores the second entry and just loads the values from the first entry. We might want to consider adding a fallback and moving to "normal" mode if a file like that gets detected; up to you.
EDIT: Moved our server to hybrid mode now, will report back if we get any complaints. EDIT2: 3 days into it, and not a single error entry in the log so far. Seems to be stable.
Hello,
I don't understand the following statement:
The only edge case I can imagine is that someone uploads a JSON file with the exact same struct we use (let's say this: https://paste.myself5.de/d0ULubrZJL.json). In THAT case, the JSON load process ignores the second entry and just loads the values from the first entry.
Assuming we upload a file like the one I linked, we currently treat it as a dummy file and read the information we need from the first entry in that JSON array. We should decide whether to keep that behaviour or to treat the file as a "normal" file if it contains more than one JSON entry.
Yes, indeed. Another option would be to name these dummy files with a specific extension. For instance, for a file named vlc-2.2.6-win32.exe that we want to distribute, mirrorbits can check if a file vlc-2.2.6-win32.exe.mdata exists. This file would of course contain the appropriate JSON. In addition, hybrid repositories would be easier to manage, since you can easily differentiate between complete files and metadata-only files.
Opinions?
Why not store the metadata for each file in the DB?
There are also file formats to store file metadata. This is commonly used in backup software.
Why not store the metadata for each file in the DB?
This would also allow O(1) or O(log n) access to the metadata and avoid directory traversals.
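As a rough illustration of the lookup cost, the sketch below uses an in-memory map as a stand-in for the database; it is not how mirrorbits stores anything, and `FileMeta` and `MetaStore` are hypothetical names. A hash-keyed store gives average O(1) lookups by path, while a tree-based index would give O(log n).

```go
package main

import "fmt"

// FileMeta holds the per-file values the scanner needs (hypothetical fields).
type FileMeta struct {
	Size   int64
	Sha256 string
}

// MetaStore is an in-memory stand-in for a database table keyed by path.
type MetaStore struct {
	byPath map[string]FileMeta
}

// NewMetaStore returns an empty store.
func NewMetaStore() *MetaStore {
	return &MetaStore{byPath: make(map[string]FileMeta)}
}

// Put records or replaces the metadata for a path.
func (s *MetaStore) Put(path string, m FileMeta) {
	s.byPath[path] = m
}

// Get looks a path up directly, without touching the filesystem.
func (s *MetaStore) Get(path string) (FileMeta, bool) {
	m, ok := s.byPath[path]
	return m, ok
}

func main() {
	store := NewMetaStore()
	store.Put("/vlc/vlc-2.2.6-win32.exe", FileMeta{Size: 1024, Sha256: "aa"})

	m, ok := store.Get("/vlc/vlc-2.2.6-win32.exe")
	fmt.Println(m.Size, ok)
}
```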