Video-Hub-App icon indicating copy to clipboard operation
Video-Hub-App copied to clipboard

Improve File Hashing by Including `stats.blocks`

Open cal2195 opened this issue 4 years ago • 4 comments

I was thinking about the file hashing, and detecting file changes and I realised we should really be using stats.blocks(The number of blocks actually allocated for a file) in our hashing calculation... 🤔

As we don't hash the entire file for performance reasons, files downloaded in chunks can be missed by the app as changes, as their file size doesn't account for file sparseness! 🔥

A fix for this would be use stats.blocks which would be easy, but unfortunately this would break our current .vha2 files as hashes would change...

Unless we do something clever, by checking if stats.blocks * stats.blksize >= stats.size, and use stats.size in that case! 👍 That would preserve backwards compatibility! 🚀

https://nodejs.org/api/fs.html#fs_class_fs_stats

cal2195 avatar Mar 02 '20 18:03 cal2195

Since this change can be added, as you describe, without breaking backwards compatibility, it seems like we can add it whenever (and it doesn't need to be along with the version 3.0.0 release).

I'm so eager to get the new version out, I'd rather push off all features until later unless they need to be in 3.0.0

I'll keep the issue open. Do let me know if there's good reason to get it in with 3.0.0 rather than later.

whyboris avatar Oct 22 '20 22:10 whyboris

🤔 I'm still unsure what the benefit is ... I'm not really eager to add this feature 😓

I'm thinking of closing this -- please let me know if there's some benefit I'm not seeing 🤝

whyboris avatar Dec 17 '20 18:12 whyboris

Without this change, if files are imported as they're being downloaded (say you download directly to your video hub folder), VHA will import it as is, and will generate incorrect thumbnails, if it even can.

With this change, a new scan of the hub will detect that the video has been updated again (since fully downloading), and should reimport the thumbnails as it has a new hash now! 🎉

stats.blocks is the number of actual file system blocks used to store the file, which increases as the file grows. For speed, most file systems only allocate the blocks when needed, so a 1GB file might actually only be 1MB on disk, if only 1MB has been written to the file. In this case, stats.blocks will be a much lower number than a truly 1GB file.

Including this into the hash calculation will allow these changes to be detected by VHA.

cal2195 avatar Dec 18 '20 10:12 cal2195

I feel like it would take a lot of careful edge-case detection to implement. At the moment, I'm preventing the app from re-scanning files that have already been added to the app (for example: https://github.com/whyboris/Video-Hub-App/blob/main/node/main-extract-async.ts#L253)

If we attempt to re-scan files that have already been added, we'll need to run a stat command across every file is the library (it's over 9000 :trollface: ) which seems wasteful for the potential that some user might have pointed their app at the download directory.

I personally feel like it's a mistake to point VHA at a downloads-in-progress folder 🤷 but I won't judge 😝

I'm reluctant to add this feature because of the re-scanning it will require. Please let me know if you think there's a way around this problem, or it's not a problem 🤔

whyboris avatar Dec 18 '20 15:12 whyboris