refactor: improve download performance on networked drives by pre-allocating instead of using temp files

Open lxe opened this issue 1 year ago • 0 comments

Improve download performance and progress reporting

Problem

The previous implementation used temporary files for each chunk of a download, which were later merged into the final file. This approach had several issues:

  1. Extra disk I/O from writing to temp files and then copying to the final location
  2. Poor performance on network drives due to unnecessary file operations
  3. Inaccurate download speed reporting due to how progress was tracked
  4. Wasted disk space from temporary files

Solution

This PR refactors the download mechanism to:

  1. Write directly to the final file using WriteAt() with correct offsets
  2. Pre-allocate the full file size to reduce fragmentation
  3. Track actual bytes downloaded per chunk for accurate progress reporting
  4. Eliminate the merge step entirely
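The steps above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the chunk type and the copyAt, writeChunks, and demo helpers are hypothetical names, and in-memory readers stand in for HTTP range-response bodies. It assumes Truncate is an acceptable way to set the final size up front; whether that physically reserves disk blocks depends on the filesystem.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"sync"
)

// chunk is a hypothetical description of one byte range of the target file.
type chunk struct {
	offset int64     // where this range starts in the final file
	body   io.Reader // stands in for an HTTP range-response body
}

// copyAt streams src into dst with WriteAt, starting at off, and returns
// the number of bytes written.
func copyAt(dst io.WriterAt, src io.Reader, off int64) (int64, error) {
	buf := make([]byte, 32*1024)
	var written int64
	for {
		n, rerr := src.Read(buf)
		if n > 0 {
			if _, werr := dst.WriteAt(buf[:n], off+written); werr != nil {
				return written, werr
			}
			written += int64(n)
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}

// writeChunks sets the file to its final size, then lets every chunk write
// into its own region concurrently. No temp files and no merge step.
func writeChunks(path string, size int64, chunks []chunk) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Set the final length before any chunk is written.
	if err := f.Truncate(size); err != nil {
		f.Close()
		return err
	}

	var wg sync.WaitGroup
	errs := make(chan error, len(chunks))
	for _, c := range chunks {
		wg.Add(1)
		go func(c chunk) {
			defer wg.Done()
			if _, err := copyAt(f, c.body, c.offset); err != nil {
				errs <- fmt.Errorf("chunk at offset %d: %w", c.offset, err)
			}
		}(c)
	}
	wg.Wait()
	close(errs)

	// Receiving from a closed, empty channel yields nil.
	if err := <-errs; err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo writes two chunks out of order and returns the resulting contents.
func demo() (string, error) {
	tmp, err := os.CreateTemp("", "prealloc-*.bin")
	if err != nil {
		return "", err
	}
	path := tmp.Name()
	tmp.Close()
	defer os.Remove(path)

	chunks := []chunk{
		{offset: 6, body: strings.NewReader("world")},
		{offset: 0, body: strings.NewReader("hello ")},
	}
	if err := writeChunks(path, 11, chunks); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}
```

Because WriteAt takes an explicit offset and does not move a shared file cursor, the goroutines can share one file handle as long as their byte ranges do not overlap.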

Key Changes

  • Added downloadProgress struct to properly track bytes read per chunk
  • Modified downloadChunk to write directly to the target file using offsets
  • Improved progress monitoring to calculate actual download speed
  • Removed temporary file handling and merging code
  • Added elapsed time check to prevent division by zero in speed calculation

Benefits

  • Faster downloads, especially on network drives
  • Lower disk I/O
  • More accurate progress reporting
  • Reduced disk space usage
  • Better handling of large files

lxe, Dec 26 '24 00:12