
Optimize writes to filesystem

Open iho opened this issue 8 years ago • 9 comments

Hi!

Thanks for this useful and excellent utility!

Could you implement an in-memory buffer option so that axel does not write to disk so often?

iho avatar Jul 31 '16 14:07 iho

Hi @iho,

Have you tried playing with the buffer_size option in the axelrc file?

It's not particularly well-documented, but this is the one:

https://github.com/eribertomota/axel/blob/master/doc/axelrc.example#L65-L70

The axelrc file itself is mentioned in the manpage:

https://github.com/eribertomota/axel/blob/master/man/axel.txt#L93-L96

sdt avatar Jul 31 '16 22:07 sdt

I suspect the default settings for a lot of these configuration parameters were chosen 15+ years ago.

It may be worth re-examining some of these from a 2016 perspective. 5 KB seems rather small.

sdt avatar Jul 31 '16 22:07 sdt

"Have you tried playing with the buffer_size option in the axelrc file?"

No.

@sdt thanks for the help!

iho avatar Aug 01 '16 11:08 iho

I set it, but the file still changes every second.

cat ~/.axelrc
num_connections = 10
buffer_size = 512000000

iho avatar Aug 01 '16 11:08 iho

I've had a bit of a look into this.

What's happening is that tcp_read tries to read a full buffer's worth, but returns only what the underlying socket read returns (I was getting 1440 bytes - most likely the MTU). This is then written to disk.

You're right - it doesn't matter how big you make the buffer - once you hit the MTU limit, that effectively becomes the buffer size.

There are at least two things that would need to be done to fix this:

  1. replace the global shared buffer in axel.c by per-connection buffers
  2. rather than write out each little piece that comes in, only write the buffers as they fill up

Care would need to be taken with the second part to make sure buffers get flushed correctly, in particular when saving the state of a partial transfer.
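The two steps above could be sketched roughly as follows. This is a minimal illustration, not code from axel: `conn_buf_t`, `conn_buf_init`, `conn_buf_append` and `conn_buf_flush` are hypothetical names, and the sketch assumes each connection knows the file offset of its next byte and writes with `pwrite`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection buffer; not a type from axel's source. */
typedef struct {
    int fd;        /* output file descriptor */
    off_t offset;  /* file offset where the buffered data begins */
    char *data;    /* heap storage owned by this buffer */
    size_t cap;    /* capacity of data, in bytes */
    size_t len;    /* bytes currently buffered */
} conn_buf_t;

/* Allocate storage for one connection. Returns 0 on success, -1 on error. */
static int conn_buf_init(conn_buf_t *b, int fd, off_t offset, size_t cap)
{
    assert(cap > 0);
    b->data = malloc(cap);
    if (b->data == NULL)
        return -1;
    b->fd = fd;
    b->offset = offset;
    b->cap = cap;
    b->len = 0;
    return 0;
}

/* Write everything buffered so far at its file offset.
 * Must also be called before saving state or closing the file. */
static int conn_buf_flush(conn_buf_t *b)
{
    size_t done = 0;
    while (done < b->len) {
        ssize_t n = pwrite(b->fd, b->data + done, b->len - done,
                           b->offset + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted: retry */
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    b->offset += (off_t)b->len;
    b->len = 0;
    return 0;
}

/* Append one (possibly short) network read; touch the disk only when full. */
static int conn_buf_append(conn_buf_t *b, const char *src, size_t n)
{
    assert(b->len <= b->cap);
    while (n > 0) {
        size_t room = b->cap - b->len;
        size_t take = n < room ? n : room;
        memcpy(b->data + b->len, src, take);
        b->len += take;
        src += take;
        n -= take;
        if (b->len == b->cap && conn_buf_flush(b) < 0)
            return -1;
    }
    return 0;
}
```

With this shape, a 1440-byte socket read only adds to the buffer, and the number of write calls is governed by `cap` rather than by the size of each read. The state-saving path would need to call `conn_buf_flush` on every connection first, as noted above.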

What problems are these frequent writes causing for you @iho ?

sdt avatar Aug 08 '16 11:08 sdt

I don't think axel should optimize the write operations, the filesystem is probably already taking care of this. I don't believe that each write corresponds to a real write to disk.

ordex avatar Sep 02 '17 14:09 ordex

Whether every write corresponds to "physical" disk activity depends on a wide variety of OS- and environment-specific concerns and is nearly impossible to determine. With sufficient abstraction layers (e.g. virtualization, RAID controllers) you cannot even guarantee a physical write to disk. Unfortunately, guaranteeing the opposite, i.e. that a write does not hit the disk (the goal here), can be just as tricky.

An easier way to implement this (for certain values of "easier"...) might be to go off in a different direction and use mmap(2). On every platform I'm aware of, writes to mmap'ed memory translate directly to sparse writes, completely avoiding the need for lseek(2) and manual offset management. And as long as you avoid fsync (et al.) those writes should all be buffered at the kernel's convenience.

There are many obvious downsides, of course - most notably that 32-bit platforms would be limited to downloading 2GB files (and in reality not even quite that big). Also unusable for files where the length is not known in advance. Also makes resuming harder, etc., etc., etc. so I don't actually recommend doing this, just pointing out an obvious way to eliminate the write-per-write behaviour.
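For illustration only, the mmap(2) idea might look like the sketch below. It assumes the total size is known up front and that the whole file fits in the address space, which are exactly the limitations listed above. `map_output` and `store_piece` are hypothetical helpers, not axel functions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path, size it to `size` bytes and map it shared.
 * Returns the mapping, or NULL on failure. Hypothetical helper. */
static char *map_output(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Store one received piece at its file offset: a plain memcpy,
 * with no lseek/write pair. The kernel decides when pages reach the disk. */
static void store_piece(char *map, size_t offset, const char *src, size_t n)
{
    assert(map != NULL);
    memcpy(map + offset, src, n);
}
```

Each connection would call `store_piece` with its own offset, and the mapping would be released with `munmap` once the download completes.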

athompso avatar Nov 18 '18 16:11 athompso

@athompso is right; that's the direction we should go.

We don't need to map the whole file, just the chunk we're working on, and it would be a good idea to limit that to some smaller (configurable) size, e.g. 64MB.

It's not a problem for unknown sizes, we just need to call ftruncate when finished.
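A rough sketch of that windowed variant, under the assumption that the file is grown on demand as each window is mapped and trimmed once the real length is known. `open_output`, `map_window` and `finish_download` are hypothetical names, and the window offset is assumed to be a multiple of the page size.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (create or truncate) the output file. Hypothetical helper. */
static int open_output(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
}

/* Map `len` bytes of fd starting at page-aligned `offset`,
 * extending the file first if the window reaches past its current end. */
static char *map_window(int fd, off_t offset, size_t len)
{
    struct stat st;
    off_t end = offset + (off_t)len;

    assert(offset % sysconf(_SC_PAGESIZE) == 0);
    if (fstat(fd, &st) < 0)
        return NULL;
    if (st.st_size < end && ftruncate(fd, end) < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Once the transfer ends, trim the file to the bytes actually received. */
static int finish_download(int fd, off_t received)
{
    return ftruncate(fd, received);
}
```

A caller would map one window at a time, `munmap` it when the chunk is complete, then map the next one. The window size (64MB in the suggestion above) would be a configuration value.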

ismaell avatar Feb 12 '19 13:02 ismaell

Hi folks, I just wanted to let you know about this related topic: https://unix.stackexchange.com/questions/742308/horrible-wget-download-speed-to-cifs-mount-local-download-and-cp-to-share-is-fi/753130#753130

In short: using axel with default settings to download to a mounted CIFS (SMB) drive performs poorly, supposedly because of the small buffer size.

eregnier avatar Aug 03 '23 10:08 eregnier