rust-ar icon indicating copy to clipboard operation
rust-ar copied to clipboard

Entry `drop` function can take a long time

Open wavenator opened this issue 3 years ago • 6 comments

The current drop implementation for Entry is as follows:

impl<'a, R: 'a + Read> Drop for Entry<'a, R> {
    fn drop(&mut self) {
        if self.position < self.length {
            // Consume the rest of the data in this entry.
            let mut remaining = self.reader.take(self.length - self.position);
            let _ = io::copy(&mut remaining, &mut io::sink());
        }
    }
}

When dealing with large entries, this function can take a very long time. Could there be a reason for this implementation? Do we really need to consume the reader?

wavenator avatar Nov 23 '21 09:11 wavenator

This is necessary to seek the reader to the header of the next entry. There is no Seek bound, so reader.seek() can't be used.

bjorn3 avatar Nov 23 '21 10:11 bjorn3

I'm confused. I'm not familiar with the code but why don't we use Reader::seek here? Eventually, this drop implementation takes a lot of time when dealing with a lot of entries in a big AR file. Can't it be solved?

wavenator avatar Nov 23 '21 10:11 wavenator

reader.seek() requires a Seek bound. It could be added, but because it currently isn't, .seek() can't be used without a breaking change. A breaking change may be acceptable. At least it would be fine with me speaking as user of rust-ar.

bjorn3 avatar Nov 23 '21 10:11 bjorn3

@mdsteele WDYT about adding Seek bound to Reader?

wavenator avatar Nov 23 '21 11:11 wavenator

BTW what about calling consume?

wavenator avatar Nov 30 '21 10:11 wavenator

Yes, it was written that way to avoid needing a Seek bound. At the time I was trying to stay similar to the API of the tar crate, which also does not require a Seek bound. Once nice feature of not requiring Seek is that you can easily extract a compressed archive by using a reader adapter that provides streaming decompression (for example, GzipReader, which doesn't implement Seek). It'd be nice not to lose that capability.

I poked around just now, though, and it looks like tar recently landed a (not yet published) change to provide an alternate Archive::entries_with_seek method to allow opting in to a more efficient implementation when the underlying reader implements Seek. So that might be a good approach.

mdsteele avatar Dec 11 '21 14:12 mdsteele