lofty-rs
lofty-rs copied to clipboard
ParseOptions: ignore `Picture` setting
Consider the following situation:
You have 10,000
audio files you want to probe, totaling to around 500 albums:
for file in files {
let probe = lofty::Probe::new(file)
let tagged = probe.guess_file_type()?.read()?;
// Do stuff with TaggedFile.
}
Currently, this fully scans every file, including the Picture
bytes, meaning: every single file allocates a Vec
for the picture bytes, even if we have already scanned them, or don't want them in the first place (e.g, we only want the core tag metadata).
I'd like to avoid allocating bytes that I already have, so my proposal:
for file in files {
let options = lofty::ParseOptions::new().read_picture(false); // This doesn't exist.
let probe = lofty::Probe::new(file).options(options);
let tagged = probe.guess_file_type()?.read()?;
// Do stuff with TaggedFile (that doesn't have a Picture).
}
After testing around, this cuts probing time in half for files with non-trivial picture data (1200x1200+).
I'm willing to submit a PR for this, but I was wondering if this ParseOptions::read_picture()
API was the correct way forward, or if this should be done another way. It would add another branch each file parse since every version of read_from()
would have this added:
if parse_options.read_picture {
// Push picture bytes.
}
An addition to ParseOptions
would be the way to go about this. :)
I'm curious though, how did you go about testing this? When items are encountered, they get read into memory before being interpreted, so there would be allocations regardless.
This was commented out in the tests (it was a match before though).
https://github.com/Serial-ATA/lofty-rs/blob/46127a09122c71671ae196e530f0f804cd09ce51/src/flac/read.rs#L93-L97
I'm now seeing that the allocation is not even here. In my case, I'm iterating in a hot loop over 1000s of files so I guess the encoding + push time was adding up.
If this read were prevented in Block::read
, the bytes wouldn't even be allocated, right?
https://github.com/Serial-ATA/lofty-rs/blob/46127a09122c71671ae196e530f0f804cd09ce51/src/flac/block.rs#L32-L37
Something like a check before continuing after line 32 works but maybe is too invasive? Each Block::read
would have to take in a ParseOptions
and branch. It would have to return a signal to the calling read_from
, letting it know to continue
as well.
What is the right way to go about this? I'd like to make this easy to merge.
Yes, the allocation would be avoided completely if you just seek over the content in Block::read
.
A nice way you could go about this would be to make Block::read
take a closure:
let block = Block::read(reader, |block_type| {
block_type == BLOCK_ID_VORBIS_COMMENTS
|| (block_type == BLOCK_ID_PICTURE && parse_options.read_pictures)
});
Then just change Block::read
:
if predicate(block.ty) {
// Read block content
} else {
// Seek over content and return empty Vec
}
I will work on this and eventually submit an initial draft PR (probably in the next few weeks).
Sorry. I won't be working on this.
This should be a pretty easy feature to implement. I'll keep this open so I remember to eventually work on it.