jarchivelib
Add Archiver.stream(ArchiveStream) support
A Debian archive file (.deb) is an "ar" file containing a set of tar.gz files.
I'd love to be able to pass the ArchiveStream from the "ar" into a new Archiver.stream() call to extract a file from the embedded tar.gz.
I'm not familiar enough with Streams to know if that would work. Archiver.stream() appears to take a File only.
Thoughts?
i've never thought much about nesting archive streams, and it's certainly not possible in any way with the current API.
but it sounds like a fun thing to implement. at first glance i think this will involve some wrapping mechanism for the stream in CommonsArchiveEntry. it also means extending ArchiveEntry. i have some time on my hands tomorrow, i'll have a look at it.
actually, this should work
Archiver arArchiver = ArchiverFactory.createArchiver("ar");
Archiver tarGzArchiver = ArchiverFactory.createArchiver("tar", "gz");
ArchiveStream stream = arArchiver.stream(new File("/home/thomas/bar.ar"));
ArchiveEntry entry;
while ((entry = stream.getNextEntry()) != null) {
    if (entry.getName().endsWith(".tar.gz")) {
        // will extract the contents of the nested archive
        // will close the stream! see #52
        tarGzArchiver.extract(stream, new File("/tmp/"));
    }
}
stream.close();
the problem in the current version is that Archiver.extract(InputStream, File) will close the input stream, so any subsequent call on the stream will throw an IOException. you can extract only one file, which is pretty stupid i admit.
i'll fix #52 and deploy a snapshot
you can use 0.8.0-SNAPSHOT to try it
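For versions where Archiver.extract(InputStream, File) still closes its argument, one possible workaround is to hand it a wrapper whose close() does nothing. The sketch below uses only the JDK; NonClosingInputStream is a hypothetical helper name and is not part of jarchivelib.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not part of jarchivelib): forwards all reads to the
 * wrapped stream but ignores close(), so the code that opened the outer
 * stream stays responsible for closing it.
 */
final class NonClosingInputStream extends FilterInputStream {

    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        // Deliberately a no-op: the wrapped stream must stay open so the
        // caller can keep iterating over the remaining outer entries.
    }
}
```

In the loop above, the call would then read tarGzArchiver.extract(new NonClosingInputStream(stream), new File("/tmp/")), with stream.close() still executed once at the end.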
Wow, thanks for the speedy response. I've been thinking about this over the weekend and have realised that my request was very poorly worded.
Inside the second stream, I am streaming out the file to a ByteArrayOutputStream, which means I may not even need to go near the filesystem. I know I did say "extract" but I am really streaming it.
My current code looks like this:
@Rule
public TemporaryFolder folder = new TemporaryFolder();

@Test
public void getControlFileAsStringTest() throws IOException {
    File controlTarGz = getControlTarGzFromDeb(
            new File("src/test/resources/build-essential_11.6ubuntu6_amd64.deb"),
            folder.getRoot());
    String controlFileContents = getControlFromControlTarGz(controlTarGz);
    System.out.println(controlFileContents);
}

public File getControlTarGzFromDeb(File debFile, File tmpLocation) throws IOException {
    Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.AR);
    ArchiveStream stream = archiver.stream(debFile);
    ArchiveEntry entry;
    File controlTarGzFile = null;
    while ((entry = stream.getNextEntry()) != null) {
        // access each archive entry individually using the stream
        // or extract it using entry.extract(destination)
        // or fetch meta-data using entry.getName(), entry.isDirectory(), ...
        System.out.println(entry.getName());
        if (entry.getName().equals("control.tar.gz")) {
            controlTarGzFile = entry.extract(tmpLocation);
        }
    }
    stream.close();
    return controlTarGzFile;
}

public String getControlFromControlTarGz(File controlTarGzFile) throws IOException {
    Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
    ArchiveStream stream = archivertgz.stream(controlTarGzFile);
    ArchiveEntry entry;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while ((entry = stream.getNextEntry()) != null) {
        if (entry.getName().equals("./control")) {
            // copy the current entry's bytes straight into memory
            IOUtils.copy(stream, baos);
        }
    }
    stream.close();
    return baos.toString(StandardCharsets.UTF_8.toString());
}
Ultimately, it would be really nice to stream from the entry. Something like this:
public String getControlStringFromArFile(File arFile) throws IOException {
    Archiver archiverAr = ArchiverFactory.createArchiver(ArchiveFormat.AR);
    Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
    ArchiveStream stream = archiverAr.stream(arFile);
    ArchiveEntry entry, entry2;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while ((entry = stream.getNextEntry()) != null) {
        // The ar contains a tgz file named control.tar.gz
        if (entry.getName().equals("control.tar.gz")) {
            ArchiveStream stream2 = archivertgz.stream(entry);
            while ((entry2 = stream2.getNextEntry()) != null) {
                // The control.tar.gz contains a text file named control
                if (entry2.getName().equals("./control")) {
                    IOUtils.copy(stream2, baos);
                }
            }
        }
    }
    return baos.toString(StandardCharsets.UTF_8.toString());
}
In the meantime, I'll test out your changes.
the point of jarchivelib is to make it convenient to handle archives as File objects. for what you are trying to achieve, i would suggest using commons-compress directly, as it already has an excellent archive/compression stream API, which jarchivelib simply builds on.
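The commons-compress route suggested above could look roughly like this. It is only a sketch: DebControlReader and readControl are made-up names, commons-compress is assumed to be on the classpath, and the entry names control.tar.gz and ./control are taken from the code earlier in this thread rather than checked against every .deb variant.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ar.ArArchiveInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

/** Hypothetical helper: reads ./control from a .deb without touching the filesystem. */
final class DebControlReader {

    /** Returns the control file as text, or null if it is not found. Closes debStream. */
    static String readControl(InputStream debStream) throws IOException {
        try (ArArchiveInputStream ar = new ArArchiveInputStream(new BufferedInputStream(debStream))) {
            ArchiveEntry arEntry;
            while ((arEntry = ar.getNextEntry()) != null) {
                if (!arEntry.getName().equals("control.tar.gz")) {
                    continue;
                }
                // The ar stream is positioned at this entry's data, so gzip and
                // tar readers can be layered directly on top of it. They are not
                // closed separately; closing "ar" releases everything.
                TarArchiveInputStream tar =
                        new TarArchiveInputStream(new GzipCompressorInputStream(ar));
                ArchiveEntry tarEntry;
                while ((tarEntry = tar.getNextEntry()) != null) {
                    if (tarEntry.getName().equals("./control")) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buffer = new byte[8192];
                        int n;
                        // read() returns -1 at the end of the current tar entry
                        while ((n = tar.read(buffer)) != -1) {
                            out.write(buffer, 0, n);
                        }
                        return new String(out.toByteArray(), StandardCharsets.UTF_8);
                    }
                }
            }
        }
        return null;
    }
}
```

A call such as DebControlReader.readControl(new FileInputStream(debFile)) would then return the control text, which is the same result as the two-step version with a temporary folder.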
Thanks Thomas for taking the time to respond. I will look into that. It's just that your API is much nicer to deal with ;-) Sorry to have wasted your time. I do appreciate your help thus far.
The thing that is completely missing in commons-compress is restoring the Unix file mode while extracting an archive from a stream.
ZipFile resolves the file mode from the archive, and this library does a good job of restoring it with the help of the FileModeMapper class.
At the same time, restoring the Unix file mode from ZipArchiveInputStream isn't possible, because it simply skips the entire Central Directory Record.
So it seems that a custom re-implementation of ZipArchiveInputStream is required.
If implemented, that would be a really great feature; I haven't managed to find any library that can extract from a stream and restore the Unix file mode at the same time.
(The main idea here is to iterate through the entries and then, after all entries are finished, parse the additional file information from the Central Directory Record. That information could be used to restore the file mode even after all files have already been extracted.)
thanks for the input!
I've been mulling over file permissions for a while, and they're tricky because a) not all archive formats support them properly, which makes it hard to generalize, and b) java's support for portable file permissions isn't very good, and will require resorting to hacks like the FileModeMapper.
i have some ideas for a major release, but the API will be completely new, and i don't plan to fiddle with them in 0.x.x
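To make the portability point concrete, here is a minimal JDK-only sketch of turning a Unix mode into the permission set that java.nio understands. UnixModes and toPermissions are hypothetical names; this is not how FileModeMapper is implemented, only an illustration of the mapping involved.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: maps the nine low bits of a Unix mode to POSIX permissions. */
final class UnixModes {

    // Ordered to match the mode bits from 0400 (owner read) down to 0001 (others execute).
    private static final PosixFilePermission[] BITS = {
        PosixFilePermission.OWNER_READ,
        PosixFilePermission.OWNER_WRITE,
        PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ,
        PosixFilePermission.GROUP_WRITE,
        PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ,
        PosixFilePermission.OTHERS_WRITE,
        PosixFilePermission.OTHERS_EXECUTE
    };

    static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> permissions = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            // 0400 >> i walks the bit mask from owner-read to others-execute;
            // higher bits (file type, setuid, ...) are ignored.
            if ((mode & (0400 >> i)) != 0) {
                permissions.add(BITS[i]);
            }
        }
        return permissions;
    }
}
```

The result can be passed to Files.setPosixFilePermissions(path, UnixModes.toPermissions(mode)), but that call throws UnsupportedOperationException on file systems without POSIX attributes (Windows, for instance), so a fallback is needed there.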