Write symbolic links to disk
What is the use case for the feature?
Whenever a symbolic link is encountered, it is not written to disk because it is treated as an empty file. This can cause false negatives when checking whether a file exists after unpacking, and it fails to preserve the integrity of the original compressed package. The user should be able to resolve and follow the symbolic link as they see fit.
Does the feature contain any proprietary information about another company's intellectual property?
No
How would you implement this feature?
When writing files to disk, if an empty file is encountered, first check whether it is a symbolic link. If it is, write the link to disk.
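The proposal above could look roughly like the following. This is a minimal sketch using only the Python standard library, not OFRAK's actual implementation; `write_entry` and its `link_target` parameter are hypothetical names standing in for whatever the unpacker knows about an entry.

```python
import os
from typing import Optional


def write_entry(path: str, data: bytes, link_target: Optional[str] = None) -> None:
    """Write one unpacked entry to disk, preserving symbolic links.

    `link_target` is a hypothetical field: the raw target string stored in
    the archive for a symlink entry, or None for a regular file.
    """
    if link_target is not None:
        # Remove a stale file or link first so os.symlink does not raise
        # FileExistsError. lexists() is also true for dangling links.
        if os.path.lexists(path):
            os.remove(path)
        # The target is stored verbatim and never resolved, so dangling and
        # self-referential links are recreated exactly as archived.
        os.symlink(link_target, path)
        return
    with open(path, "wb") as f:
        f.write(data)
```

Because the target string is written without being resolved, the caller stays free to follow the link or not after unpacking.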
Are there any (reasonable) alternative approaches?
Are you interested in implementing it yourself?
resource.get_children() does not return symbolic links
Hi @csisl! I'd love to help, but could use some additional info.
- Can you give some more information about your system (OS, processor architecture, Docker/non-Docker environment, etc.)?
- Can you give more information about the binary that is not unpacking symbolic links?
OFRAK is definitely supposed to handle symbolic links, so I want to make sure we address this, but I'm having trouble replicating this locally.
I made the following test file using these commands on Debian: csisl-test.tar.gz
mkdir -p /tmp/test
cd /tmp/test/
echo "Hello, world!" > hello.txt
touch empty.txt
ln -s hello.txt link.txt
ln -s self-link.txt self-link.txt
ln -s link.txt second-link.txt
ln -s nonexist.txt dead.txt
cd ..
tar -czvf csisl-test.tar.gz test/
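For anyone reproducing this on another platform, the same fixture can be built from Python. This is a sketch that mirrors the shell commands above using only the standard library; `build_symlink_fixture` is a hypothetical helper name, and the resulting archive is equivalent in layout but not byte-identical to the one produced by `tar`.

```python
import os
import tarfile


def build_symlink_fixture(workdir: str) -> str:
    """Create test/ with one file and several symlinks, then archive it."""
    test_dir = os.path.join(workdir, "test")
    os.makedirs(test_dir)
    with open(os.path.join(test_dir, "hello.txt"), "w") as f:
        f.write("Hello, world!\n")
    open(os.path.join(test_dir, "empty.txt"), "w").close()
    # (target, link name) pairs, matching the `ln -s` calls above
    links = [
        ("hello.txt", "link.txt"),
        ("self-link.txt", "self-link.txt"),
        ("link.txt", "second-link.txt"),
        ("nonexist.txt", "dead.txt"),
    ]
    for target, name in links:
        os.symlink(target, os.path.join(test_dir, name))
    archive = os.path.join(workdir, "csisl-test.tar.gz")
    # tarfile does not dereference by default, so links are stored as links
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(test_dir, arcname="test")
    return archive
```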
When I unpack in the OFRAK GUI, it looks like this. The resource.get_children method must necessarily return symbolic links for them to be displayed in the GUI, so I wonder if they are not being unpacked correctly in whatever binary you are testing.
Modifying "world" in hello.txt to "GitHub" and repacking gives this file: csisl-test-2.tar.gz
You can test the process yourself by running this script generated by the GUI.
from typing import Optional

from ofrak import *
from ofrak.core import *


async def main(ofrak_context: OFRAKContext, root_resource: Optional[Resource] = None):
    if root_resource is None:
        root_resource = await ofrak_context.create_root_resource_from_file(
            "csisl-test.tar.gz"
        )

    # Unpack the gzip layer, then the tar archive inside it
    await root_resource.unpack()
    genericbinary_0x0 = await root_resource.get_only_child(
        r_filter=ResourceFilter(
            tags={GenericBinary},
            attribute_filters=[
                ResourceAttributeValueFilter(attribute=Data.Offset, value=0)
            ],
        )
    )
    await genericbinary_0x0.unpack()

    # Locate test/hello.txt inside the unpacked archive
    folder_test = await genericbinary_0x0.get_only_child(
        r_filter=ResourceFilter(
            tags={Folder},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="test"
                )
            ],
        )
    )
    file_hello_txt = await folder_test.get_only_child(
        r_filter=ResourceFilter(
            tags={File},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="hello.txt"
                )
            ],
        )
    )

    # Replace "world" with "GitHub", then repack and write the new archive
    config = StringFindReplaceConfig(
        to_find="world",
        replace_with="GitHub",
        null_terminate=False,
        allow_overflow=True,
    )
    await file_hello_txt.run(StringFindReplaceModifier, config)
    await root_resource.pack_recursively()
    await root_resource.flush_to_disk("csisl-test-2.tar.gz")


if __name__ == "__main__":
    ofrak = OFRAK()
    ofrak.run(main)
Unpacking that repacked file again in OFRAK shows that the symbolic links are still there, so OFRAK can (at least in this case) handle unpacking and repacking symbolic links.
I also tried the following small script to flush a symbolic link to disk.
from ofrak import *
from ofrak.core import *


async def main(ofrak_context: OFRAKContext):
    root = await ofrak_context.create_root_resource_from_file("csisl-test.tar.gz")
    await root.unpack()
    tar = await root.get_only_child()
    await tar.unpack()
    folder_test = await tar.get_only_child()

    # Shows symbolic link children
    print(list(await folder_test.get_children()))

    file_dead_txt = await folder_test.get_only_child(
        r_filter=ResourceFilter(
            tags={FilesystemEntry},
            attribute_filters=[
                ResourceAttributeValueFilter(
                    attribute=AttributesType[FilesystemEntry].Name, value="dead.txt"
                )
            ],
        )
    )

    # Successfully writes, but writes an empty file
    await file_dead_txt.flush_to_disk("test_dead.txt")


o = OFRAK()
o.run(main)
This script runs without issue and creates test_dead.txt, but it creates an empty file instead of a symbolic link. Is this the problem you're identifying?
root@ofrak:/tmp/test# stat test_dead.txt
File: test_dead.txt
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 37h/55d Inode: 54802789 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-09-06 17:31:59.317037124 +0000
Modify: 2023-09-06 17:31:59.317037124 +0000
Change: 2023-09-06 17:31:59.317037124 +0000
Birth: 2023-09-06 17:31:59.317037124 +0000
root@ofrak:/tmp/test# stat second-link.txt
File: second-link.txt -> link.txt
Size: 8 Blocks: 0 IO Block: 4096 symbolic link
Device: 37h/55d Inode: 54802762 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-09-06 17:12:27.417313186 +0000
Modify: 2023-09-06 17:12:17.877449957 +0000
Change: 2023-09-06 17:12:17.877449957 +0000
Birth: 2023-09-06 17:12:17.877449957 +0000
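The distinction that stat reports above (regular empty file versus symbolic link) can also be checked from Python, which is convenient when asserting on unpacked output in a test. A minimal standard-library sketch; `describe_entry` is a hypothetical helper name.

```python
import os
import stat


def describe_entry(path: str) -> str:
    """Classify a path without following symbolic links."""
    st = os.lstat(path)  # lstat inspects the link itself, not its target
    if stat.S_ISLNK(st.st_mode):
        return f"symbolic link -> {os.readlink(path)}"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if stat.S_ISREG(st.st_mode):
        return "regular empty file" if st.st_size == 0 else "regular file"
    return "other"
```

Using `os.lstat` rather than `os.stat` matters here: `os.stat` follows the link and raises an error for a dead link such as dead.txt.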
Based on my current understanding of the issue you're reporting (that Resource.flush_to_disk for FilesystemEntry-tagged resources doesn't work properly), I've opened PR #373 to address the problem.
Does that fix the issue you are describing?
I still can't replicate resource.get_children not returning symbolic links.
Hello! Thank you for the response. As for my environment, I am running with the following setup:
- macOS, M1 / ARM
- OFRAK 3.2.0
- Python 3.8.10
- not using Docker
- inside of a virtual env
I followed the steps above that you used to create a directory with one text file and then several symbolic links.
mkdir -p /tmp/test
cd /tmp/test/
echo "Hello, world!" > hello.txt
touch empty.txt
ln -s hello.txt link.txt
ln -s self-link.txt self-link.txt
ln -s link.txt second-link.txt
ln -s nonexist.txt dead.txt
cd ..
tar -czvf csisl-test.tar.gz test/
When I do this and run ofrak on the command line, the only file that is preserved is hello.txt. The lines for the empty files (including the symbolic links, which are reported as size=0 bytes) end with (not written).
% ofrak unpack test-issue.tar.gz -r
[ ofrak_cli.py: 173] No disassembler backend specified, so no disassembly will be possible
Unpacking file: test-issue.tar.gz
Extracting data to test-issue.tar.gz_extracted_20230907073332
┌test-issue.tar.gz: [attributes=(AttributesType[FilesystemEntry], Magic), size=290 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz]
└───┬TarArchive: [attributes=(Data, Magic), size=5120 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive]
    └───┬test-issue: [attributes=(Data, AttributesType[FilesystemEntry]), size=0 bytes, (not written)]
        ├────test-issue/dead.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        ├────empty.txt: [attributes=(Data, AttributesType[FilesystemEntry], Magic), size=0 bytes, (not written)]
        ├────hello.txt: [attributes=(Data, AttributesType[FilesystemEntry], Magic), size=12 bytes, extracted-path=test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive.ofrak_children/test-issue.ofrak_children/hello.txt]
        ├────test-issue/link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        ├────test-issue/second-link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
        └────test-issue/self-link.txt: [attributes=(Data, AttributesType[FilesystemEntry], AttributesType[SymbolicLink]), size=0 bytes, (not written)]
It took 0.043 seconds to run the OFRAK script
A file listing:
ls -la test-issue.tar.gz_extracted_20230907073332/test-issue.tar.gz.ofrak_children/TarArchive.ofrak_children/test-issue.ofrak_children
total 8
drwxr-xr-x 3 wheel 96 Sep 7 07:33 .
drwxr-xr-x 3 wheel 96 Sep 7 07:33 ..
-rw-r--r-- 1 wheel 12 Sep 7 07:33 hello.txt
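To rule out the archive itself as the cause, the link entries can be listed with Python's standard tarfile module, independently of OFRAK. A minimal sketch; `symlinks_in_archive` is a hypothetical helper name.

```python
import tarfile
from typing import Dict


def symlinks_in_archive(archive_path: str) -> Dict[str, str]:
    """Map each symlink member of a tar archive to its stored target."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically
    with tarfile.open(archive_path, "r:*") as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}
```

If this returns all four links for the archive while the extracted directory contains only hello.txt, the links are being lost at the write-to-disk step rather than when the archive was created.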
This is the behavior I'm seeing whenever I run resource.unpack() from a script as well, which is where my main issue lies.
As for the get_children() call, this is what I'm running:
await resource.unpack()
children = await resource.get_children()
for child in children:
    await child.identify()
    caption = child.get_caption()
    print(caption)
While I can see the symbolic links in the GUI, I cannot see the symbolic links that point to files when I iterate over the children.
Interestingly, I had not tried a dead link until I followed your instructions above. As of now, only the dead link is recognized, not the links that point to hello.txt.
We've now merged #373, so it should be possible to dump symbolic links with the following (assuming you've installed OFRAK from the master branch of the source repo rather than from pip):
if resource.has_tag(FilesystemEntry):
    entry = await resource.view_as(FilesystemEntry)
    await entry.flush_to_disk()
I'm working on a separate PR to make the ofrak unpack CLI command use this method instead of the (now renamed) resource.flush_data_to_disk method.
@rbs-jacob, can this issue be closed?