Greyhole icon indicating copy to clipboard operation
Greyhole copied to clipboard

0-bytes files

Open gboudreau opened this issue 4 years ago • 8 comments

@old-square-eyes says:

I just noticed a lot of my files are 0kb. I'm pretty upset. I found this. Not sure if it's the cause or some race condition somewhere using backup software to SAMBA shares in relation to removing locked files before they could be written. In any case @gboudreau, I ran a fsck with checksum and don't have email configured. I can't see the report in /usr/share/greyhole/ as suggested here. Any ideas where I find the report?

What's the best way to run a scan to see if there are good copies of the zero kb files (without actually browsing through the /gh directories file by file?

gboudreau avatar May 21 '20 12:05 gboudreau

The fsck report is saved in /usr/share/greyhole/fsck_work_log.dat, in a machine readable format. You can put this code in a print.php file:

<?php

class FSCKWorkLog {
    const FILE = '/usr/share/greyhole/fsck_work_log.dat';
    
    public static function print() {
        $fsck_work_log = static::getFromDisk();
        echo json_encode($fsck_work_log, JSON_PRETTY_PRINT) . "\n";
    }

    private static function getFromDisk() {
        return unserialize(file_get_contents(static::FILE));
    }
}

FSCKWorkLog::print();

Then execute it with php print.php to see a somewhat readable version of that file.

Of note: greyhole fsck works by making sure all file copies are consistent with what is available on the shares. i.e. it assumes the file copies used as the target of the symlinks in your LZ are the correct copies, and if it finds other file copies that are different, it will overwrite those file copies with the content of the 'real' file. That means if some software "emptied" some files through the Samba shares, all the file copies will have been emptied.

I would suggest running greyhole --debug="ShareName/path/to/a/now/empty/file" to see what happened with this file. (You can paste the result here if you'd like me to have a look, or send it to [email protected]) Also, I would be interested in looking at the /var/log/greyhole.log* files you have, to debug this further. This information would also be relevant; please execute this command, and send me the link it gives you.

Unless you turned it off, Greyhole uses a .gh_trash folder, to keep old copies of modified and delete files. You can find this folder inside each of your storage pool drives. eg. /mnt/hdd1/gh/.gh_trash Maybe the files that are now 0 bytes could be found, in their previous (non-0-bytes) state in there? You can check with this command (assuming all your storage pool drives are /mnt/hdd*/gh/):

ls -l /mnt/hdd*/gh/.gh_trash/ShareName/"path/to/a/now/empty/file"

gboudreau avatar May 21 '20 12:05 gboudreau

Hi there. Thanks for creating the issue. I had tried that file with a json viewer/beautifier but had not yet had success. The php viewer works better. Note: 1gb drive below is misnamed it's 1TB.

Looks like I don't have file mismatched sizes.

php print.php output.txt

{
    "dir": "all shares",
    "options": null,
    "tasks": [
        {
            "id": 274574,
            "status": "complete",
            "report": {
                "start": 1590055427,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/Movies",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 1,
                    "num_files": 4
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055428
            }
        },
        {
            "id": 274575,
            "status": "complete",
            "report": {
                "start": 1590055428,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/Music",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 1379,
                    "num_files": 4522
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055667
            }
        },
        {
            "id": 274576,
            "status": "complete",
            "report": {
                "start": 1590055667,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/Pictures",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 248,
                    "num_files": 42114
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055919
            }
        },
        {
            "id": 274577,
            "status": "complete",
            "report": {
                "start": 1590055919,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/TV",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 1,
                    "num_files": 4
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055920
            }
        },
        {
            "id": 274578,
            "status": "complete",
            "report": {
                "start": 1590055920,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/Videos",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 1,
                    "num_files": 8
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055920
            }
        },
        {
            "id": 274579,
            "status": "complete",
            "report": {
                "start": 1590055920,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/Software",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 676,
                    "num_files": 1626
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055930
            }
        },
        {
            "id": 274580,
            "status": "complete",
            "report": {
                "start": 1590055930,
                "what": "\/var\/hda\/files\/drives\/1gbDisk\/MarkGH",
                "metastore": {
                    "num_dirs": 0,
                    "num_files": 0
                },
                "orphaned": {
                    "num_orphans": 0
                },
                "landing_zone": {
                    "num_dirs": 231,
                    "num_files": 2175
                },
                "no_copies_found_files": [],
                "symlink_target_moved": 0,
                "too_many_copies": 0,
                "too_many_files": [],
                "missing_copies": 0,
                "wrong_file_size": [],
                "temp_files": [],
                "gone_ok": 0,
                "end": 1590055942
            }
        }
    ]
}

old-square-eyes avatar May 21 '20 20:05 old-square-eyes

I'll work on your other instructions now. Most of these were seldom accessed family photos, so I have long since emptied the normal GH trash. The 0kb issue may have happened some time ago.

FYI the software I used (for years), which supports Samba shares is Syncback Free. The only other thing I did (other than copy the files in the first place and occasionally sync in new DIRs) was rename folders on occasion. Some of these might have been top level folders that spawned a big rename set of tasks.

I did have an older version of GH that got stuck in a fsck loop too. It was months until an update came out to fix that bug and allow a fsck to complete.

old-square-eyes avatar May 21 '20 20:05 old-square-eyes

Email sent with curl -Ls https://bit.ly/gh-infos | sudo sh info.

Examples of 0kb files. There are perhaps a couple of hundred throughout the different Samba/Greyhole shares alongside complete files.

image

old-square-eyes avatar May 21 '20 20:05 old-square-eyes

greyhole --debug="/var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG"
Debugging file operations for file named "/var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG"

From DB
=======

From logs
=========
gzip: /var/log/greyhole.log*.gz: No such file or directory
Jan 05 15:56:52 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 15:56:52 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 15:56:52 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Jan 05 15:56:52 DEBUG fsck:   Got 2 metadata files.
Jan 05 16:25:13 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 16:25:13 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 16:25:13 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Jan 05 16:25:13 DEBUG fsck:   Got 2 metadata files.
Jan 05 17:08:38 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 17:08:38 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 05 17:08:38 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Jan 05 17:08:38 DEBUG fsck:   Got 2 metadata files.
Jan 25 11:02:31 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 25 11:02:31 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Jan 25 11:02:31 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Jan 25 11:02:31 DEBUG fsck:   Got 2 metadata files.
Apr 19 18:56:00 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 19 18:56:00 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 19 18:56:00 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Apr 19 18:56:00 DEBUG fsck:   Got 2 metadata files.
Apr 27 11:07:31 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 27 11:07:31 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 27 11:07:31 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Apr 27 11:07:31 DEBUG fsck:   Got 2 metadata files.
Apr 27 12:04:54 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 27 12:04:54 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 27 12:04:54 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Apr 27 12:04:54 DEBUG fsck:   Got 2 metadata files.
Apr 27 12:04:56 DEBUG fsck: Checking MD5 checksums for Pictures/DCIM 02/100CANON/_JPG/IMG_2696.JPG
Apr 27 12:04:56 INFO md5-worker: Working on MD5 task ID 120518: /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Apr 27 12:04:56 DEBUG md5-worker:   MD5 for /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG = d41d8cd98f00b204e9800998ecf8427e
Apr 27 12:04:56 INFO md5-worker: Working on MD5 task ID 120520: /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2656.JPG
Apr 27 12:04:56 DEBUG fsck:   All copies have the same MD5 checksum: b1a1f6488c667cdabd1e105883b05114
May 21 19:50:12 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
May 21 19:50:12 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
May 21 19:50:12 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
May 21 19:50:12 DEBUG fsck:   Got 2 metadata files.
May 21 19:50:20 DEBUG fsck:   Got 2 metadata files.
May 21 19:50:20 INFO md5-worker: Working on MD5 task ID 222283: /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
May 21 19:50:20 DEBUG md5-worker:   MD5 for /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG = d41d8cd98f00b204e9800998ecf8427e
May 21 19:50:20 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2885.JPG
May 21 19:50:20 INFO md5-worker: Working on MD5 task ID 222285: /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2656.JPG
May 21 22:08:01 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
May 21 22:08:01 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
May 21 22:08:01 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
May 21 22:08:01 DEBUG fsck:   Got 2 metadata files.
Dec 22 11:19:45 DEBUG rename:   Changing metadata file: /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_jpg/IMG_2661.JPG -> /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 11:19:45 DEBUG rename:   Changing metadata file: /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_jpg/IMG_2661.JPG -> /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 11:19:45 DEBUG rename:   Updating symlink at /var/hda/files/drives/1gbDisk/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG to point to /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 11:19:45 DEBUG rename:   Saving 2 metadata files for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 12:18:03 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 12:18:03 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 22 12:18:03 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Dec 22 12:18:03 DEBUG fsck:   Got 2 metadata files.
Dec 26 08:53:45 DEBUG fsck: Found /var/hda/files/drives/500blue/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 26 08:53:45 DEBUG fsck: Found /var/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG
Dec 26 08:53:45 DEBUG fsck: Loading metafiles for Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG ...
Dec 26 08:53:45 DEBUG fsck:   Got 2 metadata files.

From filesystem
===============
Landing Zone:
ls: cannot access '/hda/files/drives/1gbDisk/gh/Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG': No such file or directory

Metadata store:

File copies:

Seems to cut off but that's the whole output (unless I'm missing how to page)

old-square-eyes avatar May 21 '20 21:05 old-square-eyes

The output of --debug doesn't show anything at the end because it expects the specified file to start with a share name. Like this:

greyhole --debug="Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG"

From everything you sent me, I can't see any indication that you would be able to restore those files.

You can double-check by manually looking in:

ls -l /var/hda/files/drives/*/gh/"Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG" /var/hda/files/drives/*/gh/.gh_trash/"Pictures/DCIM 02/100CANON/_JPG/IMG_2661.JPG"

To find all the empty files, you can zgrep d41d8cd98f00b204e9800998ecf8427e /var/log/greyhole.log*; that md5 hash means an empty file.

gboudreau avatar May 22 '20 00:05 gboudreau

I know it's a broad question. But what can cause these issues? And could Greyhole not make an informed decision when it comes to overwriting a good file with a zero kb file - or any other corruption aside from an entire drive failure? (even if that's not exactly what happened here).

I'm not picking holes... I love GH. I just need to know what fails GH covers for (yes I have read the info page and understand it's not a backup e.g. for accidental deletion). But if it can't heal corrupt files with better copies it holds, and instead kills those... then maybe it could be made more powerful and safe if it could protect in that manner too.

old-square-eyes avatar May 22 '20 02:05 old-square-eyes

Greyhole is kind of a filesystem, in that it receives instructions, and its job is to correctly execute those. On a real filesystem, like ext4, the filesystem layer receives instructions to store data in files and folders, move files, delete files, etc. It does its thing (update metadata it needs, check various things, like available space, etc.), then delegates the rest to the storage driver, which in turn will interact with physical disks. Greyhole works the same, but higher. It receives it's instructions from Samba (via a VFS module), does its own thing (based on the instruction it received), and then will ask one or more filesystem to store, move, delete (etc.) a file. The Greyhole LZ is the equivalent of an inode table of a real filesystem, acting as pointers to where the various data is stored.

Basically, that means that Greyhole will execute whatever instructions it receives from the Samba VFS as requested. If that instruction is to delete a file, it deletes all file copies. If that instruction is to write a 0-byte file over an existing file, it will do it as requested. Greyhole doesn't know who, what or why that happened; it only knows Samba told it to do it.

Now, Greyhole has some control over how it executes those instructions. One example is ignored files: if the instruction received by Greyhole matches one of the defined ignored files definition, it will ignore this instruction. Similarly, it could try to identify unwanted situations, and act on those. But there are so many possibilities, and possible false-positives, that I'm not sure what could be done without too much effort, that would be useful to people.

gboudreau avatar May 22 '20 14:05 gboudreau