Ignore/skip Thumbs.db files in CsvBooks toolchain

Open bondjimbond opened this issue 7 years ago • 8 comments

When packages are created in Windows, you get a lot of directories with unwanted files called Thumbs.db. MIK does not like these; you end up with errors like this:

[2018-02-07 21:31:03] input validator.ERROR: Input validation failed {"record ID":"16","book object directory":"/Volumes/Arca/DOH_FILES/arms_cheerio_pt1/ARMS_04_1943","error":"Book object input directory contains unwanted files"} []

In a large package of directories it's not easy to delete these all manually. It would be nice if MIK could skip them.

bondjimbond • Feb 08 '18 14:02

Note that on a Mac at least, it actually is easy enough to remove all these files with a command:

find /path/to/tree -name 'Thumbs.db' -delete

But this requires that you know they're there, that you know they're the problem, and that you know how to do it. Since this is likely going to be a very common problem with any image set created in Windows, it's best if MIK knows how to deal with it.
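For anyone who would rather preview the deletions first, the same cleanup can be sketched in Python. This is only an illustration: the function name `remove_named_files` and its `dry_run` parameter are made up for this example and are not part of MIK.

```python
import os


def remove_named_files(root, names=("Thumbs.db",), dry_run=True):
    """Walk `root` and collect files whose name is in `names`.

    With dry_run=True (the default) nothing is deleted; the matching
    paths are only returned so they can be reviewed first.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in names:
                path = os.path.join(dirpath, filename)
                matches.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(matches)
```

Calling `remove_named_files("/path/to/tree")` only lists the matches; passing `dry_run=False` deletes them.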

bondjimbond • Feb 08 '18 14:02

There is a helper script for removing unwanted files such as Thumbs.db from an input directory: https://github.com/MarcusBarnes/mik/blob/master/extras/scripts/remove_files.php. I haven't had a chance to test it on Windows.

MarcusBarnes • Feb 08 '18 14:02

Agreed. There's a cookbook entry for dealing with this, and the iipqa can detect them, but MIK should skip them. There are equivalent unwanted files on Macs, so we should include those as well:

.Thumbs.db Thumbs.db .DS_Store DS_Store

We would only need to add this list to the base filegetter class, I think, and then reference it in each filegetter. Thumbs.db can appear in any directory that contains image or video (maybe other) files, and .DS_Store can appear anywhere, I think, so this applies not just to Books.
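A rough sketch of that idea, written in Python for brevity rather than MIK's PHP. The class and method names (`BaseFileGetter`, `BookFileGetter`, `filter_unwanted`, `get_page_files`) are hypothetical and do not correspond to MIK's actual classes; only the four file names come from the list above.

```python
import os


class BaseFileGetter:
    # Hypothetical shared skip list, using the file names proposed above.
    UNWANTED_FILES = frozenset(
        {".Thumbs.db", "Thumbs.db", ".DS_Store", "DS_Store"}
    )

    def is_unwanted(self, path):
        # Compare only the file name, so the directory does not matter.
        return os.path.basename(path) in self.UNWANTED_FILES

    def filter_unwanted(self, paths):
        return [p for p in paths if not self.is_unwanted(p)]


class BookFileGetter(BaseFileGetter):
    """Illustrative subclass: each filegetter reuses the inherited list."""

    def get_page_files(self, book_dir):
        entries = sorted(os.listdir(book_dir))
        paths = [os.path.join(book_dir, entry) for entry in entries]
        return self.filter_unwanted(paths)
```

Keeping the list on the base class means a new junk file name would only need to be added in one place.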

mjordan • Feb 08 '18 14:02

The REST Ingester skips these: https://github.com/mjordan/islandora_rest_ingester/blob/master/includes/Ingester.php#L30

mjordan • Feb 08 '18 14:02

MIK does already appear to skip .DS_Store automatically; I've always had those in my packages without problems.

bondjimbond • Feb 08 '18 14:02

We should test whether they show up in Windows, e.g., the files are written by OS X, but if you run MIK on Windows using that input, they might show up. I might be mistaken. Wouldn't hurt to have a list and just get the filegetters (or any other component that needs to) to skip every file in that list.

mjordan • Feb 08 '18 14:02

> Wouldn't hurt to have a list and just get the filegetters (or any other component that needs to) to skip every file in that list.

Agreed. I don't think we need to bother testing what Windows does, in that case; just tell it to skip that list of garbage files every time.

bondjimbond • Feb 08 '18 15:02

We should be able to write PHPUnit tests for this feature pretty easily.
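As a language-neutral illustration of what such tests might check, here is a small sketch using Python's `unittest` rather than PHPUnit. The `filter_unwanted` function is a hypothetical stand-in for whatever the filegetters end up calling, and the skip list uses the names listed earlier in the thread.

```python
import unittest

# Hypothetical skip list, using the names listed earlier in the thread.
UNWANTED_FILES = {".Thumbs.db", "Thumbs.db", ".DS_Store", "DS_Store"}


def filter_unwanted(filenames):
    """Return the file names that are not on the skip list."""
    return [f for f in filenames if f not in UNWANTED_FILES]


class FilterUnwantedTest(unittest.TestCase):
    def test_junk_files_are_skipped(self):
        listing = ["0001.tif", "Thumbs.db", ".DS_Store", "0002.tif"]
        self.assertEqual(filter_unwanted(listing), ["0001.tif", "0002.tif"])

    def test_similar_names_are_kept(self):
        # Only exact name matches are skipped in this sketch.
        listing = ["Thumbs.db.bak", "thumbs.tif"]
        self.assertEqual(filter_unwanted(listing), listing)
```

A real test would presumably run the actual filegetters against a fixture directory containing these files and assert that input validation passes.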

mjordan • Feb 08 '18 15:02