Ignore/skip Thumbs.db files in CsvBooks toolchain
When packages are created in Windows, you get a lot of directories with unwanted files called Thumbs.db. MIK does not like these; you end up with errors like this:
[2018-02-07 21:31:03] input validator.ERROR: Input validation failed {"record ID":"16","book object directory":"/Volumes/Arca/DOH_FILES/arms_cheerio_pt1/ARMS_04_1943","error":"Book object input directory contains unwanted files"} []
In a large package of directories it's not easy to delete these all manually. It would be nice if MIK could skip them.
Note that on a Mac at least, it actually is easy enough to remove all these files with a command:
find /path/to/tree -name 'Thumbs.db' -delete
But this requires that you know they're there, that you know they're the problem, and that you know how to do it. Since this is likely going to be a very common problem with any image set created in Windows, it's best if MIK knows how to deal with it.
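For anyone cleaning up by hand in the meantime, it may be safer to list the matches before deleting them. This is only a sketch: the function names `list_thumbs` and `remove_thumbs` are made up for illustration, and it assumes a `find` that supports `-delete` (the BSD find on macOS and GNU find both do).

```shell
#!/bin/sh
# Hypothetical helpers, not part of MIK.

# Print every Thumbs.db under the given tree without deleting anything.
list_thumbs() {
    find "$1" -type f -name 'Thumbs.db' -print
}

# Delete the same files once the list above looks right.
remove_thumbs() {
    find "$1" -type f -name 'Thumbs.db' -delete
}
```

Running `list_thumbs /path/to/tree` first, and `remove_thumbs /path/to/tree` only after checking the output, avoids deleting anything unexpected.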
There is a helper script to remove unwanted files such as Thumbs.db from an input directory: https://github.com/MarcusBarnes/mik/blob/master/extras/scripts/remove_files.php I haven't had a chance to test it on Windows.
Agreed. There's a cookbook entry for dealing with this, and the iipqa can detect them, but MIK should skip them. There are equivalent unwanted files on Macs, so we should include those as well:
.Thumbs.db Thumbs.db .DS_Store DS_Store
We would only need to add this list to the base filegetter class, I think, and then reference the list in each filegetter. Thumbs.db can appear in any directory that contains image or video (maybe other) files, and .DS_Store can appear anywhere, I think, so this applies not just to Books.
The REST Ingester skips these: https://github.com/mjordan/islandora_rest_ingester/blob/master/includes/Ingester.php#L30
MIK does already appear to skip .DS_Store automatically; I've always had those in my packages without problems.
We should test whether they show up in Windows: the files are written by OS X, but if you run MIK on Windows using that input, they might show up. I might be mistaken. Wouldn't hurt to have a list and just get the filegetters (or any other component that needs to) to skip every file in that list.
Wouldn't hurt to have a list and just get the filegetters (or any other component that needs to) to skip every file in that list.
Agreed. I don't think we need to bother testing what Windows does, in that case; just tell it to skip that list of garbage files every time.
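To make the "skip everything on the list" idea concrete, here is a rough shell sketch. It is not MIK code (the filegetters are PHP) and the names `SKIP_FILES`, `is_unwanted` and `list_wanted_files` are invented for illustration; the file names are the four proposed earlier in this thread.

```shell
#!/bin/sh
# Illustrative only: a single skip list consulted wherever files are enumerated.
SKIP_FILES='.Thumbs.db Thumbs.db .DS_Store DS_Store'

# Return 0 if the file's base name is on the skip list, 1 otherwise.
is_unwanted() {
    name=$(basename "$1")
    for skip in $SKIP_FILES; do
        [ "$name" = "$skip" ] && return 0
    done
    return 1
}

# Print the regular files in a directory, leaving out anything on the skip list.
list_wanted_files() {
    for path in "$1"/* "$1"/.*; do
        [ -f "$path" ] || continue
        is_unwanted "$path" || printf '%s\n' "$path"
    done
}
```

The point of keeping the list in one place is that adding another garbage file name later means changing one line, whichever component ends up doing the skipping.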
We should be able to write PHPUnit tests for this feature pretty easily.