fscrawler
Use file ctime, not mtime for detecting modified or new files
Hi David,
I just ran into the following problem using fscrawler to index files on a Samba fileserver for use with Spotlight and Apple clients:
when Macs copy a file to the Samba server, the client subsequently sets the timestamps (atime, mtime and btime (birthtime/creation date)) of the copied file to the same values as those of the original file.
For older files, the mtime will typically be older than the last fscrawler invocation; as a result, fscrawler will ignore the file in an index run.
Here's an example stat output from today on such a file on a Samba server:
# stat 'some.tif'
File: some.tif
Size: 2416300 Blocks: 4728 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 67268640 Links: 1
Access: (0666/-rw-rw-rw-) Uid: ( 1001/ smbtest) Gid: ( 1001/ smbtest)
Context: system_u:object_r:default_t:s0
Access: 2022-08-03 15:36:45.779778451 +0200
Modify: 2022-03-23 07:51:38.000000000 +0100
Change: 2022-08-03 15:36:44.847786265 +0200
Birth: 2022-08-03 15:36:42.485806064 +0200
As the mtime dates back about 5 months, fscrawler refused to index the file unless forced with --restart.
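For anyone who wants to reproduce the effect without a Mac client, here is a minimal Python sketch. It is not fscrawler code: the helper name `make_backdated_file` and the one-hour `last_run` value are made up for illustration. It creates a file, sets its atime and mtime about five months into the past (as a copying client might), and compares both timestamps against the pretend last crawl time. Note that on Windows `st_ctime` is the creation time rather than the inode change time.

```python
import os
import tempfile
import time


def make_backdated_file(directory, age_seconds):
    """Create a file, then push its atime/mtime into the past (hypothetical helper)."""
    path = os.path.join(directory, "some.tif")
    with open(path, "wb") as f:
        f.write(b"example")
    old = time.time() - age_seconds
    # Userspace can set atime and mtime freely; doing so updates ctime to "now".
    os.utime(path, (old, old))
    return path


with tempfile.TemporaryDirectory() as d:
    last_run = time.time() - 3600  # assumption: previous crawl ran one hour ago
    path = make_backdated_file(d, age_seconds=150 * 24 * 3600)
    st = os.stat(path)
    mtime_says_new = st.st_mtime > last_run
    ctime_says_new = st.st_ctime > last_run
    print("mtime newer than last run:", mtime_says_new)
    print("ctime newer than last run:", ctime_says_new)
```

With an mtime-only check the file looks unchanged, while the ctime check picks it up.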
If fscrawler would look at the ctime value instead of mtime, this would solve this problem as the ctime can't be set by userspace and will always reflect the last date the file inode was created and possibly modified subsequently by any file content or metadata changes.
If not changing the default behaviour, would it be possible to get an fscrawler option to use ctime instead of mtime?
Thanks! -slow
If fscrawler would look at the ctime value instead of mtime, this would solve this problem as the ctime can't be set by userspace and will always reflect the last date the file inode was created and possibly modified subsequently by any file content or metadata changes.
That would surely be a good thing to do.
If not changing the default behaviour, would it be possible to get an fscrawler option to use ctime instead of mtime?
I know that changing the time field might have side effects, so an option might be better.
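As a rough illustration of what such an option could look like, here is a small Python sketch. It is only an assumption about the shape of the check, not fscrawler's actual implementation (fscrawler is written in Java), and the function name `needs_indexing` and the `time_field` parameter are hypothetical. The default stays on mtime so existing behaviour would not change.

```python
import os


def needs_indexing(path, last_run_epoch, time_field="mtime"):
    """Return True if the chosen timestamp is newer than the last crawl.

    time_field is a hypothetical option: "mtime" (default), "ctime",
    or "latest" (whichever of the two is more recent).
    """
    st = os.stat(path)
    if time_field == "mtime":
        stamp = st.st_mtime
    elif time_field == "ctime":
        stamp = st.st_ctime  # inode change time on Unix; creation time on Windows
    elif time_field == "latest":
        stamp = max(st.st_mtime, st.st_ctime)
    else:
        raise ValueError(f"unknown time_field: {time_field!r}")
    return stamp > last_run_epoch
```

A "latest" mode would cover both the copied-with-old-mtime case and ordinary content changes, at the cost of also re-indexing files after pure metadata changes such as chmod.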