
Request to add option to collect created information

Open Edin99 opened this issue 4 years ago • 6 comments

It would be useful if it was possible to optionally collect additional information about files, especially Created dates. I wouldn't expect this to be added to the default export, but an option to export created dates of files with the other information would help in instances where it is necessary to capture the original created date for files in SharePoint that originated in network drives. SharePoint exports use the SharePoint created date and not the original file created date, which impacts the metadata being exported for archives.

Edin99 avatar Aug 30 '19 09:08 Edin99

Thank you Edin99, do you mean the file creation date as displayed by the operating system?

Dclipsham avatar Aug 30 '19 15:08 Dclipsham

Yes the original, e.g. Word creation date, and not the "new" creation date added by, for example, SharePoint following import.

Edin99 avatar Sep 02 '19 07:09 Edin99

DROID only scans file system metadata for created/last-modified etc. Apache Tika would be a better option for what you seem to be after, as it looks into embedded file metadata for a wide range of file formats.

DavidUnderdown avatar Sep 02 '19 10:09 DavidUnderdown

This isn't quite what the request asks for, but I should note that DROID has only ever recorded the last modified date/time, not the file creation date/time.

This was because in the days of Java 6, it wasn't possible to get anything other than the last modified date time. Using NIO libraries in Java 7 and later, it would be possible to obtain further file system metadata, like file creation time (not necessarily the same as the embedded Word time).

Of course, this is a reasonably large change, as it requires changes to the data model (database tables, export results, filters, UI ... anything that might touch the new data).
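For illustration, a minimal sketch of the NIO approach described above, using `java.nio.file.Files.readAttributes` with `BasicFileAttributes` (available since Java 7). The class name `FileTimes` and helper `readTimes` are hypothetical and are not part of DROID. Note that on file systems that do not record a creation time, `creationTime()` returns an implementation-specific default, typically the last-modified time or the epoch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical example class; not DROID code.
public class FileTimes {

    // Reads the basic file system attributes (including timestamps) in one call.
    static BasicFileAttributes readTimes(Path path) throws IOException {
        return Files.readAttributes(path, BasicFileAttributes.class);
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument as the path, or the current directory if none is given.
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        BasicFileAttributes attrs = readTimes(path);

        // creationTime() may fall back to an implementation-specific default
        // where the file system does not store a creation time.
        System.out.println("created:       " + attrs.creationTime());
        System.out.println("last modified: " + attrs.lastModifiedTime());
        System.out.println("last accessed: " + attrs.lastAccessTime());
    }
}
```

This only reads file system metadata, so it would not recover a creation date embedded inside a Word document.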

nishihatapalmer avatar Sep 03 '19 10:09 nishihatapalmer

Thanks for the responses. I need to capture the original file creation date from files and thought that would be a useful DROID feature for SharePoint users, to avoid the problem where files exported from SharePoint have the SharePoint upload date treated as the created date.

If anyone could suggest an alternative method of extracting this metadata that would be great.

Edin99 avatar Sep 03 '19 10:09 Edin99

I'd echo the use of Tika if you're looking for a range of metadata options from a wide range of formats.

If you have Linux tools available (for example, via the Linux subsystem in recent Windows), commands like stat will work well; there may also be PowerShell alternatives.

$ stat tox.ini
  File: tox.ini
  Size: 1323      	Blocks: 8          IO Block: 4096   regular file
Device: 812h/2066d	Inode: 393476      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ross-spencer)   Gid: ( 1000/ross-spencer)
Access: 2019-08-28 09:12:44.772035199 +0200
Modify: 2019-08-28 09:12:44.772035199 +0200
Change: 2019-08-28 09:12:44.772035199 +0200

FITS will also provide some of this information; it wraps DROID, JHOVE, and a few other bits and pieces, but good large-scale performance is difficult to achieve.

This probably isn't the forum, but as you are using SharePoint (are you using any records management extensions?), capturing this information can be done in a multitude of other ways. It is a tough one for policy to require users to get this information. I'd personally be looking to combine whatever SharePoint metadata (the record metadata) you have with the file metadata, rather than trying to solve it (getting the creation date) at the file level alone. The file system introduces its own difficulties, which is perhaps why Java initially approached this from the point of not capturing the data. See this table of filesystem metadata comparisons to see where creation time simply isn't captured.

ross-spencer avatar Sep 03 '19 11:09 ross-spencer