pneumatic

Add a way to store each document's `file_hash` and `pages` values in the database

Open • anthonydb opened this issue 9 years ago • 2 comments

Upon upload, the DocumentCloud API response does not include values for `file_hash` or `pages`, probably because those are calculated while the document is processed and are not yet available when the file is dropped off.

I'd like to add a function in `db.py` that walks through the database of uploaded files and retrieves those values for each document. It should use multiprocessing on platforms that support it.
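A minimal sketch of how such a function could look, under stated assumptions: a SQLite table named `uploads` with `id`, `file_hash`, and `pages` columns, and a hypothetical `fetch_document_metadata()` that stands in for the real DocumentCloud lookup (here it only reads from a hard-coded dict). The lookups run in a `multiprocessing.pool.ThreadPool`, which works on every platform; since the per-document work is network-bound, threads are a reasonable fit, whereas a process-based `multiprocessing.Pool` would also need an `if __name__ == "__main__":` guard on platforms that spawn workers (Windows, and macOS by default).

```python
import sqlite3
from multiprocessing.pool import ThreadPool

# Hypothetical stand-in data; a real version would query DocumentCloud.
_FAKE_SERVICE = {
    "123-report": {"file_hash": "abc123", "pages": 12},
    "456-memo": {"file_hash": "def456", "pages": 3},
}


def fetch_document_metadata(doc_id):
    """Hypothetical lookup: return {'file_hash': ..., 'pages': ...} or None."""
    return _FAKE_SERVICE.get(doc_id)


def _lookup(doc_id):
    # Keep the id next to its result so rows can be matched after map().
    return doc_id, fetch_document_metadata(doc_id)


def update_processed_files(conn, workers=4):
    """Fill in file_hash and pages for uploads that are still missing them."""
    doc_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM uploads WHERE file_hash IS NULL OR pages IS NULL"
        )
    ]
    # Network-bound lookups run concurrently in worker threads.
    with ThreadPool(processes=workers) as pool:
        results = pool.map(_lookup, doc_ids)

    updated = 0
    with conn:  # all writes happen here, in one transaction, on this thread
        for doc_id, meta in results:
            if meta is None:
                continue  # not found: the row is left as NULL
            conn.execute(
                "UPDATE uploads SET file_hash = ?, pages = ? WHERE id = ?",
                (meta["file_hash"], meta["pages"], doc_id),
            )
            updated += 1
    return updated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE uploads (id TEXT PRIMARY KEY, file_hash TEXT, pages INTEGER)"
    )
    conn.executemany(
        "INSERT INTO uploads (id) VALUES (?)",
        [("123-report",), ("456-memo",), ("789-missing",)],
    )
    conn.commit()
    print(update_processed_files(conn))  # 2
```

Database writes stay on the main thread because a `sqlite3` connection, by default, can only be used from the thread that created it; the worker threads only perform the lookups.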

anthonydb • Feb 18 '16 18:02

Closed via https://github.com/anthonydb/pneumatic/commit/d71a56a098865d4fb0e50bb36c6b78a1cafebf4c

anthonydb • Mar 07 '16 03:03

Reopening to remind myself that the `update_processed_files` method needs to write something to the database indicating that a file was not found.
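A minimal sketch of recording the not-found case, assuming a hypothetical `status` column on an `uploads` table and lookup results supplied as `(doc_id, meta)` pairs, where `meta` is `None` when the document cannot be found. The `NOT_FOUND` and `OK` markers and both function names are illustrative, not part of the project. Flagging the row, rather than leaving `file_hash` and `pages` as NULL, lets later passes skip documents that are already known to be missing.

```python
import sqlite3

# Hypothetical status markers stored in a hypothetical `status` column.
NOT_FOUND = "not_found"
OK = "ok"


def record_lookup_results(conn, results):
    """Write metadata for found documents and flag missing ones.

    `results` is an iterable of (doc_id, meta) pairs, where meta is a dict
    with 'file_hash' and 'pages', or None if the document was not found.
    Returns a (found, missing) count.
    """
    found = missing = 0
    with conn:  # commit every update in one transaction
        for doc_id, meta in results:
            if meta is None:
                conn.execute(
                    "UPDATE uploads SET status = ? WHERE id = ?",
                    (NOT_FOUND, doc_id),
                )
                missing += 1
            else:
                conn.execute(
                    "UPDATE uploads SET file_hash = ?, pages = ?, status = ? "
                    "WHERE id = ?",
                    (meta["file_hash"], meta["pages"], OK, doc_id),
                )
                found += 1
    return found, missing


def ids_still_pending(conn):
    """Ids with neither metadata nor a not-found flag, i.e. worth retrying."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT id FROM uploads WHERE status IS NULL ORDER BY id"
        )
    ]
```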

anthonydb avatar Mar 07 '16 04:03 anthonydb