pneumatic
Add way to add each document's `file_hash` and `pages` value to database
Upon upload, the DocumentCloud API response does not include the values for `file_hash` or `pages`, probably because those get calculated during the processing of the document and are not available when the file is dropped off.
I'd like to add a function in `db.py` to walk through the database of uploaded files and retrieve those values for each doc. It should include multiprocessing on supported platforms.
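A rough sketch of what such a function might look like. It assumes a SQLite table named `uploads` with `id`, `file_hash` and `pages` columns, and a caller-supplied `fetch_doc` callable that returns `(file_hash, pages)` for a document id. The function name, table layout and `fetch_doc` are all placeholders for illustration, not pneumatic's actual schema or the DocumentCloud client API.

```python
import multiprocessing
import sqlite3


def backfill_hash_and_pages(conn, fetch_doc, workers=1):
    """Fill in file_hash and pages for rows that are missing them.

    conn:      open sqlite3 connection (hypothetical `uploads` table)
    fetch_doc: hypothetical callable, doc_id -> (file_hash, pages)
    workers:   values above 1 use a process pool; fetch_doc must then be
               a picklable, top-level function
    Returns the number of rows updated.
    """
    rows = conn.execute(
        "SELECT id FROM uploads WHERE file_hash IS NULL OR pages IS NULL"
    ).fetchall()
    doc_ids = [row[0] for row in rows]

    if workers > 1 and doc_ids:
        # Fan the lookups out across processes where that is supported.
        with multiprocessing.Pool(workers) as pool:
            results = pool.map(fetch_doc, doc_ids)
    else:
        # Serial fallback for platforms without usable multiprocessing.
        results = [fetch_doc(doc_id) for doc_id in doc_ids]

    # Only the parent process writes, so the connection is never shared.
    conn.executemany(
        "UPDATE uploads SET file_hash = ?, pages = ? WHERE id = ?",
        [(h, p, doc_id) for doc_id, (h, p) in zip(doc_ids, results)],
    )
    conn.commit()
    return len(doc_ids)
```

Keeping the database writes in the parent process avoids passing a SQLite connection to the workers; only the lookups run in parallel.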
Closed via https://github.com/anthonydb/pneumatic/commit/d71a56a098865d4fb0e50bb36c6b78a1cafebf4c
Reopening to remind myself that the `update_processed_files` method needs to write something to the database indicating a file is not found.
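One possible shape for that, as a sketch only: a `status` column that gets set to `'not_found'` when a lookup comes back empty. The `uploads` table, the `status` column and the `record_fetch_result` helper are hypothetical names for illustration, not what `update_processed_files` currently does.

```python
import sqlite3


def record_fetch_result(conn, doc_id, result):
    """Store a lookup result, flagging documents that were not found.

    result is (file_hash, pages), or None when the document is missing.
    """
    if result is None:
        # Leave file_hash and pages untouched, but record the miss.
        conn.execute(
            "UPDATE uploads SET status = 'not_found' WHERE id = ?",
            (doc_id,),
        )
    else:
        file_hash, pages = result
        conn.execute(
            "UPDATE uploads SET file_hash = ?, pages = ?, status = 'ok' "
            "WHERE id = ?",
            (file_hash, pages, doc_id),
        )
    conn.commit()
```

Recording the miss explicitly means a later pass can skip or retry those rows instead of treating them as still unprocessed.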