organize
Match dates in PDFs
I've been using Hazel on the Mac for several years now, but I'm attracted to a platform-independent solution like organize. One thing that I use a lot in Hazel is date matching in PDFs.
You can define a pattern for a date, tell Hazel to use the nth occurrence of a date matching this pattern (counted from the beginning or the end), and load the day, month and year into variables. I use this to get the date an invoice was written and rename the file accordingly.
Would this be possible with organize?
Yes, this is possible with the filecontent filter: https://organize.readthedocs.io/en/latest/page/filters.html#filecontent
But I have to admit that writing the regex is not as nice or as easy as defining the Hazel pattern.
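For anyone who has not written such a regex before, here is a minimal sketch in plain Python of what a date pattern with named groups captures. The sample text and the DD.MM.YYYY format are assumptions made up for illustration; a real invoice will need its own pattern.

```python
import re

# Hypothetical text, as it might be extracted from an invoice PDF.
text = "Invoice no. 4711\nDate: 21.01.2024\nDue: 20.02.2024"

# Named groups capture the day, month and year of a DD.MM.YYYY date.
DATE = re.compile(r"(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})")

match = DATE.search(text)  # search() returns the first occurrence only
parts = match.groupdict() if match else {}
print(parts)
```

The names given to the groups (day, month, year) are the ones you would then refer to in a rule, in the form {filecontent.year}.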
I came here for exactly the same reason (except I use FileJuggler 2 on Windows). Would you consider integrating https://github.com/akoumjian/datefinder to make date-handling easier?
Thanks
Edit: Actually, never mind. datefinder isn't as good as I had hoped. Using a custom regex seems more useful.
Great tool by the way :)
I'm no expert, but this is the fairly manual approach I took to automatically sort files. I couldn't figure out how to "grab the third YY/MM/DD", but I used the echo: '{filecontent}' approach below to see the unique text around the specific date I wanted (e.g., the third one) and made a filecontent rule based on that. Is there a better approach?
config.yaml:
rules:
  # Sort invoices using file names and file content
  - name: "Sort My Invoices"
    locations: ~/Downloads/ # adjust as needed
    subfolders: false # don't look in subfolders
    filters:
      - extension: pdf # the invoice is always a PDF, so only act on PDFs
      - regex: '.{8}-.{4}-.{4}-.{4}-.{12}' # the consistent file name format when the invoice is downloaded from the web; I downloaded several to check that the format is consistent
      - filecontent: 'Invoice' # whatever text appears in the PDF that differentiates it from other files; probably redundant, since I am already filtering on a fairly unique file name format
      - filecontent: '(?P<month>[01]\d)\/(?P<day>[0123]\d)\/(?P<year>\d{2})' # finds the first instance of MM/DD/YY and assigns the "month"/"day"/"year" variables for later use in the file name
    actions:
      - move:
          # Move to the proper folder and rename. The document itself contains "01/21/24" (only "YY", not "YYYY"),
          # but I want the file name to be '2024.01.21 Invoice', so I put a literal "20" in front of the year variable.
          dest: '~/Documents/Invoices/20{filecontent.year}.{filecontent.month}.{filecontent.day} Invoice.pdf'
          on_conflict: 'skip' # skip if there is a conflict

  # To see the contents of a file and inform the 'filecontent' filters above, use the rule below to get the raw text.
  - name: "View My Invoice"
    locations: ~/Downloads/test
    subfolders: false
    filters:
      - extension: pdf
      - filecontent
    actions:
      - echo: '{filecontent}'
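On "grab the third date": one possible approach is to let the regex itself skip the first two dates and only name the groups of the third. Below is a rough Python sketch of that idea. The helper name nth_date_regex and the sample text are made up for illustration, and I have only reasoned about this with Python's re module, not tested it inside an organize rule.

```python
import re

# One MM/DD/YY date without named groups, so it can be repeated in a pattern.
DATE = r"[01]\d/[0123]\d/\d{2}"

# The same date with named groups, used only for the occurrence we want.
TARGET = r"(?P<month>[01]\d)/(?P<day>[0123]\d)/(?P<year>\d{2})"


def nth_date_regex(n: int) -> str:
    """Build a regex whose named groups capture the n-th date (1-based)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    repeat = "{" + str(n - 1) + "}"
    # Lazily skip n-1 earlier dates; [\s\S] also matches newlines.
    skip = r"(?:[\s\S]*?" + DATE + ")" + repeat
    return skip + r"[\s\S]*?" + TARGET


# Hypothetical text, as the echo rule above might print it.
text = "Order placed 01/05/24, shipped 01/09/24.\nInvoice date: 01/21/24\nDue 02/20/24"

third = re.search(nth_date_regex(3), text)
result = third.groupdict() if third else None
print(result)
```

For n = 3 the helper produces (?:[\s\S]*?[01]\d/[0123]\d/\d{2}){2}[\s\S]*?(?P<month>[01]\d)/(?P<day>[0123]\d)/(?P<year>\d{2}), which could be tried as the filecontent expression in place of the "first instance" pattern. Counting from the end is harder to express in a single regex, so this sketch only covers counting from the beginning.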