docspell
Automatic document name
Hi, here is another idea. I do not know whether it is feasible, since I have not seen anything like it in any comparable software. I scan my documents without names (just a placeholder, something like scan12345.pdf). Before I confirm a document, I select the subject line (or similar), copy it, and use it as the name. Could the subject/name/title be recognized automatically? Are there any NLP approaches to generate a "document name" automatically? Maybe several as suggestions? Cheers Eresturo
That's an interesting idea! I have to do some research here, but I believe NLP techniques can help, maybe combined with the document structure. It depends a bit on what is expected. NLP could maybe identify short phrases built from the most frequently used nouns and verbs, or something like that (I'm not an NLP expert). But the document structure is often important, so it is probably difficult to find something that works "for all" documents. Maybe something close or "good enough" is possible, though. It would indeed be of great help.
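To make the "good enough" idea a bit more concrete, here is a minimal sketch of a structure-based heuristic rather than real NLP. It assumes the extracted text is available as a plain string and simply ranks the first lines, preferring an explicit subject line. The function name `suggest_names`, the prefix list, and the scoring weights are all made up for illustration; none of this is part of docspell.

```python
import re

# Hypothetical prefixes that often mark a subject line in letters.
SUBJECT_PREFIX = re.compile(r"^(subject|betreff|re)\s*:\s*", re.IGNORECASE)


def suggest_names(text: str, limit: int = 3) -> list[str]:
    """Return up to `limit` name suggestions taken from the first lines of `text`."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    scored = []
    for pos, line in enumerate(lines[:30]):  # only look at the top of the document
        score = 0.0
        candidate = line
        if SUBJECT_PREFIX.match(line):
            # An explicit subject line is the strongest signal.
            candidate = SUBJECT_PREFIX.sub("", line)
            score += 5.0
        words = candidate.split()
        if not 2 <= len(words) <= 12:
            continue  # too short or too long to be a title
        # Prefer lines that are mostly letters (not addresses or numbers).
        score += sum(ch.isalpha() for ch in candidate) / len(candidate)
        # Prefer lines closer to the top of the document.
        score -= pos * 0.05
        scored.append((score, pos, candidate))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [candidate for _, _, candidate in scored[:limit]]


letter = """ACME Insurance Ltd.
12 Main Street

Subject: Your policy renewal for 2021
Dear customer,
please find attached the renewal documents."""

for name in suggest_names(letter):
    print(name)
```

A heuristic like this will fail on documents without a letter-like layout, which is exactly the "works for all documents" problem mentioned above; it is only meant to show how several suggestions could be offered at once.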
Hello, I think the idea is good too. In my case the scanned papers have file names according to the pattern scan_date_time. It would be nice to be able to rename them according to a pattern. Something like this: [document type]_[abbreviation organization]_[item date].pdf. For example: Invoice_Amazon_2021-07-01.pdf. I can also imagine using the recipient or tags of a certain category for this. It might be an idea to be able to capture short forms for organizations and persons. PS: Thanks for this great program.
Hi @nero82, thank you!
I'm not sure if this is a different case: do you mean file names or the item name? I think the NLP proposal is about the item name. But I think a pattern for file names is a good idea, too. I'm not too concerned about file names myself anymore, but when you send a file via email or store it on disk, they matter again. I think the pattern is a good idea; we only have to make sure that multiple files per item are distinguishable.
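As a rough illustration of the file name pattern from the example Invoice_Amazon_2021-07-01.pdf, including a way to keep multiple files per item distinguishable, here is a small sketch. The function `build_file_name` and its parameters are hypothetical and not an existing docspell feature; appending a running index is just one possible way to tell files apart.

```python
import re
from datetime import date
from typing import Optional


def build_file_name(
    doc_type: str,
    organization: str,
    item_date: date,
    index: Optional[int] = None,
    extension: str = "pdf",
) -> str:
    """Build '<type>_<organization>_<date>[_<index>].<ext>' from item metadata."""

    def clean(part: str) -> str:
        # Replace anything that is awkward in file names with a dash.
        return re.sub(r"[^A-Za-z0-9-]+", "-", part).strip("-")

    parts = [clean(doc_type), clean(organization), item_date.isoformat()]
    if index is not None:
        # Assumption: a running number distinguishes multiple files of one item.
        parts.append(str(index))
    return "_".join(parts) + "." + extension


print(build_file_name("Invoice", "Amazon", date(2021, 7, 1)))
print(build_file_name("Invoice", "Amazon", date(2021, 7, 1), index=2))
```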
Yes, my mistake. I thought the item name was tightly coupled to the file name. For the most part, my concern was the displayed name, which is the item name. It would be nice to have something more meaningful than scan_2021-01-07. The side effect of having somewhat "nicer" file names when exporting is also great, of course. So a function to create patterns for the item name and an option to use them for the export?
The item name only defaults to the file name (or some text like "3 files" if there are multiple files on an item) when processing. I also think the most value would come from making a better guess at the item name based on the content.
Do you think it's worth it to make the name a pattern of only other things already visible? I would think that it's easier to simply hide the filename in this case, or use a pattern for the title on the card (it's not changing the item name field, but the card in the list view can display something different as title), like this?
I must confess, I had overlooked the option to define the title by pattern. In principle, this is what I had in mind. Cool would be the possibility to use user defined fields and tags of a certain category. It would be an advantage to be able to enter short forms for persons and organizations. Some have quite long and bulky names. Example: short=ADAC long=Allgemeiner Deutscher Automobil-Club e.V.
Displaying only visible data sounds wrong at first, but in my opinion it can help to highlight the personally important attributes.
Suggestions for the name based on the content actually sound even better: a series of suggestions from which you assemble the name with a few clicks. Over time, the system could learn what the user chooses.
> Cool would be the possibility to use user defined fields and tags of a certain category.
The patterns can be extended, and custom fields are a good idea. I'll create a separate issue for this.
> It would be an advantage to be able to enter short forms for persons and organizations. Some have quite long and bulky names. Example: short=ADAC long=Allgemeiner Deutscher Automobil-Club e.V.
You mean to add an optional field to person/organization for a short-name, right? Also a good idea :-) I'll create an issue for this.
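A minimal sketch of what such an optional short-name field could look like, with the long name as fallback. The class `Organization` and the method `display_name` are purely illustrative and do not reflect docspell's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    name: str
    short_name: Optional[str] = None  # hypothetical optional field

    def display_name(self) -> str:
        # Use the short form when one is set, otherwise the full name.
        return self.short_name or self.name


adac = Organization("Allgemeiner Deutscher Automobil-Club e.V.", short_name="ADAC")
amazon = Organization("Amazon")
print(adac.display_name(), amazon.display_name())
```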
> Displaying only visible data sounds wrong at first, but in my opinion it can help to highlight the personally important attributes.
Yes, absolutely agreed, that is the idea behind the patterns. You should be able to say what is important to you and display it on the card. More is possible, of course; currently you can only define the title and subtitle. But this is different from having a field value (stored in the db) that contains only values of other fields. That does not seem ideal to me right now. If you curate the item name, you want to display it. If you would rather have something concatenated from other parts, you could use a pattern and simply discard the name.