Support white/blacklisting single tags for auto-tagging

Open mirisbowring opened this issue 3 years ago • 10 comments

Hi,

Why are only tag groups supported for learning auto-tagging?

I grouped tags by topic, such as "health". Unfortunately, most groups contain some specific tags that no algorithm can detect. That's why I would like to specify obvious tags like "invoice", which is a tag, not a group.

Regards!

mirisbowring avatar May 13 '22 07:05 mirisbowring

Yes, I see. Groups are currently used to provide a set of possible values, from which auto-tagging chooses only one. It would be better to have this independent of groups as an organizational feature - the current behaviour is left over from the beginnings of the project. Currently you would need to create tag groups with auto-tagging in mind. I think it would be good to have a separate configuration for auto-tagging.
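To illustrate the behaviour described in this comment, here is a minimal sketch, not docspell's actual code. The tag data, the function names and the keyword-based `keyword_score` are all hypothetical stand-ins; a real setup would use a trained model instead of keyword matching.

```python
from collections import defaultdict

# Hypothetical data: tag name -> group; None means the tag has no group.
TAGS = {
    "Krankenkasse": "health",
    "Arbeitsunfähigkeit": "health",
    "invoice": None,
}

def tags_by_group(tags):
    """Collect the candidate tags of each group; ungrouped tags are skipped."""
    groups = defaultdict(list)
    for tag, group in tags.items():
        if group is not None:
            groups[group].append(tag)
    return dict(groups)

def keyword_score(text, tag):
    """Toy stand-in for a trained model: 1 if the tag name occurs in the text."""
    return 1 if tag.lower() in text.lower() else 0

def guess_tags(text, tags, score=keyword_score):
    """Pick at most one tag per group: the highest-scoring candidate."""
    result = {}
    for group, candidates in tags_by_group(tags).items():
        best = max(candidates, key=lambda t: score(text, t))
        if score(text, best) > 0:
            result[group] = best
    return result
```

With these hypothetical tags, a text that mentions both "invoice" and "Krankenkasse" only gets a suggestion for the "health" group, because "invoice" belongs to no group and is therefore never a candidate.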

eikek avatar May 13 '22 23:05 eikek

Well, this is not the behaviour users expect. All my users use groups such as "Health", which would contain doctor reports, messages from the health insurance, etc.

Could you please implement the ability to black/whitelist specific tags for auto-tagging and provide an estimated date? :)

mirisbowring avatar May 14 '22 06:05 mirisbowring

"User expected behavior" is quite a subjective thing. Tag groups are distinct. If you use groups like "Health", do you then have different tags for e.g. "invoice"? Maybe use a tag group "topic" with tags for "health" and the like. Currently you need to group tags to aid auto-tagging.

I'll put this on my list, but of course there is no ETA :-)

eikek avatar May 14 '22 08:05 eikek

Invoice is a groupless tag.

But health would be "Krankenkasse" (health insurance), "Arbeitsunfähigkeit" (incapacity for work), etc.

They belong to "health" but not necessarily to each other.

mirisbowring avatar May 14 '22 09:05 mirisbowring

What must be changed?

I will probably try a PR.

mirisbowring avatar May 14 '22 09:05 mirisbowring

Why not use another group for "document type" where you can put "invoice" in it?

You can try it. You would need to create a UI for users to create sets of tags and then use these sets in the training accordingly. The new models can then be used in processing to guess more tags. I hope there is not too much to change - it must be backwards compatible, of course.
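One way to picture the change outlined in this comment is the sketch below. It is purely illustrative: `training_sets`, `detection_sets` and the tag names are hypothetical and not part of docspell. It assumes that user-defined tag sets, when present, replace the groups as training input, and that the groups remain the fallback so existing setups keep working.

```python
def training_sets(tags, detection_sets=None):
    """Return a mapping: set name -> sorted candidate tags to train one model on.

    tags: tag name -> group (None for ungrouped tags).
    detection_sets: optional user-defined sets, name -> collection of tag names.
    """
    if detection_sets:
        # Explicit sets are independent of how tags are grouped for organization.
        return {name: sorted(members) for name, members in detection_sets.items()}
    # Backwards-compatible fallback: derive the sets from the tag groups.
    groups = {}
    for tag, group in tags.items():
        if group is not None:
            groups.setdefault(group, []).append(tag)
    return {name: sorted(members) for name, members in groups.items()}
```

Without explicit sets, the result mirrors the existing groups; with a set such as `{"doctype": {"invoice", "contract"}}`, an ungrouped tag like "invoice" becomes a training candidate.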

eikek avatar May 14 '22 10:05 eikek

Why not use another group for "document type" where you can put "invoice" in it?

Well, "invoice" was just an example - there are many more tags that I would like to white/blacklist, and in that case I would need to create tag groups with a single tag in each of them.

mirisbowring avatar May 21 '22 20:05 mirisbowring

Yes… this was also just an example :) I mean you can do different things. One group per tag is one option. If you have more tags to blacklist than not, you could put all of them in one group, for example. I'm sure you'll find other variants that fit better for the time being. It is also worth noting that processing time grows with each group to be detected. If you have tags where docspell should pick just one, I would put them into a group.
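The trade-off between these layouts can be sketched as follows. This is a rough illustration that assumes one detection pass per group, which is an assumption read from the remark about processing time; the tag and group names are hypothetical.

```python
def groups_to_detect(tag_groups):
    """Count the distinct groups, assuming one detection pass per group."""
    return len({group for group in tag_groups.values() if group is not None})

# Layout A: one group per tag. Every tag can be suggested independently,
# but each group adds another detection pass.
one_group_per_tag = {"invoice": "invoice", "contract": "contract", "receipt": "receipt"}

# Layout B: one shared group. A single pass, but only one of the tags is picked.
single_group = {"invoice": "doctype", "contract": "doctype", "receipt": "doctype"}
```

Layout A needs three passes for three tags, while layout B needs one, at the cost of allowing only one of the three tags per document.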

eikek avatar May 21 '22 20:05 eikek

Why not use another group for "document type" where you can put "invoice" in it?

I've thought about this and my tag structure. In the end I came to the conclusion that most tags could be assigned to a "technical" tag group without losing my ability to easily filter for those documents.

Thanks for the input.

mirisbowring avatar May 25 '22 12:05 mirisbowring

Ah nice, great that you found a working alternative! Of course, for the future it would still be nice to be able to create additional groups for detection only.

eikek avatar May 25 '22 22:05 eikek