This is a draft of a new papis tag command that allows adding/removing/clearing tags. There are some docs inside to show you what's possible.

This is mostly just a first working draft to get some feedback. Here are some questions/considerations I'm unsure about:

Is this a useful thing to ship with papis? We already have papis update that can do some of what papis tag can do. Is there too much overlap? Should this maybe rather be an additional script?
What functionality would you like to see? Maybe it could be useful to apply all tags from one doc to another for instance...
Better idea for the names/structure of the options? For instance I would have liked to use --add for adding tags, but the short version of that is -a which is already used by --all. I've also considered using +TAG and -TAG, but haven't found a straightforward way of implementing this in python click (and it isn't consistent with other papis commands).
I'm not all that well versed with python, so any suggestions for improvements are very welcome.
I've added a new config option for the tag-format. I've made some assumptions regarding how people list their tags, namely that they're either a list, or a string separated by , , ; or . I've considered allowing any arbitrary separators, but that's not very easy to set in the config because of the white space. I've also considered trying to detect the tag-format automatically (which I actually do in papis.nvim), but opted against it as it isn't foolproof (which is more acceptable for papis.nvim as it doesn't edit any documents).

Note that there might be bugs in here, so don't use with your actual library. Also, it requires explicitly setting the tag-format in your config.

EDIT: Bunch of errors in the CI, looks like my winging of the typing system hasn't quite gone right (and I also introduced a bunch of trailing white space when I wrote the explanations). :sweat_smile: I'll get to these!

Aug 29 '23 20:08 jghauser

Thanks a lot for the feedback, it's super useful!

It would also be helpful to list all the existing tags and maybe do some deduplication of some sort. For example, I have some tags with fluid-mechanics and fluid-dynamics (depending on how I felt at the time) and would love to merge them!

Yes, that's something I've been wanting as well. Do you think it might make sense to maybe implement it in papis doctor (but use the functionality we'll get from papis tag)?

Maybe having some subcommands like papis bibtex? papis tag QUERY add TAG1 TAG1 remove TAG3 TAG4

Yes, this seems like a better way to do it. Thanks!

I think a more flexible option would be to have a tag-format-separator and use that if the tags field is a string. Maybe even just standardize on the tags being a list in the info.yaml file already? Then the separator would be just a fallback thing..

I think it would make sense to standardise things to lists, but then I'm using lists and I'm possibly biased :sweat_smile:. In any case, that would probably be something to do at a later point and with some conversion path. I've added some comments in the responses regarding why I've required explicitly setting the tag-format option. If there's a way to assume lists (except if tag-format-separator is set) without potentially leading to inconsistencies, I'm happy to do it. Let me know what you think!

Aug 30 '23 21:08 jghauser

Yes, that's something I've been wanting as well. Do you think it might make sense to maybe implement it in papis doctor (but use the functionality we'll get from papis tag)?

Yeah, that's a good idea! The duplicate-refs check is already very similar, so it should fit just fine.

I think it would make sense to standardise things to lists, but then I'm using lists and I'm possibly biased 😅. In any case, that would probably be something to do at a later point and with some conversion path. I've added some comments in the responses regarding why I've required explicitly setting the tag-format option. If there's a way to assume lists (except if tag-format-separator is set) without potentially leading to inconsistencies, I'm happy to do it. Let me know what you think!

Yeah, I think it also makes sense to standardize on lists. Then we wouldn't have to worry about the tag having whitespace in it, or having the separator as part of the tag for some reason, or any other issue like that.

Maybe we can have support for reading the tags in a variety of formats, but when we write them back it's always a list? That way users will slowly migrate to that format in all their documents.

Aug 31 '23 18:08 alexfikl

It would also be helpful to list all the existing tags and maybe do some deduplication of some sort. For example, I have some tags with fluid-mechanics and fluid-dynamics (depending on how I felt at the time) and would love to merge them!

Yes, that's something I've been wanting as well. Do you think it might make sense to maybe implement it in papis doctor (but use the functionality we'll get from papis tag)?

Actually this should just work with

papis tag --add fluid-dynamics --remove fluid-mechanics QUERY

so maybe not a great suggestion :)) Guess this command is awesome already! :heart:

Aug 31 '23 18:08 alexfikl

Actually this should just work with
papis tag --add fluid-dynamics --remove fluid-mechanics QUERY
so maybe not a great suggestion :)) Guess this command is awesome already! ❤️

Actually, I was thinking to somehow search the library for suspiciously similar tags and then propose to merge them. Not sure that would work with your example, but I've got lots that just differ in capitalisation etc.

Aug 31 '23 19:08 jghauser

Yeah, I think it also makes sense to standardize on lists. Then we wouldn't have to worry about the tag having whitespace in it, or having the separator as part of the tag for some reason, or any other issue like that.

Maybe we can have support for reading the tags in a variety of formats, but when we write them back it's always a list? That way users will slowly migrate to that format in all their documents.

Or we could also tell people that papis tag only supports lists and give them the option of converting everything to lists. That way papis tag can be simple and we don't even need a tag-format option. We could have a papis tag --convert-from "SEPARATOR" (or so) command that converts everything...

@alejandrogallo, what do you think about standardising tags to lists? I'm asking because my suspicion is that you're using strings (as existing docs use them). Are there any use cases we're not considering for which lists aren't well suited?

Aug 31 '23 19:08 jghauser

Great new feature than this add/remove tag options. I support it.

@jghauser, what do you mean by "standardising tags to lists" ? Writing a list format "[tag1", "tag2 with space", ...] in the tag field of the yaml file (or formated like the author_list or citations fields...) ? I putting here a python list format but I don't know what it is the best way to store a list in a yaml file... I support the idea as soon as it can be easily parsed directly from the yaml file (with a tool like python json/yq/jq...).

BUT, what about the databases? I use the whoosh database personally, which let you perform complex searches like year:2020 AND tags:neolithique australia (this syntax here search for docs with tag 'neolithique' AND 'australia'), and I am wondering about the easiness of dealing with list instead of strings in the database. I think than if this change is implemented, it is important to modify the databases accordingly if needed to be sure it doesn't disable any feature the databases previously provided.

Sep 02 '23 12:09 Twix53791

Thanks for the input @Twix53791!

@jghauser, what do you mean by "standardising tags to lists" ? Writing a list format "[tag1", "tag2 with space", ...] in the tag field of the yaml file (or formated like the author_list or citations fields...) ? I putting here a python list format but I don't know what it is the best way to store a list in a yaml file... I support the idea as soon as it can be easily parsed directly from the yaml file (with a tool like python json/yq/jq...).

it would be like this:

tags:
- shared intentionality
- cognitive science
- developmental psychology

BUT, what about the databases? I use the whoosh database personally, which let you perform complex searches like year:2020 AND (tags:neolithique OR tags:australia), and I am wondering about the easiness of dealing with list instead of strings in the database. I think than if this change is implemented, it is important to modify the databases accordingly if needed to be sure it doesn't disable any feature the databases previously provided.

I'm not entirely sure how this is handled (maybe @alexfikl can chime in), but it seems to work as expected. I'm also using the whooh database, and when i search for e.g. tags:"shared intentionality" it seems to give me all documents containing that tag.

Sep 02 '23 12:09 jghauser

I'm not entirely sure how this is handled (maybe @alexfikl can chime in), but it seems to work as expected. I'm also using the whooh database, and when i search for e.g. tags:"shared intentionality" it seems to give me all documents containing that tag.

I think it's just stringifying the list, so it does a substring search in the string '["shared intentionality", "cognitive science", "developmental psychology"]' and it finds stuff as expected.

Sep 02 '23 17:09 alexfikl

Well the format you propose is fine for me (I am telling that because I use yq to parse the papis yaml file in another program I am writing, a "fzf browser" for papis papis-fzf).

But I just tried a bit the whoosh search and I think the database, yes, stringifying the list like that (I checked directly the whoosh database) : ['cognitive science', 'intentionality']. But, about the search, it does not consider the list elements as one unique entity.

I tried with a document:

tags:
- cognitive science
- intentionality
- developmental psychology

All these searches match it: papis list "tags:cognitive AND tags:intentionality" papis list "tags:science AND tags:intentionality" `papis list "tags:science AND tags:psychology"

It is not because it matches a part of the string, papis list "tags:scie doesn't match.

Downsides of tags as a string

In order to have spaces in the tag, it is not really possible at the moment. This might be very upsetting for some people.
Inherently it is then not clear with separator to use, I have an unconscious and innate hatred of commas and I guess that is why papis is more geared to spaces.

Downsides of tags as a list

When editing manually documents, users must know that the format of the tags are such of a list. But the upside is that with your command we can have a more lean and short way of dealing with tags. X Apparently whoosh and most databases can just store the stringified python list in the database, so this is unlikely to bite us in the future. X Tags are non-greppable. (we can discuss if this is a downside) I think it is not valid since yaml is not supposed to be grepped for.

I would say that I support just having a way of storing tags, namely as a list. It will make the implementation and documentation clearer. The implementation is not really super complicated, in the sense that we could affort having this logic here, but I think if we are going to officially support tags like citations or notes, then we might as well choose a single format.

I propose writing a function that checks for the formatting of the tags and offers to convert tags to their list format.

I kind of like the papis QUERY add TAG1 remove TAG2 synax...

This kind of way by now is kind of typical to papis, for better or for worse. papis explore also has this kind of behaviour. This kind of behaviour is handy whenever you need a kind of pipe. In the case of explore, you want to search for documents online and you might pick one, or export them or add one to the library.

In the case of tags, I don't see this rationale, therefore I would be rather in favour of --foo flags.

Other commands

I would like to have a papis tag -al or papis tag --all --list command. I think it should be straightforward.

In the future we can also implement a cache for tags.

External API

I would suggest creating a file à la citations.py in papis/tag.py in order to have there two functions to do the tag managment. This can help me then to use the functions from the web application. Soemthing lke

papis/tag.py

def get(doc: papis.document.Document):
    tags = papis.get("tags", [])
    if not isinstance(tags, list):
        raise deprecation warning and parse according to space separation
        return re.split(r"\s+", tags)
    return tags

def add(doc: papis.document.Document, tag: str) -> None:
    pass

def remove(doc: papis.document.Document, tag: str) -> None:
    pass

def remove(doc: papis.document.Document, tag: str) -> None:
    pass

Sep 10 '23 18:09 alejandrogallo

Thanks a lot for the feedback @alejandrogallo, this is very useful!

I propose writing a function that checks for the formatting of the tags and offers to convert tags to their list format.

The idea is to tell users (with that message that is shown on first use) that they can convert their tags to the required format with something like papis --set doctor-key-type-separator ', ' doctor --check key-type --fix --all.

This kind of way by now is kind of typical to papis, for better or for worse. papis explore also has this kind of behaviour. This kind of behaviour is handy whenever you need a kind of pipe. In the case of explore, you want to search for documents online and you might pick one, or export them or add one to the library.

In the case of tags, I don't see this rationale, therefore I would be rather in favour of --foo flags.

Thanks for the explanation, that makes a lot of sense.

Other commands

I would like to have a papis tag -al or papis tag --all --list command. I think it should be straightforward.

In the future we can also implement a cache for tags.

I think this is already implemented in the draft. There's an --all flag that will apply the tagging to all matching documents. Is that what you had in mind?

I would suggest creating a file à la citations.py in papis/tag.py

Ok!

Sep 11 '23 06:09 jghauser

Ok, I have had an entirely different idea :sweat_smile:. Instead of adding a papis tag command, we could also improve the papis update command so it can handle lists (i.e. adding and removing items to/from a list). Then changing tags would just involve that command. It would also mean we could use papis update for the proposed related_to field, even if it is a list. As far as I can tell, all list items in the info.yaml only include unique items, so making a command that that can handle everything doesn't seem impossible.

Sep 11 '23 07:09 jghauser

Ok, I have had an entirely different idea 😅. Instead of adding a papis tag command, we could also improve the papis update command so it can handle lists (i.e. adding and removing items to/from a list). Then changing tags would just involve that command. It would also mean we could use papis update for the proposed related_to field, even if it is a list. As far as I can tell, all list items in the info.yaml only include unique items, so making a command that that can handle everything doesn't seem impossible.

I like that! papis update should be able to handle most of this with a few more flags (--append at least) and could nicely generalize to other keys. If it turns out that we need some special handling for tags, we can always make a separate papis tag command that reuses some of the logic from papis update.

Sep 11 '23 16:09 alexfikl

hmmm I don't know, so right now this logic can hold. We currently support kind of 3 things, papis_id, notes and citations. For only one of these there is an ad-hoc command, papis citations.

We can update the papis update command nevertheless, IMHO this would be

papis update --append <thing> <key>
papis update --list-append/--push <thing> <key>
papis update --remove <key>
papis update --list-remove <thing> <key>

Only, how do we do for listing all tags? The above approach fails to deliver this. I think from the point of view of user friendliness, it is much clearer that we explicitly support tags if we have a tag command.

I am therefore for both, tags command and update of update ;)

Sep 11 '23 21:09 alejandrogallo

I think personnaly also than a tag command would be clearer.

Sep 14 '23 07:09 Twix53791

Thanks for all of the input. I'm gonna improve the papis update command (along the lines of what @alejandrogallo suggested) and then implement papis tag mostly just re-using the logic of papis update :)

Sep 14 '23 08:09 jghauser

@alexfikl, with all the work done in papis update, adding a papis tag is now quite easy. I hacked this together quickly, but it seems like everything just works as expected :laughing:. I've had one issue with typing, if you could take a look at the NOTEs in tag.py, that would be very helpful. Also, any other comments you have are of course also appreciated!

EDIT: Just FYI because it's been a while. I've implemented the list-only version of the command as we discussed previously. There's a warning to this effect, and a config option with which one can make the warning go away :).

EDIT2: One more question: I'm thinking to remove the --batch flag, because all the ways in which papis update could fail (and which --batch is meant to ignore) can't happen with papis tag (I think). Now, do you think I should still check the success values returned by the various functions from papis update used in papis tag, to catch errors just in case of something unforeseen (and maybe have a message à la "If you encounter this error, please open a bug report"? Or should I just ignore all of these return values?

Feb 17 '24 20:02 jghauser

@alexfikl thanks a lot for all the comments -- they're very helpful! Before i implement these changes and add some tests, what's your view on this (originally an edit to an above message):

EDIT2: One more question: I'm thinking to remove the --batch flag, because all the ways in which papis update could fail (and which --batch is meant to ignore) can't happen with papis tag (I think). Now, do you think I should still check the success values returned by the various functions from papis update used in papis tag, to catch errors just in case of something unforeseen (and maybe have a message à la "If you encounter this error, please open a bug report"? Or should I just ignore all of these return values?

Feb 19 '24 20:02 jghauser

@alexfikl thanks a lot for all the comments -- they're very helpful! Before i implement these changes and add some tests, what's your view on this (originally an edit to an above message):

EDIT2: One more question: I'm thinking to remove the --batch flag, because all the ways in which papis update could fail (and which --batch is meant to ignore) can't happen with papis tag (I think).

Hm, it seems like it serves to purpose here, so I'm fine with removing it.

Now, do you think I should still check the success values returned by the various functions from papis update used in papis tag, to catch errors just in case of something unforeseen (and maybe have a message à la "If you encounter this error, please open a bug report"? Or should I just ignore all of these return values?

If it's not too confusing, it might be worth to check the success status and complain if something unexpected happens. The main issue with these things is, like you said, avoiding to mess with the user's library..

Feb 19 '24 20:02 alexfikl

Ok perfect, that's how I see it too. Thanks for the feedback :blush:

Feb 20 '24 09:02 jghauser

I've removed the --batch flag and now papis tag just checks if we find non-lists and aborts if it does. I've added some tests too (copied and adapted from papis update). So, this should be getting ready :).

Two questions though:

In the tests, we always pass TemporaryLibrary into the functions, even if it's not actually used (at least that's what I saw in some random other files I looked at). I realised that both in the tests for papis update and papis tag, not a single test uses this. Should I still leave it in?
Messages given when running papis tag will sometimes come from papis update, which I think is a bit confusing. This reminded me of something @alejandrogallo said when we discussed this command a while ago, namely that he'd like the core functionality of papis tag to be accessible with an API in an tag.py or so in the base papis directory. His reasoning was to then use this from the webapp, but I guess it could also help with the messages. My question is if there's a policy/custom on how e.g. commands may refer to each other: is it ok? should it always be via an 'API' module? Should I create a update.py and a tag.py? And should we merge this first and then fix both of them (given that we've already merged the changes to papis update?

Feb 22 '24 21:02 jghauser

In the tests, we always pass TemporaryLibrary into the functions, even if it's not actually used (at least that's what I saw in some random other files I looked at). I realised that both in the tests for papis update and papis tag, not a single test uses this. Should I still leave it in?

Yes! Those are pytest fixtures that set up a temporary configuration and library somewhere in /tmp/ for use in the test. This actually gets used even if you don't use the tmp_library variable because it points to a different config file and a small test library. For example,

Any time the test calls a function that calls papis.config.get, it looks in /tmp/some-folder/papis/config instead of ~/.config/papis/config where the global user config is.
Any query for documents will return documents in the small test library instead of some user library.

This is only relevant when running the tests locally (not the CI). Plus, I don't think this is relevant for you, since you run the tests in a docker, but I just use a virtual environment and it would pick up my global config.

Messages given when running papis tag will sometimes come from papis update, which I think is a bit confusing. This reminded me of something @alejandrogallo said when we discussed this command a while ago, namely that he'd like the core functionality of papis tag to be accessible with an API in an tag.py or so in the base papis directory. His reasoning was to then use this from the webapp, but I guess it could also help with the messages. My question is if there's a policy/custom on how e.g. commands may refer to each other: is it ok? should it always be via an 'API' module? Should I create a update.py and a tag.py? And should we merge this first and then fix both of them (given that we've already merged the changes to papis update?

I don't think there's any particular policy (?) for that sort of stuff. There are some commands that call papis edit here and there, so it should be fine.

Not sure what @alejandrogallo had in mind, but it would be nice to have a top level tags.py file to help with handling tags. We can always refactor the command and move some things around though, so it should be fine to merge this now and do that when we have some time!

One issue that comes to mind with an API for this sort of stuff is that most of the difficulty here comes from the fact that we get a string from the command line and need to know what to do with it when adding it into the document. However, in code this should "just work" because we can pass in a list or an int and not do the try_parsing_str and isinstance(value, key_types[key]) stuff..

Feb 23 '24 05:02 alexfikl

Messages given when running papis tag will sometimes come from papis update, which I think is a bit confusing.

This is mostly just a small design issue. If instead of returning success flags, the functions raised an exception, the caller could handle the message and we could write out something specific for update vs tag.

I guess for now, it should be sufficient to just remove very explicit call "papis update ..." sort of messages from those functions? Most of them were ok from what I recall..

Feb 23 '24 06:02 alexfikl

Yes! Those are pytest fixtures that set up a temporary configuration and library somewhere in /tmp/ for use in the test. This actually gets used even if you don't use the tmp_library variable because it points to a different config file and a small test library. For example,

Oh, I see! Thanks for the explanation :)

Not sure what @alejandrogallo had in mind, but it would be nice to have a top level tags.py file to help with handling tags. We can always refactor the command and move some things around though, so it should be fine to merge this now and do that when we have some time!

One issue that comes to mind with an API for this sort of stuff is that most of the difficulty here comes from the fact that we get a string from the command line and need to know what to do with it when adding it into the document. However, in code this should "just work" because we can pass in a list or an int and not do the try_parsing_str and isinstance(value, key_types[key]) stuff..

Ok, I'll make a note of that and sort this out later.

This is mostly just a small design issue. If instead of returning success flags, the functions raised an exception, the caller could handle the message and we could write out something specific for update vs tag.

I guess for now, it should be sufficient to just remove very explicit call "papis update ..." sort of messages from those functions? Most of them were ok from what I recall..

I just went through all the messages and only some of those that papis tag can't reach (because they have to do with type mismatches) contain references to papis update, so we should be fine. I hadn't thought of using exceptions -- that makes a lot of sense! I'll make a note of that too, and improve this later (it would make sense to do all that in one fell swoop when factoring out things to toplevel tags.py and update.py files).

Feb 23 '24 08:02 jghauser

@alexfikl, do you think this is ready to merge? :)

Feb 25 '24 15:02 jghauser

I've implemented all the changes and force pushed them to my fork but somehow they don't show up here :thinking:

EDIT: ok, now it shows up!

Feb 26 '24 18:02 jghauser

Thanks for all the help @alexfikl! :heart:

Feb 26 '24 21:02 jghauser

Add `papis tag` command

Downsides of tags as a string

Downsides of tags as a list

Other commands

External API

Other commands