sample workflow?

Hi folks, I really appreciate the thorough description of the commands/flags. Are there any published example workflows?

I seem to be having trouble querying the documents that I have added, either with the list command or with the addto command (to select the document I want to add to).

It would be great to have a sample workflow which shows adding a few documents to a library, and then searching/opening for those documents by tags or authors or partial titles etc.

Best, RH

Dec 08 '20 11:12 rhaynes74

I don't know if this is exactly what you're looking for, but I outlined my process for using papis in this gist.

Regarding having trouble with querying, it could be a number of things. Are you saying that a document is verifiably in the library but is not found by the query? Every so often, for very recently added documents, this happens to me and I run papis --clear-cache to force papis to rebuild the database. It could also be the match-format in your configuration. For example, I think if you're using papis' builtin database that you can only query things based on what's in the match-format. So if you want to query by a particular year, you need to make sure {doc[year]} is in your match-format. I could be wrong on that. However, if you're using the whoosh database, you can query by fields as in papis open project:my-new-paper (i.e., return only documents whose project field contains my-new-paper).
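A minimal Python sketch of the match-format point, assuming (as suggested above, but not confirmed) that the builtin database only compares a query against the text rendered from match-format. The helpers `searchable_text` and `matches`, the rule that every query word must appear, and the sample document are all hypothetical; this is not papis' actual matching code.

```python
from collections import defaultdict


def searchable_text(doc, match_format):
    """Render the text a query is compared against (missing fields -> '')."""
    return match_format.format(doc=defaultdict(str, doc))


def matches(query, doc, match_format):
    """Hypothetical rule: every query word must occur, case-insensitively."""
    text = searchable_text(doc, match_format).lower()
    return all(word in text for word in query.lower().split())


doc = {"author": "Smith, John", "title": "On graphs", "year": 2019}
without_year = "{doc[author]} {doc[title]}"
with_year = "{doc[author]} {doc[title]} {doc[year]}"

print(matches("smith graphs", doc, without_year))  # True
print(matches("2019", doc, without_year))          # False: year not rendered
print(matches("2019", doc, with_year))             # True
```

Under this assumption, a field that never makes it into the rendered string simply cannot be found, which is why adding {doc[year]} to the match-format would matter for year queries.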

As for using the addto command, I almost always just query by first author's last name, e.g., papis addto -f FILE.pdf smith. Then I narrow down to the document I want, interactively, within the picker. I'm using the default picker.

Dec 08 '20 14:12 avonmoll

Many thanks, I need to study your config and workflow and see if it solves my problem. I think the problem is actually how I add the documents to my libraries.

Are you doing this on a mac? What packages are needed to get whoosh to work?

Sincerely,

Dr. Ronald D. Haynes

Professor, Department of Mathematics and Statistics Chair, MSc and PhD Scientific Computing Programs Memorial University of Newfoundland

We acknowledge that the lands on which Memorial University’s campuses are situated are in the traditional territories of diverse Indigenous groups, and we acknowledge with respect the diverse histories and cultures of the Beothuk, Mi’kmaq, Innu, and Inuit of this province.

Dec 08 '20 17:12 rhaynes74

@avonmoll thanks, that looks awesome. Can I add the gist to the papis README or documentation? @rhaynes74 have you seen the general papis documentation here? https://papis.readthedocs.io/en/latest/quick_start.html I'd be interested in hearing your thoughts on what we can improve in the documentation to make the transition smoother for new users. I also recently wrote a blog post about getting references from papers using papis, https://alejandrogallo.github.io/blog/get-paper-references.html , maybe this is also helpful.

Dec 08 '20 17:12 alejandrogallo

Hi folks, thanks for the responses. I have read over that documentation. To me the documentation doesn’t make it clear what information needs to be provided when I add a document so that I can then search for it later, and how to form the query for that search.

Dec 08 '20 19:12 rhaynes74

@alejandrogallo - fine by me! I admit that my workflow is not necessarily generic and does not make use of all of papis' features.

@rhaynes74 - how are you adding documents to your library? Regarding forming a query, this part of the documentation might give some answers.

Edit: I forgot to mention that I'm using the same workflow (and a git-synchronized paper repository) on macOS and Linux (Manjaro).

Dec 08 '20 19:12 avonmoll

Hi folks, right now I'm just using

papis add filename.ext --set author 'Name' --set title 'Some title'
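To see what such a command records, here is a hypothetical Python sketch of how repeated `--set KEY VALUE` options collapse into a metadata dictionary. It uses argparse purely as a stand-in (papis' own command-line parsing is not this code), and `metadata_from_args` is an illustrative helper. In this sketch, author and title are the only descriptive fields that end up attached to the document.

```python
import argparse

# Hypothetical stand-in for the CLI: papis' real parser is not this one.
parser = argparse.ArgumentParser(prog="add-sketch")
parser.add_argument("files", nargs="*")
# Each --set takes exactly two values (KEY VALUE) and may be repeated.
parser.add_argument("--set", dest="set_pairs", nargs=2, action="append",
                    default=[], metavar=("KEY", "VALUE"))


def metadata_from_args(argv):
    """Return the positional files and a dict built from the --set pairs."""
    args = parser.parse_args(argv)
    # If a key is repeated, the last value wins.
    return args.files, dict(args.set_pairs)


files, meta = metadata_from_args(
    ["filename.ext", "--set", "author", "Name", "--set", "title", "Some title"]
)
print(files)  # ['filename.ext']
print(meta)   # {'author': 'Name', 'title': 'Some title'}
```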

Dec 08 '20 20:12 rhaynes74

Maybe we lack a section in the documentation that makes it explicit and clear that papis can download documents from different sources (in papis parlance, these are importers).

Smart mode

So for instance when you do something like

papis add https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.124.171801/

papis does not know where the information is, so it activates the "smart" mode. That is, it looks at the string being added, in this case

https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.124.171801/

and it checks if it understands it somehow.

- Papis first checks if this string is an existing file in your file system. In this case it is not, because we don't have a file with that name.
- Then papis says: OK, maybe it's a doi. It tries to validate the string as a doi and fails, so it is not a doi.
- Or maybe it is an arxiv id, which it is not.
- Or maybe it is a url, and papis says: well, yes, it is one. At this point it checks whether it knows this journal or this kind of url. We have implemented several journal parsers; for instance, there is one for aps.org in papis/downloaders/aps.py.
  - Papis recognises this url as an aps url and says: OK, I know how to retrieve information from here because someone implemented it, so I'll try to get the information and download a pdf.
  - There is also a universal url parser called fallback, in papis/downloaders/fallback.py. This means that even if you're looking at an obscure journal or some random url, the fallback downloader will try to get as much information as it can from the website's metadata. This works amazingly well thanks to Facebook, Twitter and the like. Yes, you heard that right. Since these companies are so important, many web developers want to make sure that the metadata of their webpages is understandable to the big tech companies. This ensures good search engine optimization and therefore visibility of the journal's (or any website's) content.
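The order of checks above can be sketched in Python roughly as follows. This is a simplified illustration, not the code in papis/downloaders: the regular expressions, the `KNOWN_HOSTS` table and both helper functions are hypothetical, and papis' real doi validation may do more than a pattern match.

```python
import os
import re
from urllib.parse import urlparse

# Hypothetical, simplified patterns; papis' real validation is different.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")  # new-style ids only

# Hypothetical registry: host name -> downloader name.
KNOWN_HOSTS = {"journals.aps.org": "aps"}


def classify(thing):
    """Try the interpretations in the order described above."""
    if os.path.exists(thing):
        return "file"
    if DOI_RE.match(thing):
        return "doi"
    if ARXIV_RE.match(thing):
        return "arxiv"
    if urlparse(thing).scheme in ("http", "https"):
        return "url"
    return "unknown"


def pick_downloader(url):
    """Use a specific downloader if the host is known, else the fallback."""
    return KNOWN_HOSTS.get(urlparse(url).netloc, "fallback")


thing = "https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.124.171801/"
print(classify(thing))         # url
print(pick_downloader(thing))  # aps
print(pick_downloader("https://example.org/some/article"))  # fallback
```

The point of the ordering is that cheap, unambiguous checks (does the file exist?) run first, and the generic fallback only applies when no specific downloader claims the url.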

Explicit mode

As @avonmoll wrote in his workflow, you can also tell papis exactly what the thing you're adding is; you do this with the --from flag. For instance, if you're adding the paper above by its doi, you're sure it is a doi, and you don't want papis trying funky combinations to figure out what it is, then you'd do

papis add --from doi 10.1103/PhysRevLett.124.171801

or if you're telling it it's a url, then

papis add --from url https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.124.171801/
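The difference from smart mode is that nothing is guessed: the name given to --from selects one importer directly. A minimal Python sketch of that dispatch, assuming a simple name-to-function table; `import_doi`, `import_url`, `IMPORTERS` and `add_explicit` are hypothetical, and the stand-in importers only record the value, whereas real importers fetch metadata.

```python
# Hypothetical importers keyed by the name passed to --from.
def import_doi(value):
    return {"doi": value}


def import_url(value):
    return {"url": value}


IMPORTERS = {"doi": import_doi, "url": import_url}


def add_explicit(source, value):
    """Explicit mode: call exactly the named importer, with no guessing."""
    if source not in IMPORTERS:
        raise ValueError(f"unknown importer: {source!r}")
    return IMPORTERS[source](value)


print(add_explicit("doi", "10.1103/PhysRevLett.124.171801"))
# {'doi': '10.1103/PhysRevLett.124.171801'}
```

An unknown importer name fails loudly instead of falling through to other interpretations, which is the behaviour you want when you already know what you're adding.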

TODO

I think you're right, and maybe this piece of information should be made more explicit in the documentation

Dec 08 '20 20:12 alejandrogallo

Combined mode

Something I did not mention, and that I use all the time, is the combined mode. Using your example: I download the pdf (with whatever resource is at my disposal) and I have the doi or url; then I'd do

papis add paper.pdf https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702

Now, papis will say:

- paper.pdf is an actual file in my file system, so you mean that I should add it as a pdf, right? I'll do just that.
- https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702 is not an existing file, so I guess you want me to run my magic and figure out the data for the paper.pdf document using the smart mode.
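How the two arguments are told apart can be sketched in Python as below. It is an illustration under assumptions, not papis' implementation: `split_arguments` is a hypothetical helper, and a temporary empty file stands in for paper.pdf.

```python
import os
import tempfile


def split_arguments(args):
    """Existing files are attached; everything else goes to smart mode."""
    files = [a for a in args if os.path.isfile(a)]
    sources = [a for a in args if not os.path.isfile(a)]
    return files, sources


url = "https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702"
with tempfile.TemporaryDirectory() as tmp:
    pdf = os.path.join(tmp, "paper.pdf")
    open(pdf, "wb").close()  # create an empty stand-in for paper.pdf
    files, sources = split_arguments([pdf, url])

print(files == [pdf])  # True
print(sources)         # ['https://onlinelibrary.wiley.com/doi/abs/10.1002/andp.19163540702']
```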

Just adding a pdf

Note that there is also an automatic doi and arxiv parser. Imagine you just downloaded a paper, let's say https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.123.156401 (I'm using this one since I know you can download it).

If you save it as paper.pdf and run

papis add paper.pdf

papis will use the smart mode to find the first doi appearing in the text, assuming that this doi belongs to the paper. It will also try to find an arxiv id (sometimes there is one), but as you'll see, you can just select whatever suits your needs.
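The pdf2doi idea can be sketched in Python on text that has already been extracted from the pdf (extraction itself needs a pdf library and is left out here). The regular expression is the commonly cited Crossref pattern for modern DOIs; the `first_doi` helper and the trailing-punctuation stripping are assumptions for illustration, not what papis' importer actually does.

```python
import re

# Crossref's widely used pattern for modern DOIs, matched case-insensitively.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def first_doi(text):
    """Return the first DOI-like string in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Heuristic: drop punctuation that usually belongs to the sentence.
    return match.group(0).rstrip(".,;:)")


page_text = "Header text. DOI: 10.1103/PhysRevLett.123.156401. More text."
print(first_doi(page_text))  # 10.1103/PhysRevLett.123.156401
```

Taking the first match is only a guess: a pdf can cite other papers' DOIs before its own, which is one reason to confirm the candidate interactively rather than trust it blindly.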

I attached a gif of how this looks. In this case, I select the information from the pdf2doi importer, that is, the importer that takes a pdf and tries to get a doi from it.

At the end of the gif there is a clash since I already have this paper in my library and I do not wish to add it twice.

It may be a good idea to set the edit, open and confirm options of the papis add section to True: https://papis.readthedocs.io/en/latest/configuration.html#papis-add-options

Example document: output.pdf

screencast

Dec 08 '20 21:12 alejandrogallo