arxiv.py
Query string helpers
Atomic conditions
`condition(field: "all"|"au"|..., value: string)`:
- `condition("au", "Balents Leon")` → `"au:\"Balents Leon\""`
- `condition("au", "balents_leon")` → `"au:balents_leon"`
- `condition("cat", "cond-mat.str-el")` → `"cat:cond-mat.str-el"`
- Open question: how to enumerate the available fields, and their values when those values are enumerable.
prefix | explanation |
---|---|
ti | Title |
au | Author |
abs | Abstract |
co | Comment |
jr | Journal Reference |
cat | Subject Category |
rn | Report Number |
id | Id (use id_list instead) |
all | All of the above |
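
A minimal Python sketch of `condition`, assuming the prefixes in the table above are the only valid fields and that values containing whitespace should be wrapped in double quotes (as in the `"Balents Leon"` example). The name `FIELD_PREFIXES` is hypothetical, and none of this is part of the current arxiv.py API.

```python
# Hypothetical helper; FIELD_PREFIXES mirrors the prefix table above.
FIELD_PREFIXES = {"ti", "au", "abs", "co", "jr", "cat", "rn", "id", "all"}


def condition(field: str, value: str) -> str:
    """Build an atomic query condition such as 'au:balents_leon'."""
    if field not in FIELD_PREFIXES:
        raise ValueError(f"unknown field prefix: {field!r}")
    # Assumption: multi-word values are quoted so they are treated as a phrase.
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{field}:{value}"


print(condition("au", "Balents Leon"))
print(condition("cat", "cond-mat.str-el"))
```

Rejecting unknown prefixes early is one possible design choice; a looser version could pass any string through and let the API respond.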
Boolean assembly
These correspond to the three Boolean operators supported by the arXiv API.
- `and(cond1, cond2)` → `"$(cond1) AND $(cond2)"`
- `or(cond1, cond2)` → `"$(cond1) OR $(cond2)"`
- `andnot(cond1, cond2)` → `"$(cond1) ANDNOT $(cond2)"`
Grouping
- `group(cond)` → `"($(cond))"`
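
The Boolean and grouping helpers could be sketched in Python as plain string functions. Since `and` and `or` are reserved words in Python, the sketch assumes trailing-underscore names (`and_`, `or_`); these names are hypothetical, not a settled interface.

```python
# Hypothetical names: `and` and `or` are Python keywords, hence the underscores.
def and_(cond1: str, cond2: str) -> str:
    return f"{cond1} AND {cond2}"


def or_(cond1: str, cond2: str) -> str:
    return f"{cond1} OR {cond2}"


def andnot(cond1: str, cond2: str) -> str:
    return f"{cond1} ANDNOT {cond2}"


def group(cond: str) -> str:
    # Wrap a condition in parentheses so it can be nested inside another one.
    return f"({cond})"


query = andnot(group(or_("au:balents_leon", "au:other_author")), "ti:review")
print(query)
```

Here `"au:other_author"` and `"ti:review"` are placeholder conditions used only for illustration.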
An example of some more advanced query construction: https://github.com/lukasschwab/arxiv.py/issues/83#issuecomment-907967099
Open question: how to enumerate the available fields, and their values when those values are enumerable.
Enums should be useful here, e.g. a `(Query).Attribute` enum: `Attribute.Title`, `Attribute.Author`, and so on.
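
One way the enum idea might look in Python, using the standard-library `enum` module. Only `Attribute.Title` and `Attribute.Author` are named in these notes; the remaining member names are assumptions derived from the prefix table.

```python
from enum import Enum


class Attribute(Enum):
    """Queryable fields; values are the arXiv API prefixes."""
    Title = "ti"
    Author = "au"
    # Member names below are assumed; only the prefixes come from the table.
    Abstract = "abs"
    Comment = "co"
    JournalReference = "jr"
    SubjectCategory = "cat"
    ReportNumber = "rn"
    Id = "id"
    All = "all"


# Iterating over the Enum class answers "which fields are available?".
for attribute in Attribute:
    print(attribute.name, attribute.value)
```

Looking a member up by prefix, as in `Attribute("au")`, gives the reverse mapping for free.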
The atomic conditions (the arguments to `and`, `or`, and `andnot`) are themselves valid queries, so the operators could instead be exposed as methods on the `Query` class:
- `cond1.and(cond2)`
- `cond1.or(cond2)`
- `cond1.andnot(cond2).andnot(cond3)`: I think this chaining is more literate than `andnot(andnot(cond1, cond2), cond3)`.
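
A rough sketch of the chained style in Python, building the string as it goes. The `Query` class here is hypothetical, and the methods are spelled `and_` and `or_` because `and` and `or` cannot be used as method names in Python.

```python
class Query:
    """Hypothetical wrapper around a query string with chainable operators."""

    def __init__(self, text: str):
        self.text = text

    def and_(self, other: "Query") -> "Query":
        return Query(f"{self.text} AND {other.text}")

    def or_(self, other: "Query") -> "Query":
        return Query(f"{self.text} OR {other.text}")

    def andnot(self, other: "Query") -> "Query":
        return Query(f"{self.text} ANDNOT {other.text}")

    def __str__(self) -> str:
        return self.text


cond1, cond2, cond3 = Query("au:a"), Query("au:b"), Query("au:c")
print(cond1.andnot(cond2).andnot(cond3))
```

Each method returns a new `Query`, so intermediate conditions stay reusable.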
It'd be nice to have arXiv as a source of truth for an enum of categories... but I might just have to accept the risk that the categories will change, and that new categories will have to be integrated here as patch releases. This is a good reason not to transform categories on `Result`s into the enum type: new categories may not be explicitly queryable, but they should not break processing results.
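
To illustrate that trade-off, here is a small Python sketch in which categories on results stay plain strings and the enum is only consulted on request. `Category` lists a tiny illustrative subset, and `known_category` is a hypothetical helper.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    # Illustrative subset only; a real enum would need every arXiv category.
    CondMatStrEl = "cond-mat.str-el"
    HepTh = "hep-th"


def known_category(raw: str) -> Optional[Category]:
    """Return the enum member for a raw category string, or None if unknown."""
    try:
        return Category(raw)
    except ValueError:
        # A category newer than this release: not queryable by enum, not an error.
        return None


print(known_category("cond-mat.str-el"))
print(known_category("some-new.category"))
```

Because result processing never requires a successful lookup, an unrecognized category degrades to `None` instead of raising.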
Implementation detail: can build the string as we go along, or can assemble a tree which gets traversed to build the string.
We can use excessive grouping to convert queries to strings (so `a.or(b)` can yield `(a) OR (b)`). The `group` function expressed above may be unnecessary: `group(a, b)` may just be equivalent to `a.or(b)`.
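
The tree-based alternative could look roughly like this in Python: operators build nodes, and a traversal renders the string with a pair of parentheses around every operand. The names `Condition`, `Operation`, and `render` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Condition:
    field: str
    value: str


@dataclass(frozen=True)
class Operation:
    operator: str  # "AND", "OR", or "ANDNOT"
    left: "Node"
    right: "Node"


Node = Union[Condition, Operation]


def render(node: Node) -> str:
    """Traverse the tree, grouping every operand so precedence is explicit."""
    if isinstance(node, Condition):
        return f"{node.field}:{node.value}"
    return f"({render(node.left)}) {node.operator} ({render(node.right)})"


a, b, c = Condition("au", "a"), Condition("au", "b"), Condition("ti", "c")
print(render(Operation("OR", a, b)))
print(render(Operation("ANDNOT", Operation("OR", a, b), c)))
```

With every operand parenthesized at render time, a separate `group` call is never needed, at the cost of some redundant parentheses in the output.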
Good opportunity to define an interface and then write tests.