Support for old arXiv identifier format and category of paper
Hi, thank you for developing this project. I have a couple of questions/suggestions.
- It seems that this project only supports the new version of the arxiv identifier, which is of the form arXiv:YYMM.number. (cf. https://info.arxiv.org/help/arxiv_identifier.html) Are there any plans to support the old version identifier (e.g., arXiv:math/9801119)? It would be helpful when you are working on areas that often refer to old papers, such as math.AG.
- It would be useful if it could extract the (possibly multiple) categories of a paper (e.g., cs.AI). For example, I am working on a subject involving math.AG, math.RT, math.SG, and math.CT, so it would be quite convenient if I can collect all related papers into a single database and can filter or classify them based on their categories.
Thanks!
I fixed the code to add support for the old ID format for my own use. I also created a PR for this change.
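For readers who want to see the difference between the two formats, here is a minimal sketch of matching both identifier styles. It is not the code from the PR; the names `NEW_ID`, `OLD_ID`, `ARXIV_ID`, and `parse_arxiv_id` are hypothetical, and the patterns assume the shapes described on the arXiv identifier help page (new style `YYMM.NNNN` or `YYMM.NNNNN`, old style `archive/YYMMNNN` with an optional two-letter subject class, both with an optional `vN` version suffix).

```python
import re
from typing import Optional

# Assumed shapes, based on the arXiv identifier help page:
# new style: YYMM.NNNN or YYMM.NNNNN (e.g. 2301.00001)
NEW_ID = r"\d{4}\.\d{4,5}"
# old style: archive[.SUBJECT]/YYMMNNN (e.g. math/9801119, hep-th/9901001)
OLD_ID = r"[a-z]+(?:-[a-z]+)*(?:\.[A-Z]{2})?/\d{7}"

# Optional "arXiv:" prefix (case-insensitive) and optional version suffix.
ARXIV_ID = re.compile(
    rf"^(?i:arxiv:)?(?P<id>{NEW_ID}|{OLD_ID})(?P<version>v\d+)?$"
)


def parse_arxiv_id(text: str) -> Optional[dict]:
    """Return the bare identifier, version, and style, or None if no match."""
    match = ARXIV_ID.match(text.strip())
    if match is None:
        return None
    ident = match.group("id")
    return {
        "id": ident,
        "version": match.group("version"),
        # Only old-style identifiers contain a slash.
        "old_style": "/" in ident,
    }


print(parse_arxiv_id("arXiv:math/9801119"))
# {'id': 'math/9801119', 'version': None, 'old_style': True}
```

Keeping the two patterns as separate named pieces makes it easy to tighten either one later without touching the other.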
Dear @aralsea,
Thank you so much for contributing to this project! I sincerely apologize for the late reply. I have just been discharged from the maternity hospital and have been caring for newborn twins.
I'm glad to be able to understand the needs of users from various fields through this project. I reviewed your PR #15 and added some suggestions. Could you take a look?
- It would be useful if it could extract the (possibly multiple) categories of a paper (e.g., cs.AI). For example, I am working on a subject involving math.AG, math.RT, math.SG, and math.CT, so it would be quite convenient if I can collect all related papers into a single database and can filter or classify them based on their categories.
Thank you for letting me know about your additional request.
I designed this extension with flexibility in mind, so as not to limit it to a specific use case. Given the diverse range of paper categories within cs.AI, users might create multiple databases corresponding to their interests (e.g., LLM) and classify a large amount of relevant papers spanning multiple categories (stat.ML, cs.LG, cs.CV, cs.AI, cs.RO, etc.) on a daily basis.
Currently, I am not planning to add support for extracting these categories. However, as you may have already tried, you can customize this extension in your fork!
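For anyone customizing a fork, one possible starting point is sketched below. It assumes the paper metadata is available as an Atom entry like the ones returned by the arXiv API, where each category appears as a `<category term="..."/>` element; whether this extension obtains its metadata that way is an assumption. The function name `extract_categories` and the hand-written `SAMPLE_ENTRY` are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by Atom feeds; arXiv API responses are Atom documents.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_categories(atom_xml: str) -> List[str]:
    """Collect the `term` attribute of every Atom <category> element."""
    root = ET.fromstring(atom_xml)
    # iter() walks the whole tree, so a full feed or a single entry both work.
    return [
        element.attrib["term"]
        for element in root.iter(f"{ATOM_NS}category")
        if "term" in element.attrib
    ]


# Hand-written sample entry (not real API output), trimmed to the
# fields that matter here.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Placeholder title</title>
  <category term="math.AG" scheme="http://arxiv.org/schemas/atom"/>
  <category term="math.RT" scheme="http://arxiv.org/schemas/atom"/>
</entry>
"""

print(extract_categories(SAMPLE_ENTRY))
# ['math.AG', 'math.RT']
```

The resulting list could then be stored in a multi-value field of the database so that papers can be filtered by any of their categories.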
Included in the v1.3.0 release. Thank you for contributing, @aralsea!