scholar.py
Added a --citations-only option. It prints all the articles that cite the queried one.
Added a handy option to automatically retrieve the list of articles that cite the first article returned by the query.
For instance, if you want a list of articles that cite the first article returned by this query:
$ ./scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"
use the --citations-only option
$ ./scholar.py --citations-only -c 1 --author "albert einstein" --phrase "quantum theory"
and it will print this:
Title Modern Electrochemistry 2B: Electrodics in Chemistry, Engineering, Biology and Environmental Science
URL http://books.google.com/books?hl=en&lr=&id=V3tpJrG1H5wC&oi=fnd&pg=PA1539&ots=OUzlJ0YriM&sig=5gwx3WY-wSRLLMe3lRygYwxK1U8
Year 2000
Citations 7392
Versions 10
Cluster ID 13855735528547899559
Citations list http://scholar.google.com/scholar?cites=13855735528547899559&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=13855735528547899559&hl=en&as_sdt=2005&sciodt=0,5
Excerpt This long awaited and thoroughly updated version of the classic text (Plenum Press, 1970) explains the subject of electrochemistry in clear, straightforward language for undergraduates and mature scientists who want to understand solutions. Like its
Title Spectral analysis and time series
URL http://www.citeulike.org/group/96/article/745677
Year 1981
Citations 6726
Versions 3
Cluster ID 16874516227592319711
Citations list http://scholar.google.com/scholar?cites=16874516227592319711&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=16874516227592319711&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Search all the public and authenticated articles in CiteULike. Include unauthenticated resultstoo (may include "spam") Enter a search phrase. You can also specify a CiteULike article id(123456),. a DOI (doi:10.1234/12345678). or a PubMed ID (pmid:12345678). Click Help for
Title Introduction
URL http://link.springer.com/chapter/10.1007/978-1-4614-0511-5_1
Year 2011
Citations 5380
Versions 59
Cluster ID 3815736992424174150
Citations list http://scholar.google.com/scholar?cites=3815736992424174150&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=3815736992424174150&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Abstract In recent years, the adopting of some supply chain practice such as outsourcing and lean production helps in smoothing the operations, but it also results in little buffer inventory in a supply chain which may lead to increased vulnerability of the chains. 1 At the
Title The random walk's guide to anomalous diffusion: a fractional dynamics approach
URL http://www.sciencedirect.com/science/article/pii/S0370157300000703
Year 2000
Citations 5144
Versions 18
Cluster ID 11032747530556470631
Citations list http://scholar.google.com/scholar?cites=11032747530556470631&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=11032747530556470631&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Fractional kinetic equations of the diffusion, diffusion–advection, and Fokker–Planck type are presented as a useful approach for the description of transport dynamics in complex systems which are governed by anomalous diffusion and non-exponential relaxation
Title Diffusion processes
URL http://onlinelibrary.wiley.com/doi/10.1002/0471667196.ess0495.pub2/full
Year 1974
Citations 3118
Versions 9
Cluster ID 13465318938558459827
Citations list http://scholar.google.com/scholar?cites=13465318938558459827&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=13465318938558459827&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Suppose that we are given a differential operator As of the form (6). We want to construct a diffusion process whose generator is As. In 1936, W. Feller proved that the backward equation (8) together with the terminal condition (9) has a unique solution under the
Title Metapopulation biology
URL http://agris.fao.org/agris-search/search.do?recordID=US201300021834
Year 1997
Citations 3025
Versions 2
Cluster ID 6335487017156325677
Citations list http://scholar.google.com/scholar?cites=6335487017156325677&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=6335487017156325677&hl=en&as_sdt=2005&sciodt=0,5
Excerpt FAO_logo. home-icon. English; Español; Français; العربية; 中文; Русский.home-icon. Toggle navigation AGRIS. Register. Sign in. My Profile; Change Password;Searching History; Browsing History; Saved Publications; Logout. Search. Register;
Title Stochastic processes
URL http://epubs.siam.org/doi/pdf/10.1137/1.9781611971125.bm
Year 1999
Citations 2978
Versions 16
Cluster ID 9561949148186522176
Citations list http://scholar.google.com/scholar?cites=9561949148186522176&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=9561949148186522176&hl=en&as_sdt=2005&sciodt=0,5
Excerpt When published in 1962 this book was described by some reviewers as a truly introductory textbook and comprehensive survey of stochastic processes, requiring only a minimal background in introductory probability theory and mathematical analysis. It continues to be
Title Cavitation and bubble dynamics
URL http://books.google.com/books?hl=en&lr=&id=yRhaAQAAQBAJ&oi=fnd&pg=PR11&ots=O6xRuHnbh2&sig=Mk4rT4w-xmW-mpbLGtM4cThmdJo
Year 2013
Citations 2868
Versions 26
Cluster ID 10903735145678015071
Citations list http://scholar.google.com/scholar?cites=10903735145678015071&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=10903735145678015071&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Cavitation and Bubble Dynamics deals with the fundamental physical processes of bubble dynamics and the phenomenon of cavitation. It is ideal for graduate students and research engineers and scientists, and a basic knowledge of fluid flow and heat transfer is assumed.
Title Introduction to colloid and surface chemistry: Butterworth-Heinemann, Oxford, 1991, ISBN 0 7506 1182 0, 306 pp,£ 14.95
Year 1993
Citations 2771
Versions 6
Cluster ID 10562832630033572094
Citations list http://scholar.google.com/scholar?cites=10562832630033572094&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=10562832630033572094&hl=en&as_sdt=2005&sciodt=0,5
Title Irreversibility and generalized noise
URL http://scholar.google.com/https://journals.aps.org/pr/abstract/10.1103/PhysRev.83.34
Year 1951
Citations 2721
Versions 3
Cluster ID 13951920364609032371
Citations list http://scholar.google.com/scholar?cites=13951920364609032371&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=13951920364609032371&hl=en&as_sdt=2005&sciodt=0,5
Excerpt Abstract A relation is obtained between the generalized resistance and the fluctuations of the generalized forces in linear dissipative systems. This relation forms the extension of the Nyquist relation for the voltage fluctuations in electrical impedances. The general formalism
Did you notice that it breaks if you change the citation format? It only outputs the first result.
./scholar.py --phrase "Online Clustering of Bandits" --citations-only --citation bt
@inproceedings{kawale2015efficient,
title={Efficient Thompson Sampling for Online Matrix-Factorization Recommendation},
author={Kawale, Jaya and Bui, Hung H and Kveton, Branislav and Tran-Thanh, Long and Chawla, Sanjay},
booktitle={Advances in Neural Information Processing Systems},
pages={1297--1305},
year={2015}
}
}
@daniel-severo I just included this feature following the main behavior of the tool (with only a minimal change in the code).
Apparently in this case the main behavior, if you specify a citation format, is to return only the first article in that format.
For instance, this command
./scholar.py --phrase "deep learning"
returns the list of (some) papers that contain "deep learning":
Title Deep learning
URL http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
Year 2015
Citations 1888
Versions 41
Cluster ID 5362332738201102290
Citations list http://scholar.google.com/scholar?cites=5362332738201102290&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=5362332738201102290&hl=en&as_sdt=0,5
Excerpt Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object
Title Learning in science: A comparison of deep and surface approaches
URL http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1098-2736(200002)37:2%3C109::AID-TEA3%3E3.0.CO;2-7/full
Year 2000
Citations 434
Versions 5
Cluster ID 8108748482885444188
Citations list http://scholar.google.com/scholar?cites=8108748482885444188&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=8108748482885444188&hl=en&as_sdt=0,5
Excerpt ... The findings also suggest that to encourage a deep learning approach, teachers couldprovide prompts and contextualized scaffolding and encourage students to ask questions,predict, and explain during activities. © 2000 John Wiley & Sons, Inc. ...
Title Deep learning in neural networks: An overview
URL http://www.sciencedirect.com/science/article/pii/S0893608014002135
Year 2015
Citations 1091
Versions 22
Cluster ID 15932869302045479284
Citations list http://scholar.google.com/scholar?cites=15932869302045479284&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=15932869302045479284&hl=en&as_sdt=0,5
Excerpt Abstract In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and
Title Multimodal deep learning
URL http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Ngiam_399.pdf
Year 2011
Citations 621
Versions 28
Cluster ID 4020282035517476898
PDF link http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Ngiam_399.pdf
Citations list http://scholar.google.com/scholar?cites=4020282035517476898&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=4020282035517476898&hl=en&as_sdt=0,5
Excerpt Abstract Deep networks have been successfully applied to unsupervised feature learning for single modalities (eg, text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for
Title Why does unsupervised pre-training help deep learning?
URL http://www.jmlr.org/papers/v11/erhan10a.html
Year 2010
Citations 826
Versions 29
Cluster ID 13018263321881826087
Citations list http://scholar.google.com/scholar?cites=13018263321881826087&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=13018263321881826087&hl=en&as_sdt=0,5
Excerpt Abstract Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas, mostly on vision and language data sets. The
Title Unsupervised feature learning for audio classification using convolutional deep belief networks
URL http://papers.nips.cc/paper/3674-unsupervised-feature-learning-for-audio-classification-using-convolutional-deep-belief-networks
Year 2009
Citations 514
Versions 21
Cluster ID 2046036768079393393
Citations list http://scholar.google.com/scholar?cites=2046036768079393393&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=2046036768079393393&hl=en&as_sdt=0,5
Excerpt ... Abstract In recent years, deep learning approaches have gained significant interest as a wayof building hierarchical representations from unlabeled data. However, to our knowledge, thesedeep learning approaches have not been extensively stud- ied for auditory data. ...
Title Domain adaptation for large-scale sentiment classification: A deep learning approach
URL http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Glorot_342.pdf
Year 2011
Citations 497
Versions 20
Cluster ID 18093548304865208974
PDF link http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Glorot_342.pdf
Citations list http://scholar.google.com/scholar?cites=18093548304865208974&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=18093548304865208974&hl=en&as_sdt=0,5
Excerpt Abstract The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training
Title Deep Learning for a Digital Age: Technology's Untapped Potential To Enrich Higher Education.
URL http://eric.ed.gov/?id=ED457787
Year 2002
Citations 355
Versions 2
Cluster ID 11010152299026972441
Citations list http://scholar.google.com/scholar?cites=11010152299026972441&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=11010152299026972441&hl=en&as_sdt=0,5
Excerpt This book shows how faculty can help students develop skills in research, problem solving, critical thinking, and knowledge management by using Web-based collaboration tools. This innovative approach to teaching and learning emphasizes the use of virtual spaces,"
Title On the importance of initialization and momentum in deep learning.
URL http://www.jmlr.org/proceedings/papers/v28/sutskever13.pdf
Year 2013
Citations 499
Versions 17
Cluster ID 7449004388220998591
PDF link http://www.jmlr.org/proceedings/papers/v28/sutskever13.pdf
Citations list http://scholar.google.com/scholar?cites=7449004388220998591&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=7449004388220998591&hl=en&as_sdt=0,5
Excerpt Abstract Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with
Title Playing atari with deep reinforcement learning
URL http://scholar.google.com/https://arxiv.org/abs/1312.5602
Year 2013
Citations 436
Versions 26
Cluster ID 10603651548644623407
Citations list http://scholar.google.com/scholar?cites=10603651548644623407&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=10603651548644623407&hl=en&as_sdt=0,5
Excerpt ... DeepMind Technologies {vlad,koray,david,alex.graves,ioannis,daan,martin.riedmiller} @deepmind.com Abstract We present the first deep learning model to successfully learn controlpolicies di- rectly from high-dimensional sensory input using reinforcement learning. ...
whilst if you specify the citation format
./scholar.py --phrase "deep learning" --citation bt
you get only the BibTeX entry for the first paper in the previous list:
@article{lecun2015deep,
title={Deep learning},
author={LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey},
journal={Nature},
volume={521},
number={7553},
pages={436--444},
year={2015},
publisher={Nature Research}
}
So unless I missed something, no, my feature didn't add a bug; it's just preexisting behavior.
Checking the code, I found that the problem occurs only when settings are specified (as with the citation format). In the settings data structure there's a field named per_page_results that is initially set to None. Changing it to a proper value (like 10) solves the problem.
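A minimal sketch of the fix described above, using a simplified stand-in class rather than the tool's real settings code. Only the field name per_page_results and the value 10 come from the comment; the class name, the DEFAULT_PER_PAGE constant, and the effective_page_size helper are hypothetical.

```python
class SettingsSketch:
    """Hypothetical stand-in for the tool's settings data structure."""

    DEFAULT_PER_PAGE = 10  # assumed default, following the "like 10" suggestion

    def __init__(self, citation_format=None, per_page_results=None):
        self.citation_format = citation_format
        # In the behavior described above, this field stays None
        # unless it is set explicitly.
        self.per_page_results = per_page_results

    def effective_page_size(self):
        """Return a usable page size even when none was configured."""
        if self.per_page_results is None:
            return self.DEFAULT_PER_PAGE
        return self.per_page_results


settings = SettingsSketch(citation_format="bt")
print(settings.effective_page_size())
```

Falling back to a default at read time, instead of changing the initial value, is just one possible design; the comment above describes changing the initial value itself.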
Since it's a very easy fix, this may or may not be a bug in the main tool; it could be intended behavior on the author's part. Anyway, like you, I think it's more useful to return the exact same list converted to the specified format, so I've pushed a fix for that on my branch. I hope that push will be reflected in the code in this pull request.
I only tested it with the biblatex format before hitting the captcha limit. It should work for other formats too, but please double-check that for me.
tl;dr: it wasn't my fault. It might not be a bug. Fixed anyway.
Thank you for this great modification; it sounds like it does exactly what I am looking for. Unfortunately, when I run the code after applying the additions and deletions you made, I get the following error:
self.per_page_results = 10
^
IndentationError: unexpected indent
So I have not got any output yet, and I would like to have a list of papers cited by an original paper in CSV format. I would be glad if you could help me. I am using Spyder (Python 3.6), in case that is relevant to solving the problem.
Thanks in advance.
It should've been a typo, try it now and let me know.
I am sorry, it is not fixed yet. The error is now at line 1035, as shown below:
runfile('C:/Users/NOVEMBER/Documents/src/PaperCrawler/.git/scholar.py', args='--citations-only -c 1 --author "albert einstein" --phrase "quantum theory"', wdir='C:/Users/NOVEMBER/Documents/src/PaperCrawler/.git')
File "C:/Users/NOVEMBER/Documents/src/PaperCrawler/.git/scholar.py", line 1035
self.send_query(query)
^
IndentationError: unindent does not match any outer indentation level
I added exactly the lines you added and deleted what you deleted, and used these command-line arguments: --citations-only -c 1 --author "albert einstein" --phrase "quantum theory"
Did I do something wrong?
I read a little about changing tabs to 4 spaces in order to fix the previous error, and I fixed the typo you told me about, but then I got another error at line 1035, as stated above.
Thank you for replying so fast, I appreciate it.
What I want to do exactly is to get all the papers that cited the paper "Novel properties of the Fourier decomposition of the sinogram" and put the titles of those papers in CSV format. The command line you worked on should do that, right? Or what exactly should I do, in your opinion? I apologize, I am still new to these things and I would love to learn.
Thank you for your time and consideration.
On Mon, Apr 17, 2017 at 3:25 PM, Luca Baronti wrote:
It should've been a typo, try it now and let me know.
View it on GitHub: https://github.com/ckreibich/scholar.py/pull/83#issuecomment-294485042
Your problem seems to be related to the different indentation styles used on different systems (mine is Unix; I assume you are using Windows). It's hard for me to fix since I can't reproduce it, but you should be able to replace the spaces with tabs (or vice versa) where needed.
Everything else should work as intended; let me know otherwise.
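One way to find and repair mixed indentation is sketched below. It is a generic helper, not part of scholar.py: the function names are made up for illustration, and the tab width of 4 is an assumption that has to match how the file was originally indented.

```python
def lines_with_tab_indent(source):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append(number)
    return hits


def tabs_to_spaces(source, width=4):
    """Expand tabs in leading whitespace only; the rest of each line is untouched."""
    fixed = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        fixed.append(indent.expandtabs(width) + stripped)
    return "".join(fixed)


sample = "def f():\n\tif True:\n\t    return 1\n"
print(lines_with_tab_indent(sample))
print(tabs_to_spaces(sample))
```

Running the check on a copy of the file first, and only then rewriting it, avoids changing the indentation of code that was already consistent.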
On another note, I've just noticed that the current version is unable to download more than the first 10 citations. The right solution might require more changes to the code than I intend to make. Truth be told, I'm not the greatest fan of how the code is structured in this project, so I prefer to keep my modifications to a minimum and rely on the author to integrate my parts properly (should the need arise).
For now I've pushed a workaround that is able to fetch all the citations for a given paper.
Since I'm doing a Scholar search for every 10 citations, I've put a sleep between them.
That means the command needs about 1 second for every 10 citations the paper has.
If you need a faster solution, just lower the number of seconds in the sleep at line 1052:
time.sleep(1)
Just be sure not to flood the server with requests, or you might be soft-banned.
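The page-by-page workaround described above can be sketched roughly as follows. This is not the code from the branch: fetch_page is a hypothetical callable standing in for whatever performs one search request, and the page size of 10 and the 1-second delay are taken from the comment.

```python
import time

PAGE_SIZE = 10  # results per request, as described above


def page_offsets(total, page_size=PAGE_SIZE):
    """Start offsets needed to cover `total` results, one per request."""
    return list(range(0, total, page_size))


def fetch_all(total, fetch_page, delay=1.0, sleep=time.sleep):
    """Collect every result, pausing `delay` seconds between requests.

    `fetch_page(start)` is a hypothetical callable returning one page of results.
    """
    results = []
    offsets = page_offsets(total)
    for index, start in enumerate(offsets):
        results.extend(fetch_page(start))
        if index < len(offsets) - 1:  # no pause needed after the last page
            sleep(delay)
    return results


# A paper with 151 citations needs 16 requests, so roughly 15 seconds of sleeping.
print(len(page_offsets(151)))
```

Passing the sleep function as a parameter is only a convenience for trying the loop without actually waiting.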
A better solution may exist; I'll double-check that later.
Also, it's been quite some time since I last touched that code, and I haven't had time to check every possible interaction, so let me know if you find any new issues.
About your specific query, I've checked this command
$ ./scholar.py --phrase "Novel properties of the Fourier decomposition of the sinogram" --citations-only --csv
and it actually prints the CSV of all the 151 papers that cite it (too long to paste here).
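For the goal of keeping only the titles, a post-processing step along these lines might help. It is a sketch under assumptions: it presumes the saved output has a header row, uses "|" as the field separator, and has a column named "title". All three should be checked against the actual output, and the sample rows below are made up for illustration.

```python
import csv
import io


def extract_titles(text, delimiter="|", column="title"):
    """Return the values of one column from delimited text that has a header row."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    return [row[column] for row in reader if row.get(column)]


def titles_to_csv(titles):
    """Write the titles as a one-column, comma-separated CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title"])
    for title in titles:
        writer.writerow([title])
    return buffer.getvalue()


# Made-up sample in the assumed layout, not real tool output.
sample = "title|year\nSpectral analysis and time series|1981\nDiffusion processes|1974\n"
print(titles_to_csv(extract_titles(sample)))
```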
I applied what you changed and fixed the tabs and spaces problem. Now I am getting only this output:
UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
An exception has occurred, use %tb to see the full traceback.
SystemExit: 0
I am using what is called a Python interpreter. I only downloaded WinPython version 3.6 for Windows and opened scholar.py from the Spyder shortcut that WinPython provides. Could you please tell me how you usually run the code on your machine? And what about the things people mention in other questions and comments, such as BeautifulSoup4 and pip? I have no idea how to run the code other than through Spyder, so if you have some time, please tell me how you run it.
Thanks in advance.
Hello again, I just wanted to say that in order to get rid of the error message I posted earlier, I had to remove the system pause from the code. That is what was stopping the code from running smoothly. To be clearer: at line 1347, instead of sys.exit(main()), write main() and there will be no more errors. I have the results that I want now. Thank you so much for everything.
The sys.exit(main()) was put there by the original author.
It's not a pause; it's used to provide an exit code (the basic way to determine whether the program terminated correctly).
It works well on my machine, but if it causes you trouble I think you can safely replace it with main() as you did.
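The "SystemExit: 0" message seen earlier fits this explanation: sys.exit raises a SystemExit exception carrying the exit code, and an interactive environment may report that exception instead of quietly ending the process. A small sketch, with a placeholder main() that is not the tool's real entry point:

```python
import sys


def main():
    """Placeholder entry point; returns 0 to signal success."""
    return 0


def run(entry_point):
    """Call sys.exit(entry_point()) and hand back the exit code instead of exiting."""
    try:
        sys.exit(entry_point())
    except SystemExit as exc:
        return exc.code


print(run(main))
```

An exit code of 0 conventionally means the program finished without errors, so the message is not a failure in itself.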
Hello Mr. Baronti, thank you for your reply. You are right, it is not about sys.exit, because it is happening again. The thing is that sometimes I get the required data as output and other times, for some reason, I get no output at all, even though I have not changed anything in the code; it is still the same one I used before.
Please tell me if you have any idea why this is happening. I am sure it is not because of the code, you did a great job. My question is: is it related to Google Scholar itself, or is there something I am not taking into consideration? It used to happen, then it worked, and now it is not working again, and I am still using the same code!
Thank you for your time and consideration.
It's possible that you made too many requests in a day and the server blocked them as a result. When the server detects too many requests from the same user, it may soft-ban them or perform some checks (usually in the form of a captcha). When this happens, the program stops working because it can't go any further, which may explain your problem.
It has been a while since I last checked this project's code, but I remember that I couldn't find a way to request all the citations at once. To fetch every citation, I had to request them page by page (e.g. if a paper has 300 citations, I had to make 30 separate requests). To avoid flooding the server I put a sleep of 1 second between them; however, the server's checks may consider the number of requests as well as their frequency.
For this purpose, a user may be blocked server-side by IP, by cookies, or both. If I remember correctly, the program resets the cookies at every run, but the IP stays the same, so further requests after the program fails the first time are unlikely to succeed.
You can try to mitigate the problem by increasing the sleep time (search for sleep in the code), but keep in mind that since this is a server-side issue there is very little we can do client-side to address it.
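Instead of one fixed, longer sleep, the delay could also grow after each failed attempt. The sketch below only computes such a schedule; the base, factor, cap, and jitter values are arbitrary assumptions, and nothing here is known to avoid a block.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5,
                   jitter=0.0, rng=random.random):
    """Return a list of growing delays in seconds, each limited to `cap`.

    `jitter` adds up to that many extra seconds at random so that retries
    do not fall on a perfectly regular rhythm.
    """
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(min(delay, cap) + jitter * rng())
        delay *= factor
    return delays


print(backoff_delays())
```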
Those are mostly the reasons I was thinking of. But since we cannot fix this, how can I be sure that the server is blocking me and that it is not another issue? I do not receive any warnings; it just gives no output, as in the very first few times I tried the code. Is it possible to show a warning message when the server is flooded with requests, or when the user is blocked?
I'm currently using the original author's functions to query google scholar.
I agree that a more informative error message might be helpful. Unfortunately I don't have much time to look into this and, in fact, changing that part is way beyond the scope of this pull request.
If I manage to find some extra time for this project, I might work on a more informative error message and submit it in a separate pull request.
However, I can't promise anything on that front right now, I'm sorry.
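A warning of the kind discussed above might be built on a heuristic like the following. It is purely illustrative and not part of the tool: the status codes and marker phrases are assumptions about what a blocked response could look like, and the function names are hypothetical.

```python
import sys

# Assumed phrases; not verified against real responses from the service.
BLOCK_MARKERS = ("captcha", "unusual traffic")


def looks_blocked(status_code, body):
    """Guess whether a response indicates rate limiting or a captcha check."""
    if status_code in (429, 503):  # assumed "too many requests" style statuses
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def warn_if_blocked(status_code, body, stream=sys.stderr):
    """Print a warning when the response looks blocked; return the verdict."""
    if looks_blocked(status_code, body):
        print("warning: the server may be blocking requests; "
              "wait before retrying", file=stream)
        return True
    return False


print(warn_if_blocked(200, "<html>ordinary results page</html>"))
```

Because the check is a guess, an empty result with no warning would still not prove that the request was accepted.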
@lucabaronti It would be fantastic if this could be merged. It's a very useful feature.
@ivanperez-keera I'm glad you like it. You should ask the original author, since he's the only one who can merge this pull request.
I like the idea, but I have not been able to try it yet. Does it work with the latest version of scholar.py?
As you can see from the date of my last comment, it has been a while since I last tried it. However, I think it should work with the current version, so you should give it a try.