wikipedia-client
How to pull logo/profile image from Infobox table?
It seems the entire Wikipedia Infobox side table is ignored when accessing page.sanitized_content.
How can I pull only a company logo or a person's profile photo from the Infobox?
The rest of the Infobox content would be nice to have as well.
@tpitt
Did you find a way to obtain the Infobox table data? I just started using this nice gem but can't seem to find a way to retrieve this data.
Hi,
It is possible to extract this information by using the page.raw_data method. It would be a nice contribution if you would like to add the functionality to retrieve only the infobox.
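As a starting point for such a contribution, here is a minimal sketch of an infobox helper. It works on a plain wikitext string, on the assumption that the article wikitext can be pulled out of the API response (for example through page.content). The names infobox_fields and SAMPLE_WIKITEXT are hypothetical and not part of the gem, and the sample text is made up for illustration. It only handles pages whose wikitext literally contains an {{Infobox ...}} template.

```ruby
# Hand-written sample wikitext (an assumption, not real article content).
SAMPLE_WIKITEXT = <<~WIKI
  {{Infobox company
  | name = Example Corp
  | logo = Example logo.svg
  | founder = {{plainlist|
  * [[Jane Doe|J. Doe]]
  }}
  }}
  '''Example Corp''' is a hypothetical company.
WIKI

# Return the top-level "key = value" fields of the first {{Infobox ...}}
# template in the wikitext, or an empty hash if there is none.
def infobox_fields(wikitext)
  start = wikitext.index(/\{\{\s*Infobox/i)
  return {} unless start

  depth = 0        # nesting level of {{ }} and [[ ]]
  parts = [+'']    # parts[0] is the template name, the rest are fields
  i = start
  while i < wikitext.length
    pair = wikitext[i, 2]
    if pair == '{{' || pair == '[['
      depth += 1
      parts.last << pair if depth > 1   # keep nested markup verbatim
      i += 2
    elsif pair == '}}' || pair == ']]'
      depth -= 1
      break if depth.zero?              # end of the infobox itself
      parts.last << pair
      i += 2
    elsif wikitext[i] == '|' && depth == 1
      parts << +''                      # a top-level pipe starts a new field
      i += 1
    else
      parts.last << wikitext[i]
      i += 1
    end
  end

  parts.drop(1).each_with_object({}) do |part, fields|
    key, value = part.split('=', 2)
    fields[key.strip] = value.strip if value
  end
end

fields = infobox_fields(SAMPLE_WIKITEXT)
puts fields['logo']
```

The logo field only holds a file name. Turning it into an image URL needs a further API request, which is not shown here.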
@pietromenna Thanks for your reply.
I would love to submit a pull request, but while inspecting the raw_data I couldn't seem to find the infobox's information.
For example, a search on the raw_data of the Great white shark doesn't seem to have Phylum, Kingdom, etc...
I think the request_page method of the client might be incomplete. Probably some options or parameters are missing, but I find the Wikipedia API documentation kind of confusing, and a pain to understand...
Would appreciate some insight, I would love to submit the patch then.
Thank you @BigChief45 for your example. In the example, raw_data does not contain that section when queried without additional parameters. This is because not all templates get pulled by the API call to Wikipedia with the default parameters.
In order to make this query work, you will have to include tllimit => 500. More information about tllimit is available here.
Check out if the code below works:
page = Wikipedia.find( 'Great white shark', :tllimit => 500)
Usually when raw_data has something missing, it is related to the default parameters of the Wikipedia API. More details here.
I hope this helped.
@pietromenna I tried using that option with 500 as value. But I am still not getting that data... am I missing something else?
I apologize if I understood incorrectly. :-( I thought you were looking for the list of taxonomies:
I ran this example:
require 'wikipedia'
page = Wikipedia.find( 'Great white shark', :tllimit => 500 )
puts page.templates
And it returned this list: ... Template:Taxonomy Template:Taxonomy/Animalia Template:Taxonomy/Bilateria Template:Taxonomy/Carcharodon Template:Taxonomy/Chondrichthyes Template:Taxonomy/Chordata Template:Taxonomy/Craniata Template:Taxonomy/Deuterostomia Template:Taxonomy/Elasmobranchii Template:Taxonomy/Eugnathostomata Template:Taxonomy/Eukaryota Template:Taxonomy/Eumetazoa Template:Taxonomy/Euselachii Template:Taxonomy/Filozoa Template:Taxonomy/Gnathostomata Template:Taxonomy/Holozoa Template:Taxonomy/Lamnidae Template:Taxonomy/Lamniformes Template:Taxonomy/Life Template:Taxonomy/Nephrozoa Template:Taxonomy/Opisthokonta Template:Taxonomy/Selachimorpha Template:Taxonomy/Unikonta Template:Taxonomy/Vertebrata ...
Without the parameter the API simply does not return all the templates.
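If only the taxonomy entries are needed, the list can be filtered in plain Ruby. This is a small sketch that assumes the templates come back as an array of title strings like the ones printed above. taxonomy_names is a hypothetical helper, not a method of the gem, and the sample array is copied by hand from that list.

```ruby
# Keep only "Template:Taxonomy/<name>" titles and return the <name> parts.
def taxonomy_names(template_titles)
  prefix = 'Template:Taxonomy/'
  template_titles
    .select { |title| title.start_with?(prefix) }
    .map { |title| title.delete_prefix(prefix) }
end

# A few titles taken from the output above (not a live API result).
sample_titles = [
  'Template:Taxonomy',
  'Template:Taxonomy/Animalia',
  'Template:Taxonomy/Chordata',
  'Template:Taxonomy/Lamnidae'
]

puts taxonomy_names(sample_titles)
```

Note that these are the names of taxa only. The template titles do not say which rank (Kingdom, Phylum, and so on) each one belongs to.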
This gem retrieves information by calling the Wikipedia API, mostly through the query action. If you want to see what the API returns, you can check the raw_data method. It should have the same content as calling the API directly through the Wikipedia sandbox. If you don't pass any parameters, the API returns only part of the contents (probably for security reasons).
One suggestion is to first build the query in the sandbox until it returns the results you want. You can then pass the same parameters to the gem so that it retrieves the information you need programmatically.
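To make the link between the sandbox and the gem options concrete, here is a rough sketch that builds the kind of request URL the sandbox produces, using only the Ruby standard library. The query_url helper is hypothetical, and the base parameters are an illustrative guess for a templates query, not necessarily what the gem sends by default.

```ruby
require 'uri'

# Build a query-action URL for the English Wikipedia API.
# extra_params are merged last, so they can add or override parameters.
def query_url(title, extra_params = {})
  params = {
    action: 'query',
    format: 'json',
    prop: 'templates',
    titles: title
  }.merge(extra_params)
  "https://en.wikipedia.org/w/api.php?#{URI.encode_www_form(params)}"
end

puts query_url('Great white shark', tllimit: 500)
```

Any parameter that changes the result in the sandbox (tllimit here) is the one to hand to Wikipedia.find as an option.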
I hope this helps.