
How to pull logo/profile image from Infobox table?

Open tpitt opened this issue 10 years ago • 6 comments

It seems the entire side Wikipedia Infobox table is ignored when accessing the page.sanitized_content.

How can I pull only a company logo or a person's profile photo from the Infobox?

The rest of the Infobox content would be nice to have as well.

tpitt avatar Apr 13 '14 21:04 tpitt

@tpitt

Did you find a way to obtain the Infobox table data? I just started using this nice gem but can't seem to find a way to retrieve this data.

aalvrz avatar Jul 16 '17 09:07 aalvrz

Hi,

It is possible to extract this information by using the page.raw_data method. It would be a nice contribution if you would like to add the functionality to retrieve only the infobox.
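As a rough illustration of what "retrieve only the infobox" could look like, here is a minimal sketch that pulls an `{{Infobox ...}}` template out of a wikitext string by matching nested braces and then splits its top-level `| key = value` parameters. It assumes you already have the page's wikitext as a plain string (for example from the revision content inside raw_data). The helper names `extract_infobox` and `infobox_fields` and the sample wikitext are hypothetical and not part of the gem. Pages that use a differently named template can pass another `name`.

```ruby
# Hypothetical helper: return the first {{<name> ...}} template found in a
# wikitext string, or nil. Nested {{...}} templates are handled by counting braces.
def extract_infobox(wikitext, name = 'Infobox')
  start = wikitext.index(/\{\{\s*#{Regexp.escape(name)}/i)
  return nil if start.nil?

  depth = 0
  i = start
  while i < wikitext.length
    pair = wikitext[i, 2]
    if pair == '{{'
      depth += 1
      i += 2
    elsif pair == '}}'
      depth -= 1
      i += 2
      return wikitext[start...i] if depth.zero?
    else
      i += 1
    end
  end
  nil # braces never balanced
end

# Hypothetical helper: split the top-level "| key = value" parameters into a Hash.
# A "|" inside nested [[links]] or {{templates}} is not treated as a separator.
def infobox_fields(infobox)
  body = infobox[2...-2] # drop the outer {{ and }}
  parts = []
  current = +''
  depth = 0
  i = 0
  while i < body.length
    pair = body[i, 2]
    if pair == '{{' || pair == '[['
      depth += 1
      current << pair
      i += 2
    elsif pair == '}}' || pair == ']]'
      depth -= 1
      current << pair
      i += 2
    elsif body[i] == '|' && depth.zero?
      parts << current
      current = +''
      i += 1
    else
      current << body[i]
      i += 1
    end
  end
  parts << current
  # The first part is the template name, the rest are parameters.
  parts.drop(1).each_with_object({}) do |part, fields|
    key, value = part.split('=', 2)
    fields[key.strip] = value.strip if value
  end
end

# Made-up wikitext, for illustration only.
SAMPLE_WIKITEXT = <<~WIKI
  {{Short description|Example}}
  {{Infobox company
  | name = Example Corp
  | logo = Example logo.svg
  | founder = [[Jane Doe|J. Doe]]
  | founded = {{Start date|1999|01|01}}
  }}
  '''Example Corp''' is a made-up company.
WIKI

fields = infobox_fields(extract_infobox(SAMPLE_WIKITEXT))
puts fields['logo']
```

This only gives you the logo's file name as written in the wikitext. Resolving it to an image URL is a separate step, and rows that a template generates at render time will not appear as parameters here.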

pietromenna avatar Jul 17 '17 00:07 pietromenna

@pietromenna Thanks for your reply.

I would love to submit a pull request, but while inspecting the raw_data I couldn't seem to find the infobox's information.

For example, a search on the raw_data of the Great white shark doesn't seem to have Phylum, Kingdom, etc...

I think the request_page method of the client might be incomplete. Probably some options or parameters missing, but I find the Wikipedia API documentation kind of confusing, and a pain to understand...

Would appreciate some insight, I would love to submit the patch then.

aalvrz avatar Jul 17 '17 01:07 aalvrz

Thank you @BigChief45 for your example. In the example, raw_data does not contain that section when queried without additional parameters. This is because, with the default parameters, the API call to Wikipedia does not pull all templates.

To make this query work, you will have to include :tllimit => 500. More information about tllimit is available here.

Check whether the code below works:

page = Wikipedia.find( 'Great white shark', :tllimit => 500 )

Usually, when something is missing from raw_data, it is because of the default parameters of the Wikipedia API. More details here.

I hope this helped.

pietromenna avatar Jul 17 '17 02:07 pietromenna

@pietromenna I tried using that option with 500 as value. But I am still not getting that data... am I missing something else?

aalvrz avatar Jul 17 '17 03:07 aalvrz

I apologize if I misunderstood. :-( I thought you were looking for the list of taxonomies.

I ran this example:

require 'wikipedia'
page = Wikipedia.find( 'Great white shark', :tllimit => 500 )
puts page.templates

And it returned this list:

...
Template:Taxonomy
Template:Taxonomy/Animalia
Template:Taxonomy/Bilateria
Template:Taxonomy/Carcharodon
Template:Taxonomy/Chondrichthyes
Template:Taxonomy/Chordata
Template:Taxonomy/Craniata
Template:Taxonomy/Deuterostomia
Template:Taxonomy/Elasmobranchii
Template:Taxonomy/Eugnathostomata
Template:Taxonomy/Eukaryota
Template:Taxonomy/Eumetazoa
Template:Taxonomy/Euselachii
Template:Taxonomy/Filozoa
Template:Taxonomy/Gnathostomata
Template:Taxonomy/Holozoa
Template:Taxonomy/Lamnidae
Template:Taxonomy/Lamniformes
Template:Taxonomy/Life
Template:Taxonomy/Nephrozoa
Template:Taxonomy/Opisthokonta
Template:Taxonomy/Selachimorpha
Template:Taxonomy/Unikonta
Template:Taxonomy/Vertebrata
...

Without the parameter the API simply does not return all the templates.

This gem retrieves information by calling the Wikipedia API, mostly through the query action. If you want to see what the API returns, check the raw_data method. It should have the same content you get by calling the API directly through the Wikipedia API sandbox. If you don't pass any parameters, the API returns only part of the contents (probably for security reasons).

One suggestion is to build the query in the sandbox until it returns the results you want. You can then pass the same parameters to the gem so that it retrieves the information you need programmatically.
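To compare against the sandbox, it can help to see the kind of request involved. The sketch below only builds a query URL for the MediaWiki API using the standard `action=query`, `prop=templates` and `tllimit` parameters. The helper name `templates_query_url` is made up for this example, and the exact set of parameters the gem sends by default may differ, so treat it as an approximation rather than what the gem does internally.

```ruby
require 'uri'

# Hypothetical helper: build a MediaWiki API URL that lists the templates
# used on a page, with an explicit tllimit.
def templates_query_url(title, tllimit: 500)
  params = {
    action: 'query',
    prop: 'templates',
    titles: title,
    tllimit: tllimit,
    format: 'json'
  }
  uri = URI('https://en.wikipedia.org/w/api.php')
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts templates_query_url('Great white shark')
```

Opening the printed URL in a browser, with and without `tllimit`, is a quick way to check whether a missing entry comes from the API defaults or from the gem.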

I hope this helps.

pietromenna avatar Jul 18 '17 01:07 pietromenna