Extracts machine-readable metadata and content from Web pages

Results 13 pismo issues

The "doc.images" call returns "nil" every time, even when the HTML page contains valid images with absolute URLs. The reader_doc.images array is also empty every time.

coder.io not accessible

The year, hour, and minutes were missing from datetime detection for this format: "Jul. 25, 2012 10:46 a.m". Because of this, Chronic inferred the year, hour, and minutes incorrectly.
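One way to picture the problem is to normalize the timestamp before handing it to a parser. The sketch below is only an illustration and is not Pismo's or Chronic's actual code: `normalize_timestamp` is a hypothetical helper, and it uses the standard library's `Time.strptime` with an assumed format string rather than Chronic.

```ruby
require "time"

# Hypothetical helper (not part of Pismo or Chronic): rewrite "a.m"/"p.m."
# style meridians as "am"/"pm" and drop the dot after an abbreviated month.
def normalize_timestamp(text)
  text
    .gsub(/\b([ap])\.?m\.?(?=\s|\z)/i) { "#{$1.downcase}m" }
    .gsub(/\b([A-Z][a-z]{2})\./, '\1')
end

raw = "Jul. 25, 2012 10:46 a.m"
normalized = normalize_timestamp(raw)

# The format string is an assumption that matches this one example only.
parsed = Time.strptime(normalized, "%b %d, %Y %I:%M %p")
puts parsed.strftime("%Y-%m-%d %H:%M")
```

A fixed format string only covers this one layout; a general fix would still need to live in the date-detection code itself.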

Is it possible to add support for different languages? Maybe some kind of API or setting for it?

burl="http://www.momfluential.net"
=> "http://www.momfluential.net"
ruby-1.9.2-p0 > pismo = Pismo[burl]
ArgumentError: invalid byte sequence in UTF-8
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo/document.rb:48:in `gsub!'
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo/document.rb:48:in `clean_html'
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo/document.rb:36:in `load'
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo/document.rb:16:in `initialize'
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo.rb:29:in `new'
from /Users/jtoy/.rvm/gems/ruby-1.9.2-p0/gems/pismo-0.7.2/lib/pismo.rb:29:in `[]'
from...
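This kind of error typically appears when a regex method such as `gsub!` runs on a string tagged as UTF-8 that contains bytes which are not valid UTF-8. A minimal sketch of one workaround follows; `sanitize_utf8` is a hypothetical helper, not something Pismo provides, and it relies on `String#scrub`, which only exists in Ruby 2.1 and later (so not on the 1.9.2 shown in the trace).

```ruby
# Hypothetical helper (not part of Pismo): tag the bytes as UTF-8 and
# replace every invalid sequence with U+FFFD so regex methods stop raising.
def sanitize_utf8(html)
  html.dup.force_encoding(Encoding::UTF_8).scrub("\uFFFD")
end

# A stray Latin-1 byte (0xE9) inside otherwise ASCII markup.
raw = "caf\xE9 <b>menu</b>".b

clean = sanitize_utf8(raw)
clean.gsub!(/<[^>]+>/, "") # tag stripping now works without ArgumentError
puts clean.valid_encoding?
```

Replacing bad bytes loses the original character; when the page's real encoding is known, transcoding from that encoding would preserve the text instead.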

Is any fix planned for allowing redirects? Thanks! "redirection forbidden: http://www.bettiepageclothing.com -> https://www.bettiepageclothing.com/"
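The "redirection forbidden" message comes from a fetcher refusing to follow a redirect from http to https. As a rough sketch of the alternative, redirects can be followed by hand with the standard library's `Net::HTTP`. The method name `fetch_with_redirects`, its `limit` and `http_get` parameters are all assumptions made for this illustration and do not describe how Pismo fetches pages.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Pismo): follow up to `limit` responses,
# allowing scheme changes such as http -> https. `http_get` is injectable
# so the loop can be exercised without touching the network.
def fetch_with_redirects(url, limit: 5, http_get: Net::HTTP.method(:get_response))
  uri = URI.parse(url)
  limit.times do
    response = http_get.call(uri)
    return response unless response.is_a?(Net::HTTPRedirection)

    location = response["location"]
    raise "redirect without Location header from #{uri}" if location.nil?

    # URI.join also resolves relative Location values against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "too many redirects for #{url}"
end

# Example call (performs a real request, so it is left commented out):
# response = fetch_with_redirects("http://www.bettiepageclothing.com")
# puts response.code
```

The hop limit guards against redirect loops; the returned response body could then be passed on as the HTML document.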

When I tried to get images from this page, I got an exception: http://verkoren.wordpress.com/2013/04/12/you-cant-skate-you-old/

Noticed this when I was using the Pismo powered ‘entry text extraction’ on Feedbin. ``` irb >> Pismo['http://hsivonen.iki.fi/accept-charset/'].lede => "Accept-Charset Is No More. Now that Firefox 10 has been released,...

Maybe you want to add a case-sensitive matcher for looking up the favicon: ``` ['link[@rel="Shortcut Icon"]', lambda { |el| el.attr('href') }], ``` https://github.com/fluxsaas/pismo/blob/master/lib/pismo/internal_attributes.rb#L36 Also, it might be nice to add...

Ran into an issue with Pismo's default reader returning the wrong section of an HTML document for its `body`/`html_body` fields. It does work, however, with the cluster reader. This might...