
Webpage summary extractor using Facebook Open Graph and arc90's readability

goreadability


goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on ruby-readability.

Since v2.0, goreadability uses Open Graph tag values when they exist. To disable the Open Graph lookup and follow the traditional readability rules instead, set Option.LookupOpenGraphTags to false.
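
The Open Graph preference can be pictured as a lookup order: read og:title when the lookup is enabled and the tag is present, otherwise fall back to the page's <title> element. The sketch below uses only the standard library and regular expressions to stay short. extractTitle and its lookupOpenGraph flag are hypothetical stand-ins for the behavior that Option.LookupOpenGraphTags controls; goreadability itself may consult more tags and handle more edge cases.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// Illustrative patterns only: a real extractor should use an HTML parser,
// because attribute order and quoting vary between pages.
var (
	ogTitleRe = regexp.MustCompile(`(?i)<meta\s+property="og:title"\s+content="([^"]*)"`)
	titleRe   = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)
)

// extractTitle prefers og:title when lookupOpenGraph is true and the tag
// exists, and otherwise falls back to the document's <title> element.
func extractTitle(page string, lookupOpenGraph bool) string {
	if lookupOpenGraph {
		if m := ogTitleRe.FindStringSubmatch(page); m != nil {
			return html.UnescapeString(m[1])
		}
	}
	if m := titleRe.FindStringSubmatch(page); m != nil {
		return html.UnescapeString(strings.TrimSpace(m[1]))
	}
	return ""
}

func main() {
	page := `<html><head>
<title>Lego - Wikipedia</title>
<meta property="og:title" content="Lego">
</head><body></body></html>`
	fmt.Println(extractTitle(page, true))  // Lego (og:title wins)
	fmt.Println(extractTitle(page, false)) // Lego - Wikipedia (<title> fallback)
}
```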

Install

go get github.com/philipjkim/goreadability

Example

package main

import (
    "log"

    readability "github.com/philipjkim/goreadability"
)

func main() {
    // URL to extract contents (title, description, images, ...) from
    url := "https://en.wikipedia.org/wiki/Lego"

    // Default options
    opt := readability.NewOption()

    // You can modify some option values if needed.
    opt.ImageRequestTimeout = 3000 // ms

    content, err := readability.Extract(url, opt)
    if err != nil {
        log.Fatal(err)
    }

    log.Println(content.Title)
    log.Println(content.Description)
    log.Println(content.Images)
}

Testing

go test

# or if you want to see verbose logs:
DEBUG=true go test -v

Command Line Tool

TODO

Related Projects

  • ruby-readability is the base of this project.
  • fastimage finds the type and/or size of a remote image given its uri, by fetching as little as needed.

Potential Issues

TODO

License

MIT