PHPScraper icon indicating copy to clipboard operation
PHPScraper copied to clipboard

A universal web-util for PHP.

PHP Scraper

PHP Scraper is a scraper library for PHP, built with simplicity in mind. The main goal is to get stuff done instead of getting distracted with xPath selectors, preparing data structures, etc. Instead, you can just "go to a website" and get an array with all details relevant to your scraping project.

Under the hood, it uses Goutte and a few other packages. See composer.json.

Sponsors

This project is sponsored by:

Want to sponsor this project? Contact me.

Examples

Here are a few impressions of the way the library works. More examples are on the project website.

Basics: Get the Title of a Website

All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways:

// Prep
$web = new \spekulatius\phpscraper;

$web->go('https://google.com');

// Returns "Google"
echo $web->title;

// Also returns "Google"
echo $web->title();

Links

The following example shows how to collect links along with meta information:

$web = new \spekulatius\phpscraper;

/**
 * Navigate to the test page. This page contains several links with different rel attributes. To save space only the first one:
 *
 * <a href="https://placekitten.com/432/287" rel="nofollow">external kitten</a>
 */
$web->go('https://test-pages.phpscraper.de/links/rel.html');

// Get the first link on the page.
$firstLink = $web->linksWithDetails[0];

/**
 * $firstLink contains now:
 *
 * [
 *     'url' => 'https://placekitten.com/432/287',
 *     'text' => 'external kitten',
 *     'title' => null,
 *     'target' => null,
 *     'rel' => 'nofollow',
 *     'isNofollow' => true,
 *     'isUGC' => false,
 *     'isNoopener' => false,
 *     'isNoreferrer' => false,
 * ]
 */

If there aren't any matching elements (here links) on the page, an empty array will be returned.

Scrape the Images from a Website

Scraping the images including the attributes of the img-tags:

// Prep
$web = new \spekulatius\phpscraper;

/**
 * Navigate to the test page.
 *
 * This page contains twice the image "cat.jpg".
 * Once with a relative path and once with an absolute path.
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

var_dump($web->imagesWithDetails);
/**
 * Contains:
 *
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'absolute path',
 *     'width' => null,
 *     'height' => null,
 * ],
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'relative path',
 *     'width' => null,
 *     'height' => null,
 * ]
 */

See the full documentation on the website for more information and many more examples.

Installation

Composer is used to install PHPScraper:

composer require spekulatius/phpscraper

After the installation, the package will be picked up by the Composer autoloader. You can start scraping now if you are using typical PHP applications or frameworks such as Laravel or Symfony. You can now use any of the examples on the website or examples in the tests/-folder.

Please consider supporting PHPScraper with a star or sponsorship:

composer thanks

Thank you :muscle: