php-phantomjs

page automation

Open Sarfroz opened this issue 7 years ago • 6 comments

Hi, can you guide me on exactly how I can do this using PHP and PhantomJS?

http://phantomjs.org/page-automation.html

Sarfroz avatar Apr 14 '17 17:04 Sarfroz

hi Sarfroz,

You can do this via custom scripts. I managed to pull it off, but make sure you wrap the script in [% autoescape false %] ... [% endautoescape %] so that you can get the URL passed from the PHP script.

The documentation is here: http://jonnnnyw.github.io/php-phantomjs/4.0/4-custom-scripts/

Example code below:

[% autoescape false %]

var page = require('webpage').create();
var fs = require('fs');
var url = '{{ input.getUrl() }}';

page.open(url, 'GET', '', function (status) {
    var content = page.content;
    var path = '/home/steven/Code/phantomjs/logs/log_script11.txt';
    fs.write(path, url, 'w');
    fs.write(path, content, 'w+');
    phantom.exit(1);
});

phantom.onError = function (msg, trace) {
    phantom.exit(1);
};

[% endautoescape %]
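A likely reason the autoescape block matters (this is my reading, not something confirmed in this thread): the procedure is rendered by a Twig-style template engine, and with auto-escaping left on, HTML escaping would rewrite characters such as & in the URL before PhantomJS ever sees it. The sketch below is plain JavaScript, not php-phantomjs code. htmlEscape is a hypothetical stand-in for what an HTML auto-escaper does, and the example.com URL is made up.

```javascript
// Hypothetical stand-in for an HTML auto-escaper (not part of php-phantomjs).
function htmlEscape(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Made-up URL with a query string, as a PHP caller might pass it in.
var rawUrl = 'https://example.com/search?q=phantom&page=2';

// What the script would receive if the placeholder were HTML-escaped.
var escapedUrl = htmlEscape(rawUrl);

console.log('raw:     ' + rawUrl);
console.log('escaped: ' + escapedUrl);
```

The escaped form contains `&amp;` in place of `&`, so the query string would no longer mean the same thing when passed to page.open.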

yipwt79 avatar May 04 '17 10:05 yipwt79

I tried, sir, but it is not working. I am using partial script injection, but no luck. This is my working PhantomJS code:

var page = require('webpage').create();

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

page.onInitialized = function () {
    page.evaluate(function () {
        delete window._phantom;
        delete window.callPhantom;
    });
};

page.open('https://xxxxxxx', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var ua = page.evaluate(function () {
            return document.getElementById('iddoc').textContent;
        });
        console.log(ua);
    }
    phantom.exit();
});

If I run it directly via the phantomjs command it works OK, but the problem is that I have to edit the JS code every time to change the URL value. I hope you can give an example of this method.


Sarfroz avatar May 06 '17 18:05 Sarfroz

hi Sarfroz,

OK, I've tried out the script and it works. Here is what needs to be done:

  1. Do NOT enable debugging. There is a known bug: when it is enabled, the script takes a long time and returns no response. See this issue: https://github.com/jonnnnyw/php-phantomjs/issues/74

I think debugging is better done via the terminal, e.g.

phantomjs --debug=true myscript.proc

That way you can catch any problems there first.

  2. I haven't tried partial scripts, only CUSTOM scripts, which I believe is what you plan to do. Partial scripts sort of override parts of the library's built-in scripts, so I think you really need to understand what JonnyW did. I didn't spend much time on this.

  3. Make sure that your scripts have the right permissions:

chmod 755 testing1.proc

I am running Apache2 on Ubuntu Linux, so I also set:

chown :www-data testing1.proc

  4. You'll need to be creative when returning data to the calling PHP script. Define and use a response object with a content property in testing1.proc:

var response = {content: null}; // declare the response object
response.content = 'my content here'; // assign the result you want to pass back
console.log(JSON.stringify(response)); // output it as JSON

You can then get the result in the PHP script via $response->getContent();

Note that if you don't pass a valid JSON string, the app doesn't give you the content that you want.

  5. You can create a centralized phantomjs config file:

{
    /* Same as: --ignore-ssl-errors=true */
    "ignoreSslErrors": true,

    /* Same as: --max-disk-cache-size=1000 */
    "maxDiskCacheSize": 1000,

    /* Same as: --output-encoding=utf8 */
    "outputEncoding": "utf8",

    "cookiesFile": "/home/steven/Code/phantomjs/cookies/cookies.txt"
}
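The "Same as" comments in the config above suggest a simple naming rule: each camelCase key corresponds to a kebab-case command-line flag. A minimal sketch of that mapping in plain JavaScript follows. configKeyToFlag is a hypothetical helper written for illustration, and the rule is only inferred from the three commented keys, so check any other key against the PhantomJS documentation before relying on it.

```javascript
// Hypothetical helper: turn a camelCase config key into a --kebab-case flag.
function configKeyToFlag(key, value) {
  var flagName = key.replace(/[A-Z]/g, function (letter) {
    return '-' + letter.toLowerCase();
  });
  return '--' + flagName + '=' + String(value);
}

// The three keys that carry a "Same as" comment in the config file.
var config = {
  ignoreSslErrors: true,
  maxDiskCacheSize: 1000,
  outputEncoding: 'utf8'
};

var flags = Object.keys(config).map(function (key) {
  return configKeyToFlag(key, config[key]);
});

console.log(flags.join(' '));
```

Keeping these settings in one file means the PHP caller only needs the single --config option instead of one addOption call per flag.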

OK, that said, here's my full PHP caller script:

<?php

use JonnyW\PhantomJs\Client;
use JonnyW\PhantomJs\DependencyInjection\ServiceContainer;
use JonnyW\PhantomJs\Message\Request;

require_once 'vendor/autoload.php';
require_once 'config.php';

// timer
$start = microtime(true);

error_reporting(E_ALL);

$location = '/home/steven/Code/phantomjs/procedures/';

$serviceContainer = ServiceContainer::getInstance();
$procedureLoader = $serviceContainer->get('procedure_loader_factory')->createProcedureLoader($location);

$url = 'https://www.reddit.com/';

/*** the script testing1.proc is located under $location ***/
$fileName = 'testing1';

$client = Client::getInstance();
//var_dump($client->getCommand());
//$client->getEngine()->debug(true); //Hangs when enabled!!!
$client->getEngine()->addOption('--config=/home/steven/Code/phantomjs/phantomjs-config.json');
$client->getEngine()->addOption("--web-security=no");
$client->getEngine()->addOption('--ssl-protocol=tlsv1');

//$client->getProcedureCompiler()->clearCache();
//$client->getProcedureCompiler()->disableCache(); //enableCache(), clearCache();

$client->setProcedure($fileName); //for custom scripts.
$client->getProcedureLoader()->addLoader($procedureLoader);

$request = $client->getMessageFactory()->createRequest();
$response = $client->getMessageFactory()->createResponse();

$request->setMethod('GET');
$request->setUrl($url);

try {
    $client->send($request, $response);

    //echo "\n==== log ==== \n" . $client->getLog() . "\n";
    //print_r($response->getConsole()); // Array

    // print_r() prints by itself; echoing its return value would add a stray "1".
    print_r($response->getHeaders());

    echo "status = " . $response->getStatus() . "\n";
    echo "content = " . $response->getContent() . "\n";
} catch (Exception $e) {
    echo "Error catch\n";
    echo $e->getMessage();

    var_dump($client->getLog());
    //print_r($e->getErrors());
}

/*** timer end ***/
$stop = round(microtime(true) - $start, 5);

echo "time: {$stop}\n";

?>

Here is testing1.proc:

[% autoescape false %]

var page = require('webpage').create();
var url = '{{ input.getUrl() }}';

page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';

page.onInitialized = function () {
    page.evaluate(function () {
        delete window._phantom;
        delete window.callPhantom;
    });
};

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var ua = page.evaluate(function () {
            return document.getElementById('siteTable').innerHTML;
        });

        //console.log(ua);

        var response = {content: null};
        response.content = ua;
        console.log(JSON.stringify(response));
    }

    phantom.exit();
});

[% endautoescape %]
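Two gaps in the script above are worth noting: the failure branch prints a plain string rather than JSON, and document.getElementById returns null when the element is missing, so reading innerHTML from it fails. The sketch below shows one way to always emit the same {content: ...} envelope. It is plain JavaScript that runs outside PhantomJS. buildEnvelope, safeInnerHtml, and fakeDocument are hypothetical names introduced for illustration, with fakeDocument standing in for the real page DOM.

```javascript
// Hypothetical helper: wrap any result in the {content: ...} envelope
// so that the output is always valid JSON.
function buildEnvelope(content) {
  return JSON.stringify({ content: content === undefined ? null : content });
}

// Hypothetical helper: null-safe version of getElementById(id).innerHTML.
function safeInnerHtml(doc, id) {
  var element = doc.getElementById(id);
  return element ? element.innerHTML : null;
}

// Fake document standing in for the page DOM, so this runs without a browser.
var fakeDocument = {
  getElementById: function (id) {
    return id === 'siteTable' ? { innerHTML: '<div>post list</div>' } : null;
  }
};

var found = buildEnvelope(safeInnerHtml(fakeDocument, 'siteTable'));
var missing = buildEnvelope(safeInnerHtml(fakeDocument, 'noSuchId'));

console.log(found);
console.log(missing);
```

Inside a real procedure, the body of safeInnerHtml would go in the page.evaluate callback, and both the success and failure branches would pass their result through buildEnvelope before console.log.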

Ok, hope this helps.

Cheers

yipwt79 avatar May 07 '17 06:05 yipwt79

Works like a charm. Thanks a lot for this kind of support :) The only change I made was to disable these lines, and it still worked fine:

$client->getEngine()->addOption('--config=/home/steven/Code/phantomjs/phantomjs-config.json');
$client->getEngine()->addOption("--web-security=no");
$client->getEngine()->addOption('--ssl-protocol=tlsv1');


Sarfroz avatar May 07 '17 10:05 Sarfroz

@yipwt79 I ran your PHP script and testing1.proc. Result:

Array ( ) 1status = 0 content = string(0) "" time: 2.886 

amhoho avatar Sep 06 '17 02:09 amhoho

I tried php-phantomjs and have not enabled debug, but it still freezes on some sites. Any help? I don't have custom scripts, just the default php-phantomjs.

gpgr888 avatar Apr 01 '21 11:04 gpgr888