lscache-opencart
Crawler exceeds PHP memory limit with a large number of products
In OpenCart there are three default paths to a product page:
- by product_id only: /index.php?route=product/product&product_id=41
- by category path: /index.php?route=product/product&path=20_27&product_id=41
- by manufacturer_id: /index.php?route=product/product&manufacturer_id=8&product_id=41
The crawler handles paths 1 and 2, but path 3 (by manufacturer_id) was left out.
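As a rough illustration of the three routes listed above, here is a minimal standalone sketch. The helper name `productQueryVariants` is hypothetical and is not part of the plugin or of OpenCart; it only builds the query strings and does not use `$this->url->link`. It also assumes that a product without a manufacturer has `manufacturer_id` 0 and skips the third route in that case.

```php
<?php
// Hypothetical helper, not part of the LSCache plugin: builds the query
// strings for the three default product routes.
function productQueryVariants(int $productId, array $categoryPaths, int $manufacturerId): array
{
    $queries = [];

    // 1. Plain product_id route.
    $queries[] = 'product_id=' . $productId;

    // 2. One route per category path, e.g. "20_27".
    foreach ($categoryPaths as $path) {
        $queries[] = 'path=' . $path . '&product_id=' . $productId;
    }

    // 3. Manufacturer route; skipped when the product has no manufacturer
    //    (assumed here to be stored as 0).
    if ($manufacturerId > 0) {
        $queries[] = 'manufacturer_id=' . $manufacturerId . '&product_id=' . $productId;
    }

    return $queries;
}

// Print the three example URLs for product 41.
foreach (productQueryVariants(41, ['20_27'], 8) as $query) {
    echo '/index.php?route=product/product&' . $query . PHP_EOL;
}
```

Each product therefore contributes at least two URLs, plus one per category path, which is why the URL list grows quickly on large catalogs.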
With a large number of products (for example, more than 6000), the $urls array exceeds the PHP memory limit and the crawler stops.
That is why I replaced this code in
catalog/controller/extension/module/lscache.php:
echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
foreach ($this->model_catalog_product->getProducts() as $result) {
    foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
        if (isset($categoryPath[$category['category_id']])) {
            $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
        }
    }
    $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
}
$this->crawlUrls($urls, $cli);
with this:
echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
$UrlsCount = 0;
$UrlsCountCount = 0;
$this->load->model('catalog/manufacturer');
foreach ($this->model_catalog_product->getProducts() as $result) {
    foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
        if (isset($categoryPath[$category['category_id']])) {
            $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
            $UrlsCount++;
        }
    }
    $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
    $UrlsCount++;
    $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
    $UrlsCount++;
    if ($UrlsCount > 4096) {
        $UrlsCountCount++;
        echo 'recache ' . $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $this->crawlUrls($urls, $cli);
        $urls = array();
        $UrlsCount = 0;
    }
}
echo 'recache ' . $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
$this->crawlUrls($urls, $cli);
After some tests under heavy real-world load, I found that even 4096 URLs in the $urls array can exceed the PHP memory limit.
The problem is that $categoryPath also takes up additional memory.
I decided to reduce the $UrlsCount limit to 2048. Testing...
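Rather than guessing a safe batch size, it may help to log how close the crawler gets to the limit. The following is only a sketch: `iniSizeToBytes` and `hasMemoryHeadroom` are hypothetical helper names, not plugin functions, while `ini_get`, `memory_get_usage` and `memory_get_peak_usage` are standard PHP functions.

```php
<?php
// Convert a php.ini shorthand size such as "128M" to bytes.
// Returns -1 for "-1", which PHP uses to mean "no limit".
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value;                     // leading digits, e.g. 128
    $suffix = strtolower(substr($value, -1));   // optional K, M or G suffix
    switch ($suffix) {
        case 'g':
            return $number * 1024 * 1024 * 1024;
        case 'm':
            return $number * 1024 * 1024;
        case 'k':
            return $number * 1024;
        default:
            return $number;
    }
}

// Hypothetical check: true when at least $reserveBytes are still free
// below the configured memory_limit.
function hasMemoryHeadroom(int $reserveBytes): bool
{
    $limit = iniSizeToBytes((string) ini_get('memory_limit'));
    if ($limit < 0) {
        return true; // no limit configured
    }
    return ($limit - memory_get_usage(true)) >= $reserveBytes;
}

// Report the current limit and usage; this could be echoed after each batch.
printf(
    "limit: %s, used: %d bytes, peak: %d bytes\n",
    ini_get('memory_limit'),
    memory_get_usage(true),
    memory_get_peak_usage(true)
);
```

Printing these numbers after each call to crawlUrls would show whether the URL array or something else, such as $categoryPath, is consuming the memory.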
echo 'recache product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
$UrlsCount = 0;
$UrlsCountCount = 0;
$this->load->model('catalog/manufacturer');
foreach ($this->model_catalog_product->getProducts() as $result) {
    foreach ($this->model_catalog_product->getCategories($result['product_id']) as $category) {
        if (isset($categoryPath[$category['category_id']])) {
            $urls[] = $this->url->link('product/product', 'path=' . $categoryPath[$category['category_id']] . '&product_id=' . $result['product_id']);
            $UrlsCount++;
        }
    }
    $urls[] = $this->url->link('product/product', 'manufacturer_id=' . $result['manufacturer_id'] . '&product_id=' . $result['product_id']);
    $UrlsCount++;
    $urls[] = $this->url->link('product/product', 'product_id=' . $result['product_id']);
    $UrlsCount++;
    if ($UrlsCount > 2048) {
        $UrlsCountCount++;
        echo 'recache ' . $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
        $this->crawlUrls($urls, $cli);
        $urls = array();
        $UrlsCount = 0;
    }
}
if ($UrlsCountCount > 0) {
    echo 'recache ' . $UrlsCountCount . ' part of product urls...' . ($cli ? '' : '<br>') . PHP_EOL;
}
$this->crawlUrls($urls, $cli);
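The batching logic above could also be pulled out of the loop into a small generator, so the counters are not maintained by hand. This is a sketch only: `inBatches` is a hypothetical helper, and the URL source below is a stand-in for the product loop, since crawlUrls and the OpenCart models are not available outside the controller.

```php
<?php
// Yield arrays of at most $size items from any iterable, including the
// final partial batch. The yielded keys are 1-based part numbers.
function inBatches(iterable $items, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    $part = 0;
    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) >= $size) {
            yield ++$part => $batch;
            $batch = []; // release the flushed batch
        }
    }
    if ($batch !== []) {
        yield ++$part => $batch; // leftover items form the last part
    }
}

// Stand-in URL source; in the controller this would be the product loop.
$urls = (function (): Generator {
    for ($id = 1; $id <= 5; $id++) {
        yield 'product_id=' . $id;
    }
})();

foreach (inBatches($urls, 2) as $part => $batch) {
    echo 'recache part ' . $part . ' (' . count($batch) . ' urls)' . PHP_EOL;
    // In the controller, $this->crawlUrls($batch, $cli) would be called here.
}
```

With this shape, only one batch is held in memory at a time, and the final partial batch gets its own part number instead of reusing the previous one.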