Gaufrette

Memory problems running finfo::buffer with PHP_CLI on AWS, large files

Open ascaura opened this issue 8 years ago • 6 comments

While moving large files from an AWS server to S3 using a CakePHP shell, the burzum/cakephp-file-storage plugin and knplabs/Gaufrette, we ran into memory problems. The problems appear to be specific to the combination of AWS, the command-line PHP interpreter and calling finfo::buffer on large files.

We get the following messages:

2017-08-10 12:11:57 Warning: Warning (2): finfo::buffer(): Failed identify data 12:cannot allocate 2057250392 bytes (Cannot allocate memory)video/mp4 in [/data/repos/Platform/vendor/knplabs/gaufrette/src/Gaufrette/Adapter/AwsS3.php, line 359]
Trace:
Cake\Error\BaseErrorHandler::handleError() - CORE/src/Error/BaseErrorHandler.php, line 153
finfo::buffer() - [internal], line ??
Gaufrette\Adapter\AwsS3::guessContentType() - ROOT/vendor/knplabs/gaufrette/src/Gaufrette/Adapter/AwsS3.php, line 359
Gaufrette\Adapter\AwsS3::write() - ROOT/vendor/knplabs/gaufrette/src/Gaufrette/Adapter/AwsS3.php, line 166
Gaufrette\Filesystem::write() - ROOT/vendor/knplabs/gaufrette/src/Gaufrette/Filesystem.php, line 103
Burzum\FileStorage\Storage\Listener\AbstractListener::_storeFile() - ROOT/vendor/burzum/cakephp-file-storage/src/Storage/Listener/AbstractListener.php, line 313
Burzum\FileStorage\Storage\Listener\BaseListener::afterSave() - ROOT/vendor/burzum/cakephp-file-storage/src/Storage/Listener/BaseListener.php, line 101
Cake\Event\EventManager::_callListener() - CORE/src/Event/EventManager.php, line 414
Cake\Event\EventManager::dispatch() - CORE/src/Event/EventManager.php, line 391
Cake\ORM\Table::dispatchEvent() - CORE/src/Event/EventDispatcherTrait.php, line 78
Burzum\FileStorage\Model\Table\FileStorageTable::dispatchEvent() - ROOT/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php, line 243
Burzum\FileStorage\Model\Table\FileStorageTable::afterSave() - ROOT/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php, line 159
Cake\Event\EventManager::_callListener() - CORE/src/Event/EventManager.php, line 414
Cake\Event\EventManager::dispatch() - CORE/src/Event/EventManager.php, line 391
Cake\ORM\Table::dispatchEvent() - CORE/src/Event/EventDispatcherTrait.php, line 78
Burzum\FileStorage\Model\Table\FileStorageTable::dispatchEvent() - ROOT/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php, line 243
Cake\ORM\Table::_onSaveSuccess() - CORE/src/ORM/Table.php, line 1850
Cake\ORM\Table::_processSave() - CORE/src/ORM/Table.php, line 1816
Cake\ORM\Table::Cake\ORM\{closure}() - CORE/src/ORM/Table.php, line 1723
Cake\ORM\Table::Cake\ORM\{closure}() - CORE/src/ORM/Table.php, line 1446
Cake\Database\Connection::transactional() - CORE/src/Database/Connection.php, line 680
Cake\ORM\Table::_executeTransaction() - CORE/src/ORM/Table.php, line 1447
Cake\ORM\Table::save() - CORE/src/ORM/Table.php, line 1724
Organizations\Model\Entity\Media::storeFile() - ROOT/plugins/Organizations/src/Model/Entity/Media.php, line 174
App\Shell\MigrationFileStorageShell::main() - APP/Shell/MigrationFileStorageShell.php, line 126
Cake\Console\Shell::runCommand() - CORE/src/Console/Shell.php, line 472
Cake\Console\ShellDispatcher::_dispatch() - CORE/src/Console/ShellDispatcher.php, line 230
Cake\Console\ShellDispatcher::dispatch() - CORE/src/Console/ShellDispatcher.php, line 182
Cake\Console\ShellDispatcher::run() - CORE/src/Console/ShellDispatcher.php, line 128
[main] - ROOT/bin/cake.php, line 20

2017-08-10 12:11:57 Error: [RuntimeException] Could not write the "filename.m4v" key content. in /data/repos/Platform/vendor/knplabs/gaufrette/src/Gaufrette/Filesystem.php on line 106
Stack Trace:
#0 /data/repos/Platform/vendor/burzum/cakephp-file-storage/src/Storage/Listener/AbstractListener.php(313): Gaufrette\Filesystem->write('filename...', '\x00\x00\x00 ftypM4V \x00\x00\x00...', true)
#1 /data/repos/Platform/vendor/burzum/cakephp-file-storage/src/Storage/Listener/BaseListener.php(101): Burzum\FileStorage\Storage\Listener\AbstractListener->_storeFile(Object(Cake\Event\Event))
#2 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventManager.php(414): Burzum\FileStorage\Storage\Listener\BaseListener->afterSave(Object(Cake\Event\Event), Object(App\Model\Entity\FileStorage), true, Object(Gaufrette\Filesystem), Object(App\Model\Table\FileStorageTable))
#3 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventManager.php(391): Cake\Event\EventManager->_callListener(Array, Object(Cake\Event\Event))
#4 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventDispatcherTrait.php(78): Cake\Event\EventManager->dispatch(Object(Cake\Event\Event))
#5 /data/repos/Platform/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php(243): Cake\ORM\Table->dispatchEvent('FileStorage.aft...', Array, NULL)
#6 /data/repos/Platform/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php(159): Burzum\FileStorage\Model\Table\FileStorageTable->dispatchEvent('FileStorage.aft...', Array)
#7 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventManager.php(414): Burzum\FileStorage\Model\Table\FileStorageTable->afterSave(Object(Cake\Event\Event), Object(App\Model\Entity\FileStorage), Object(ArrayObject), Object(App\Model\Table\FileStorageTable))
#8 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventManager.php(391): Cake\Event\EventManager->_callListener(Array, Object(Cake\Event\Event))
#9 /data/repos/Platform/vendor/cakephp/cakephp/src/Event/EventDispatcherTrait.php(78): Cake\Event\EventManager->dispatch(Object(Cake\Event\Event))
#10 /data/repos/Platform/vendor/burzum/cakephp-file-storage/src/Model/Table/FileStorageTable.php(243): Cake\ORM\Table->dispatchEvent('Model.afterSave', Array, NULL)
#11 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1850): Burzum\FileStorage\Model\Table\FileStorageTable->dispatchEvent('Model.afterSave', Array)
#12 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1816): Cake\ORM\Table->_onSaveSuccess(Object(App\Model\Entity\FileStorage), Object(ArrayObject))
#13 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1723): Cake\ORM\Table->_processSave(Object(App\Model\Entity\FileStorage), Object(ArrayObject))
#14 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1446): Cake\ORM\Table->Cake\ORM\{closure}()
#15 /data/repos/Platform/vendor/cakephp/cakephp/src/Database/Connection.php(680): Cake\ORM\Table->Cake\ORM\{closure}(Object(Cake\Database\Connection))
#16 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1447): Cake\Database\Connection->transactional(Object(Closure))
#17 /data/repos/Platform/vendor/cakephp/cakephp/src/ORM/Table.php(1724): Cake\ORM\Table->_executeTransaction(Object(Closure), true)
#18 /data/repos/Platform/plugins/Organizations/src/Model/Entity/Media.php(174): Cake\ORM\Table->save(Object(App\Model\Entity\FileStorage))
#19 /data/repos/Platform/src/Shell/MigrationFileStorageShell.php(126): Organizations\Model\Entity\Media->storeFile(Array, Array)
#20 /data/repos/Platform/vendor/cakephp/cakephp/src/Console/Shell.php(472): App\Shell\MigrationFileStorageShell->main()
#21 /data/repos/Platform/vendor/cakephp/cakephp/src/Console/ShellDispatcher.php(230): Cake\Console\Shell->runCommand(Array, true, Array)
#22 /data/repos/Platform/vendor/cakephp/cakephp/src/Console/ShellDispatcher.php(182): Cake\Console\ShellDispatcher->_dispatch(Array)
#23 /data/repos/Platform/vendor/cakephp/cakephp/src/Console/ShellDispatcher.php(128): Cake\Console\ShellDispatcher->dispatch(Array)
#24 /data/repos/Platform/bin/cake.php(20): Cake\Console\ShellDispatcher::run(Array)
#25 {main}

We were able to reproduce the first warning, which we think is at the core of this issue, with the following PHP script:

<?php

// filename.m4v is a valid video file of around 311MB
$content = file_get_contents('filename.m4v', true);

$fileInfo = new \finfo(FILEINFO_MIME_TYPE);

var_dump($fileInfo->buffer($content));

?>

Serving this script through Apache/PHP-FPM doesn't cause any problems, and neither does running it with PHP-CLI on other systems. But running it with PHP-CLI on AWS yields the same warning. The filename.m4v file is 311M. We (temporarily) configured the PHP-CLI memory_limit to 3072M. On another (non-AWS) server with a memory_limit of 1024M, we do not see this issue. We suspect it has something to do with the way the AWS filesystem or memory management is set up. Note that according to the above warning, PHP tried to allocate about 2GB (2057250392 bytes) to parse a 311MB file, even though 3GB was allowed.

We managed to resolve this issue by truncating the input string to its first 1024 bytes. It appears that finfo::buffer continues to work for most/all files even when they're truncated this way?

<?php

// filename.m4v is a valid video file of around 311MB
$content = file_get_contents('filename.m4v', true); 

$fileInfo = new \finfo(FILEINFO_MIME_TYPE);

$content = substr($content, 0, 1024);

var_dump($fileInfo->buffer($content));

?>
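A variation on the workaround above, as a minimal sketch: rather than loading the whole file and then truncating it, only the prefix is read from disk. `guessMimeTypeFromPrefix` is a hypothetical helper name (it is not part of Gaufrette), and the 1024-byte default is just the value tried above, not a recommended limit.

```php
<?php

/**
 * Hypothetical helper (not part of Gaufrette): guess a MIME type from only
 * the first $maxBytes bytes of a file, so the whole file never sits in memory.
 *
 * @param string $path     Path to a local file.
 * @param int    $maxBytes How many leading bytes to inspect (assumed value).
 * @return string|false    MIME type, or false on failure.
 */
function guessMimeTypeFromPrefix($path, $maxBytes = 1024)
{
    // Passing an offset of 0 and a length makes file_get_contents()
    // read just the prefix instead of the entire file.
    $prefix = file_get_contents($path, false, null, 0, $maxBytes);
    if ($prefix === false) {
        return false;
    }

    $fileInfo = new \finfo(FILEINFO_MIME_TYPE);

    return $fileInfo->buffer($prefix);
}
```

Calling `guessMimeTypeFromPrefix('filename.m4v')` would then replace the `file_get_contents` / `substr` / `buffer` sequence. Whether a given prefix length is enough depends on the file format, so the result should be checked against the file types you actually store.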

ascaura avatar Aug 11 '17 16:08 ascaura

It turns out that truncating at 1024 bytes may cause pptx files to be detected as zip. We found at least one such case. Experimentally, truncating at 10000 caused it to be correctly detected as "application/vnd.openxmlformats-officedocument.presentationml.presentation" again.

ascaura avatar Sep 25 '17 12:09 ascaura

Have you dealt with the problem of large files in Gaufrette?

anonim1133 avatar Mar 13 '18 11:03 anonim1133

It turns out this is indeed an AWS-specific issue. Increasing the memory limit in PHP increases the file size you can handle. However, increasing the memory limit is something I usually avoid, so we found a workaround. If you open a stream instead of loading the content, and then call $fileInfo->file(stream_get_meta_data($content)['uri']) instead of $fileInfo->buffer($content), it works and yields the same results. All the Gaufrette adapters that we used can handle the stream and will proxy it to $fileInfo->file(stream_get_meta_data($content)['uri']). This fixes the issue entirely for our use case. I am still not a fan of code breaking because of server quirks, so I will follow this up further with AWS support.
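As a rough sketch of the stream-based approach described above: `guessMimeTypeFromStream` is a hypothetical helper name, and it assumes the stream is backed by a local file, so that the `uri` entry of its metadata is a path finfo::file can open. This is not the actual Gaufrette adapter code.

```php
<?php

/**
 * Hypothetical helper: guess a MIME type for an open stream by handing the
 * stream's underlying path to finfo::file() instead of buffering its content.
 *
 * Assumption: the stream wraps a local file, so 'uri' is a real path.
 *
 * @param resource $stream An open stream resource.
 * @return string|false    MIME type, or false when no usable path is found.
 */
function guessMimeTypeFromStream($stream)
{
    $meta = stream_get_meta_data($stream);
    if (empty($meta['uri'])) {
        return false;
    }

    $fileInfo = new \finfo(FILEINFO_MIME_TYPE);

    // finfo::file() inspects the file on disk itself, so PHP never has to
    // hold the full content in a string.
    return $fileInfo->file($meta['uri']);
}
```

With a stream opened via `fopen('filename.m4v', 'rb')`, the helper would be called as `guessMimeTypeFromStream($stream)`. Streams without a file path behind them (for example in-memory streams) would need a different fallback.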

The AWS support agent acknowledged the problem and redirected the issue to people who are more capable of resolving it. I will post an update as soon as I get a response.

boraneksen avatar Dec 10 '19 15:12 boraneksen

@boraneksen have you received any further info from AWS support regarding this issue?

mkveksas avatar Apr 06 '20 19:04 mkveksas

I also ran into this issue today. Have you heard anything from AWS, @boraneksen?

Stadly avatar Jun 11 '20 08:06 Stadly

On further investigation, the issue does not seem to be restricted to CLI or AWS: https://github.com/thephpleague/flysystem/issues/1172

Stadly avatar Jun 12 '20 07:06 Stadly