ClusterRunner
ClusterRunner slave died with MemoryError
Command: export ATOM_ID="0"; export PROJECT_DIR="/tmp/clusterrunner_build_symlinks/80313cfa-576c-430a-b92c-16597aed1619"; export BUILD_EXECUTOR_INDEX="138"; export EXECUTOR_INDEX="8"; export ARTIFACT_DIR="/home/jenkins/.clusterrunner/artifacts/1516/artifact_27_0"; export MACHINE_EXECUTOR_INDEX="8"; export TESTPATH="$PROJECT_DIR/test/php/integration/modular/file-system/operation/legacy/src/Services/File_System/FolderOperation/Iterator/ItemsForUserCrawlerTest.php"; cd $PROJECT_DIR && PHPUNIT_THREAD_INDEX=$EXECUTOR_INDEX $PROJECT_DIR/ci_phpunit --log-junit $ARTIFACT_DIR/result.xml $TESTPATH && test -f $ARTIFACT_DIR/result.xml && xmllint --noout $ARTIFACT_DIR/result.xml
Exit code: 255
Console output: Mon May 22 11:22:04 PDT 2017
timeout 3600 vendor/phpunit/phpunit/phpunit --log-junit /home/jenkins/.clusterrunner/artifacts/1516/artifact_27_0/result.xml /tmp/clusterrunner_build_symlinks/80313cfa-576c-430a-b92c-16597aed1619/test/php/integration/modular/file-system/operation/legacy/src/Services/File... (total output length: 4374603)
[2017-05-22 11:26:26.799] 20917 INFO Bld1516-Sub27 cluster_slave Build 1516, Subjob 27 completed and sent results to master.
[2017-05-22 12:25:00.961] 20917 ERROR Bld1516-Sub329 unhandled_excep Unhandled exception handler caught exception.
Traceback (most recent call last):
File "/home/jenkins/ClusterRunnerBuild/app/util/safe_thread.py", line 18, in run
File "/usr/local/lib/python3.4/threading.py", line 868, in run
File "/home/jenkins/ClusterRunnerBuild/app/slave/cluster_slave.py", line 302, in _execute_subjob
File "/home/jenkins/ClusterRunnerBuild/app/slave/subjob_executor.py", line 100, in execute_subjob
File "/home/jenkins/ClusterRunnerBuild/app/slave/subjob_executor.py", line 144, in _execute_atom_command
File "/home/jenkins/ClusterRunnerBuild/app/project_type/git.py", line 245, in execute_command_in_project
File "/home/jenkins/ClusterRunnerBuild/app/project_type/project_type.py", line 231, in execute_command_in_project
File "/home/jenkins/ClusterRunnerBuild/app/project_type/project_type.py", line 316, in _read_file_contents_and_close
MemoryError
[2017-05-22 12:25:01.134] 20917 DEBUG Bld1516-Sub329 unhandled_excep Executing teardown callback: <bound method ClusterSlave._disconnect_from_master of <app.slave.cluster_slave.ClusterSlave object at 0x7fcdef0f7b00>>
[2017-05-22 12:25:01.173] 20917 INFO Bld1516-Sub329 cluster_slave Notifying master that this slave is disconnecting.
[2017-05-22 12:25:01.202] 20917 DEBUG Bld1516-Sub329 unhandled_excep Executing teardown callback: <bound method ClusterSlave._do_build_teardown_and_reset of <app.slave.cluster_slave.ClusterSlave object at 0x7fcdef0f7b00>>
[2017-05-22 12:25:01.202] 20917 INFO Bld1516-Sub329 cluster_slave Executing teardown for build 1516.
[2017-05-22 12:25:01.341] 20917 DEBUG Bld1516-Sub329 git Executing command in project: export PROJECT_DIR="/tmp/clusterrunner_build_symlinks/80313cfa-576c-430a-b92c-16597aed1619"; sudo rm -rf /box/var/log/phpunit
[2017-05-22 12:25:01.711] 20917 DEBUG Bld1516-Sub329 git Command completed with exit code 0.
[2017-05-22 12:25:01.711] 20917 INFO Bld1516-Sub329 git Build teardown completed successfully.
[2017-05-22 12:25:01.712] 20917 INFO Bld1516-Sub329 git ProjectType teardown complete.
[2017-05-22 12:25:01.712] 20917 INFO Bld1516-Sub329 cluster_slave Build teardown complete for build 1516.
[2017-05-22 12:25:01.713] 20917 DEBUG Bld1516-Sub329 unhandled_excep Executing teardown callback: <function ServiceSubcommand._write_pid_file.<locals>.remove_pid_file at 0x7fcdee2488c8>
[2017-05-22 12:25:01.739] 20917 DEBUG Bld1516-Sub329 unhandled_excep Executing teardown callback: functools.partial(<bound method EPollIOLoop.add_callback of <tornado.platform.epoll.EPollIOLoop object at 0x7fcdeec887b8>>, callback=<bound method EPollIOLoop.stop of <tornado.platform.epoll.EPollIOLoop object at 0x7fcdeec887b8>>)
[2017-05-22 12:25:01.813] 20917 NOTICE SlaveTornadoThr subcommand Slave server was stopped.
Ah, we do something silly here: we read the entire console output of an atom run into memory.
https://github.com/box/ClusterRunner/blob/master/app/project_type/project_type.py#L216
console_output = self._read_file_contents_and_close(output_file)
I haven't looked too closely yet, but we should probably tail the log contents up to some maximum length instead.
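A rough sketch of what tailing could look like, assuming we only need the last N bytes of the output file. The function name `read_file_tail` and the `max_bytes` default are made up for illustration and are not part of the current ClusterRunner code:

```python
import os


def read_file_tail(path, max_bytes=64 * 1024):
    """Return at most the last `max_bytes` bytes of the file, decoded as text.

    Hypothetical helper: the name and the 64 KiB default are assumptions.
    """
    with open(path, 'rb') as f:
        # Find the file size, then seek back so that we read only the tail.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - max_bytes))
        data = f.read()
    # The seek may land in the middle of a multi-byte character, so replace
    # undecodable bytes instead of raising.
    return data.decode('utf-8', errors='replace')
```

This keeps memory use bounded by `max_bytes` regardless of how much output the atom produced. The full file stays on disk, so nothing is lost if someone needs the complete log.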
Good point. We could do something similar to what we do for console output.