
MemoryError when identifying large fasta(q) files (>6GB)

azuretimm opened this issue 5 years ago · 10 comments

I am trying to run ImmuneDB on a fairly large dataset (a fastq file >10GB) in a Docker container, and a MemoryError appeared after several minutes of waiting. I tried converting the fastq to a fasta file to reduce the size (6GB), but the error persists. The following is the error message given:

root@d5c47b7d9f26:~# immunedb_identify /share/configs/SRR780749.json /root/germlines/imgt_mouse_ighv.fasta \
> /root/germlines/imgt_mouse_ighj.fasta \
> /share/SRR780749 
2019-04-04 10:33:27 [INFO] Starting sample SRR7807494-CD43-B
2019-04-04 10:33:27 [INFO] Parsing input
2019-04-04 10:40:34 [INFO] There are 35089337 sequences
2019-04-04 10:40:34 [INFO] Generate time: 426.88751745224
Traceback (most recent call last):
  File "/usr/local/bin/immunedb_identify", line 4, in <module>
    __import__('pkg_resources').run_script('ImmuneDB==0.28.2', 'immunedb_identify')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/immunedb_identify", line 70, in <module>
    run_identify(session, args)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 418, in run_identify
    args.nproc
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 301, in process_sample
    generate_args={'path': path},
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/util/concurrent.py", line 107, in process_data
    proxy_data = manager.list(input_data)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 662, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 556, in _create
    id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 78, in dispatch
    c.send((id, methodname, args, kwds))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
MemoryError

What should I do to make it work? Thank you very much for any help!

azuretimm avatar Apr 04 '19 10:04 azuretimm

How much memory is available on the system? We generally recommend there be at least 3*(size of the largest file) available.

Also, are you running this on a Mac? If so, Docker is limited to 2GB of memory by default, which should be increased.

arosenfeld avatar Apr 05 '19 13:04 arosenfeld

I am running Docker on Ubuntu 18.04 with 32GB of system memory; while the script was running, memory usage never exceeded 16GB according to the system monitor. If 3*6GB is needed, there is plenty to spare when running the 6GB .fasta file. I think Docker on Linux has no memory limit by default? I am new to Docker, so pardon me if this is a dumb question :p

azuretimm avatar Apr 05 '19 13:04 azuretimm

Hm okay. I'll try this and see if I can recreate it. In the meantime, if you run docker stats while the container is running, does it show the maximum memory usage being the total available on your system?
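For reference, it can be run from a second terminal on the host; the MEM USAGE / LIMIT column shows each container's usage (exact columns vary by Docker version):

docker stats --no-stream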

arosenfeld avatar Apr 05 '19 13:04 arosenfeld

> Hm okay. I'll try this and see if I can recreate it. In the meantime, if you run docker stats while the container is running, does it show the maximum memory usage being the total available on your system?

I am away from the machine right now; I will try plugging two more memory sticks into the system tomorrow to bring it to 64GB. In the meantime, if there is anything else I can do to provide more information to help resolve the issue, please let me know. Also, I assume the docker stats command should be run in another host terminal while the container is processing the fasta file? Thank you very much for your help!

azuretimm avatar Apr 05 '19 13:04 azuretimm

Yes, the docker stats command should be run on the host. If the memory issue persists, you could potentially get around it by splitting the file into smaller ones and adding a field to the metadata file like:

file_name sample_name ... combine_name
file_part001.fastq file_part001 ... combined_sample_name
file_part002.fastq file_part002 ... combined_sample_name
file_part003.fastq file_part003 ... combined_sample_name

Then, after running identification, run immunedb_modify CONFIG_PATH combine-samples combine_name, which will collapse all the samples into a new one called combined_sample_name.
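Any splitting tool will do; as a rough sketch (this helper is hypothetical, not part of ImmuneDB), a FASTQ can be split on 4-line record boundaries like so:

# split_fastq.py -- hypothetical helper, not part of ImmuneDB.
# Splits a FASTQ into parts of at most `chunk` records (4 lines each),
# named e.g. PREFIX.part_001.fastq to match the metadata layout above.
import itertools
import sys

def split_fastq(path, prefix, chunk=3000000):  # tune `chunk` for ~1GB parts
    with open(path) as fh:
        for part in itertools.count(1):
            records = list(itertools.islice(fh, 4 * chunk))
            if not records:
                break
            name = '{}.part_{:03d}.fastq'.format(prefix, part)
            with open(name, 'w') as out:
                out.writelines(records)

if __name__ == '__main__':
    split_fastq(sys.argv[1], sys.argv[2])

Invoked as, e.g., python split_fastq.py input.fastq input to produce input.part_001.fastq, input.part_002.fastq, and so on.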

arosenfeld avatar Apr 05 '19 14:04 arosenfeld

Thank you! I will try the divide-and-conquer approach if 64GB of memory can't do the trick, and will report back with results.

azuretimm avatar Apr 05 '19 14:04 azuretimm

I installed 64GB of RAM in the machine and ran the container with the .fasta file (6.4GB); peak memory usage according to docker stats was 50GB / 62.84GB. This time the MemoryError did not appear, but the following error was displayed:

2019-04-06 02:07:06 [INFO] Starting sample SRR7807494-CD43-B
2019-04-06 02:07:06 [INFO] Parsing input
2019-04-06 02:14:08 [INFO] There are 35089337 sequences
2019-04-06 02:14:08 [INFO] Generate time: 421.91839122772217
Traceback (most recent call last):
  File "/usr/local/bin/immunedb_identify", line 4, in <module>
    __import__('pkg_resources').run_script('ImmuneDB==0.28.2', 'immunedb_identify')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/immunedb_identify", line 70, in <module>
    run_identify(session, args)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 418, in run_identify
    args.nproc
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 301, in process_sample
    generate_args={'path': path},
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/util/concurrent.py", line 107, in process_data
    proxy_data = manager.list(input_data)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 662, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 556, in _create
    id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 78, in dispatch
    c.send((id, methodname, args, kwds))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
root@ba20613f46c1:~#

I also tried the fastq file (11.5GB), and the MemoryError persisted:

2019-04-06 02:41:56 [INFO] Starting sample SRR7807494-CD43-B
2019-04-06 02:41:56 [INFO] Parsing input
2019-04-06 03:03:44 [INFO] There are 35089337 sequences
2019-04-06 03:03:44 [INFO] Generate time: 1308.4667782783508
Traceback (most recent call last):
  File "/usr/local/bin/immunedb_identify", line 4, in <module>
    __import__('pkg_resources').run_script('ImmuneDB==0.28.2', 'immunedb_identify')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/immunedb_identify", line 70, in <module>
    run_identify(session, args)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 418, in run_identify
    args.nproc
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 301, in process_sample
    generate_args={'path': path},
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/util/concurrent.py", line 107, in process_data
    proxy_data = manager.list(input_data)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 662, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 556, in _create
    id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
  File "/usr/lib/python3.6/multiprocessing/managers.py", line 78, in dispatch
    c.send((id, methodname, args, kwds))
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
MemoryError

I then tried the divide-and-conquer approach, splitting the fastq file into 1GB chunks. This time it passed the first stage successfully, with a peak memory usage of ~13GB.

2019-04-06 03:26:12 [INFO] Starting sample SRR7807494-CD43-B.part_001.fastq
2019-04-06 03:26:12 [INFO] Parsing input
2019-04-06 03:28:03 [INFO] There are 2924112 sequences
2019-04-06 03:28:03 [INFO] Generate time: 111.26955652236938
2019-04-06 03:28:16 [INFO] Waiting on pool process_vdj

I am still waiting for it to finish, but so far the divide-and-conquer approach seems to be working. Many thanks for your help. In the meantime, if there is anything else I can provide to help solve the issue, please let me know :D

azuretimm avatar Apr 06 '19 02:04 azuretimm

Another issue appeared while combining the separate parts into one. I am not sure whether I should create another issue or continue replying in this thread, but here is the error message:

root@ba20613f46c1:~# immunedb_modify /share/configs/SRR780749.json combine-samples combine_name
2019-04-07 01:13:07 [INFO] Resetting information for 1 subjects
2019-04-07 01:13:07 [INFO] Resetting collapsing
2019-04-07 01:13:07 [INFO] Resetting clones
2019-04-07 01:13:07 [INFO] Resetting sample statistics
2019-04-07 01:13:07 [INFO] Combining 12 samples into new sample "SRR7807494-CD43-B" (ID 2)
2019-04-07 01:13:38 [INFO] Updating sample name and deleting empty samples
2019-04-07 01:13:38 [INFO] Removing duplicates from sample 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py", line 455, in _query
    conn.query(q, unbuffered=True)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 725, in _read_query_result
    result.init_unbuffered_query()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 1092, in init_unbuffered_query
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry 'SRR7807494-CD43-B' for key 'name'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/immunedb_modify", line 4, in <module>
    __import__('pkg_resources').run_script('ImmuneDB==0.28.2', 'immunedb_modify')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/immunedb_modify", line 23, in <module>
    }[args.cmd](session, args)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/common/modify.py", line 132, in combine_samples
    remove_duplicates(session, final_sample)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/common/modify.py", line 20, in remove_duplicates
    for seq in all_seqs:
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/query.py", line 3080, in __iter__
    self.session._autoflush()
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1582, in _autoflush
    util.raise_from_cause(e)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/compat.py", line 277, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 1571, in _autoflush
    self.flush()
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2436, in flush
    self._flush(objects)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2574, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/langhelpers.py", line 67, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/compat.py", line 277, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2534, in _flush
    flush_context.execute()
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 416, in execute
    rec.execute(self)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 583, in execute
    uow,
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 236, in save_obj
    update,
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 976, in _emit_update_statements
    statement, multiparams
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 980, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1240, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/util/compat.py", line 276, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.6/dist-packages/SQLAlchemy-1.2.17-py3.6-linux-x86_64.egg/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/cursors.py", line 455, in _query
    conn.query(q, unbuffered=True)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 725, in _read_query_result
    result.init_unbuffered_query()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 1092, in init_unbuffered_query
    first_packet = self.connection._read_packet()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/local/lib/python3.6/dist-packages/PyMySQL-0.9.3-py3.6.egg/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (pymysql.err.IntegrityError) (1062, "Duplicate entry 'SRR7807494-CD43-B' for key 'name'")
[SQL: 'UPDATE samples SET name=%(name)s WHERE samples.id = %(samples_id)s']
[parameters: {'name': 'SRR7807494-CD43-B', 'samples_id': 2}]
(Background on this error at: http://sqlalche.me/e/gkpj)

Judging from the last error message, it looks like the combined sample name shouldn't collide with an existing sample name? Here is my metadata.tsv:

file_name study_name sample_name subject combine_name
SRR7807494-CD43-B.part_001.fastq BALBc SRR7807494-CD43-B.part_001.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_002.fastq BALBc SRR7807494-CD43-B.part_002.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_003.fastq BALBc SRR7807494-CD43-B.part_003.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_004.fastq BALBc SRR7807494-CD43-B.part_004.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_005.fastq BALBc SRR7807494-CD43-B.part_005.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_006.fastq BALBc SRR7807494-CD43-B.part_006.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_007.fastq BALBc SRR7807494-CD43-B.part_007.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_008.fastq BALBc SRR7807494-CD43-B.part_008.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_009.fastq BALBc SRR7807494-CD43-B.part_009.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_010.fastq BALBc SRR7807494-CD43-B.part_010.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_011.fastq BALBc SRR7807494-CD43-B.part_011.fastq mouse SRR7807494-CD43-B
SRR7807494-CD43-B.part_012.fastq BALBc SRR7807494-CD43-B.part_012.fastq mouse SRR7807494-CD43-B

Again, thank you very much for your help!

azuretimm avatar Apr 07 '19 01:04 azuretimm

Was this run on a fresh database? This error should mean there is already a sample called "SRR7807494-CD43-B" in the database.

I tried the same metadata sheet, albeit with different data of course, and did not encounter this error.
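If it helps to check, the samples table and its name column appear in the UPDATE statement in the traceback above, so a quick query will show whether the name is already taken. A minimal sketch using PyMySQL (already a dependency of ImmuneDB); the connection values below are placeholders for whatever is in your config JSON:

import pymysql

# Placeholder credentials -- substitute the values from your ImmuneDB config JSON.
conn = pymysql.connect(host='localhost', user='USER', password='PASSWORD',
                       db='DATABASE')
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM samples WHERE name = %s",
                    ('SRR7807494-CD43-B',))
        print(cur.fetchall())  # any row returned means the name already exists
finally:
    conn.close()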

arosenfeld avatar Apr 08 '19 12:04 arosenfeld

I'm not sure this is a common error, since most input data are not this large, so I can't say when it will be fixed. However, here is the pertinent error message:

2019-04-06 02:07:06 [INFO] Starting sample SRR7807494-CD43-B
2019-04-06 02:07:06 [INFO] Parsing input
2019-04-06 02:14:08 [INFO] There are 35089337 sequences
2019-04-06 02:14:08 [INFO] Generate time: 421.91839122772217
Traceback (most recent call last):
  File "/usr/local/bin/immunedb_identify", line 4, in <module>
    __import__('pkg_resources').run_script('ImmuneDB==0.28.2', 'immunedb_identify')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1438, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/immunedb_identify", line 70, in <module>
    run_identify(session, args)
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 418, in run_identify
    args.nproc
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/identification/identify.py", line 301, in process_sample
    generate_args={'path': path},
  File "/usr/local/lib/python3.6/dist-packages/ImmuneDB-0.28.2-py3.6-linux-x86_64.egg/immunedb/util/concurrent.py", line 107, in process_data
    proxy_data = manager.list(input_data)
  ... snip ...
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

This appears to happen when the list of sequences to process is too large:

https://github.com/python/cpython/blob/v3.6.7/Lib/multiprocessing/connection.py#L393
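The limit is easy to demonstrate in isolation: in Python 3.6, Connection._send_bytes packs the pickled payload's length into a signed 32-bit header, so any single message of 2GiB or more fails before it is sent. A minimal sketch (not ImmuneDB code):

import struct

# Python 3.6's multiprocessing Connection._send_bytes effectively does
# header = struct.pack("!i", n), where n is the pickled payload's length.
n = 3 * 1024 ** 3  # a hypothetical ~3GiB payload, above 2**31 - 1
try:
    struct.pack("!i", n)
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647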

arosenfeld avatar Apr 09 '19 18:04 arosenfeld