libzmq
Can't send message with serialized NumPy array that is larger than 2 GB in size
I'm using pyzmq to send a large NumPy array from a client to a server. See the pyzmq discussion for more details. The client and server computers are both MacBook Pro laptops running macOS 15.3 with 32 GB of memory. I noticed that if the NumPy array is larger than 2 GB, it fails to send. Since pyzmq does not set a size limit, does libzmq impose a size limit on messages sent over TCP sockets?
I found another issue with a similar problem, but it seems abandoned and doesn't specifically deal with NumPy arrays:
- https://github.com/zeromq/libzmq/issues/4135
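For what it's worth, the only explicit message-size cap I'm aware of in libzmq is the ZMQ_MAXMSGSIZE socket option, which defaults to -1 (no limit), so I don't think a socket option explains the failure. Below is a minimal check from pyzmq; this snippet is mine and separate from the client/server code further down.

# check_maxmsgsize.py (illustration only)
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
# ZMQ_MAXMSGSIZE defaults to -1, i.e. libzmq does not enforce an explicit
# per-message size cap unless one is configured.
print("ZMQ_MAXMSGSIZE:", socket.get(zmq.MAXMSGSIZE))
socket.close()
context.term()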
Here is my Python code for serializing the NumPy array and sending it (client.py), followed by the receiving side (server.py). This works fine as long as the NumPy array is less than 2 GB in size.
# client.py
import sys

import numpy as np
import zmq


class Client:
    """Client for sending/receiving messages."""

    def __init__(self, address="tcp://localhost:5555"):
        context = zmq.Context()
        socket = context.socket(zmq.REQ)
        socket.connect(address)
        self.socket = socket

    def send_array(self, array: np.ndarray):
        md = {"dtype": str(array.dtype), "shape": array.shape}
        self.socket.send_json(md, zmq.SNDMORE)  # send metadata
        self.socket.send(array, copy=False)  # send NumPy array data

    def recv_message(self):
        reply = self.socket.recv_string()
        print("Received reply:", reply)


def main():
    # Create array
    n = 16000  # 8000 is 500 MB, 11500 is 1 GB, 16000 is 2 GB, 17000 fails to send
    x = np.random.rand(n, n)
    print(f"Array shape: {x.shape}")
    print(f"First three elements: {x[0, 0:3]}")
    print(f"Size of array data: {x.nbytes} bytes, {x.nbytes / 1000**2} MB")
    print(f"Size of array object: {sys.getsizeof(x)} bytes, {sys.getsizeof(x) / 1000**2} MB")

    # Create client and send array
    client = Client()
    client.send_array(x)
    client.recv_message()


if __name__ == "__main__":
    main()
# server.py
from typing import Any

import zmq
import numpy as np


class Server:
    """Server for receiving/sending messages."""

    def __init__(self, address="tcp://localhost:5555"):
        context = zmq.Context()
        socket = context.socket(zmq.REP)
        socket.bind(address)
        self.socket = socket
        print("Server started, waiting for array...")

    def _recv_array(self):
        md: Any = self.socket.recv_json()  # receive metadata
        msg: Any = self.socket.recv(copy=False)  # receive NumPy array data
        array = np.frombuffer(msg, dtype=md["dtype"])  # reconstruct the NumPy array
        return array.reshape(md["shape"])

    def run(self):
        """Run the server."""
        while True:
            # Receive the NumPy array
            array = self._recv_array()
            print("Received array with shape:", array.shape)
            print(f"First three elements: {array[0, 0:3]}")

            # Send a confirmation reply
            self.socket.send_string("Array received")


def main():
    server = Server()
    server.run()


if __name__ == "__main__":
    main()
I offer no direct solution, but have you considered splitting the large array somehow and sending the parts successively, i.e. with zmq.SNDMORE?
@kitmonisit Yes, splitting the array into smaller chunks is the only workaround I have found; a rough sketch is below. But I still want to understand why the array has to stay under 2 GB in size. Hopefully someone can explain where the limit comes from.
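Here is a rough sketch of that chunked approach, assuming the same REQ/REP sockets as above; the helper names and the 1 GB chunk size are mine, chosen only to keep every frame well under 2 GB. All frames go out as one multipart message via zmq.SNDMORE and are reassembled on the receiving side.

# chunked transfer sketch (illustration only, not the code from this issue)
import numpy as np
import zmq

CHUNK_SIZE = 1_000_000_000  # 1 GB per frame, arbitrary but safely below 2 GB


def send_array_chunked(socket: zmq.Socket, array: np.ndarray):
    """Send metadata plus the array bytes split across several frames."""
    # assumes a C-contiguous array (e.g. the output of np.random.rand)
    data = memoryview(array).cast("B")  # flat, zero-copy view of the raw bytes
    n_chunks = max(1, -(-len(data) // CHUNK_SIZE))  # ceiling division
    md = {"dtype": str(array.dtype), "shape": array.shape, "chunks": n_chunks}
    socket.send_json(md, zmq.SNDMORE)  # metadata frame
    for i in range(n_chunks):
        chunk = data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
        flags = zmq.SNDMORE if i < n_chunks - 1 else 0
        socket.send(chunk, flags, copy=False)  # one data frame per chunk


def recv_array_chunked(socket: zmq.Socket) -> np.ndarray:
    """Receive the metadata frame, then reassemble the data frames."""
    md = socket.recv_json()
    parts = [socket.recv(copy=False) for _ in range(md["chunks"])]
    buffer = b"".join(p.bytes for p in parts)  # one copy during reassembly
    array = np.frombuffer(buffer, dtype=md["dtype"])
    return array.reshape(md["shape"])

Sending everything as a single multipart message keeps this compatible with the strict send/recv alternation of the REQ/REP pattern used above, at the cost of one extra copy when the frames are joined on the receiving side.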
I use pyzmq and send raw bytes as zmq.Frame, and I was surprised to see the message truncated after 2 GB. I suspect the issue is related to using 'int' for memory offsets, which can index at most $2^{31}$ bytes, roughly 2 GB.
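If that is the cause, the numbers in the client script above line up: a signed 32-bit offset can address at most $2^{31} = 2147483648$ bytes, and a quick check (for the float64 arrays used above) shows that n = 16000 stays just under that boundary while n = 17000 crosses it.

# quick arithmetic on the 2 GB boundary (not a fix)
INT32_LIMIT = 2**31  # 2,147,483,648 bytes addressable with a signed 32-bit offset

for n in (8000, 11500, 16000, 17000):
    nbytes = n * n * 8  # np.random.rand(n, n) produces float64, 8 bytes per element
    status = "fits within" if nbytes <= INT32_LIMIT else "exceeds"
    print(f"n={n}: {nbytes:,} bytes {status} the 2**31-byte limit")

So the observed cutoff between n = 16000 and n = 17000 is at least consistent with a $2^{31}$-byte limit somewhere in the send path.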