Can't send message with serialized NumPy array that is larger than 2 GB in size

Open wigging opened this issue 9 months ago • 2 comments

I'm using pyzmq to send a large NumPy array from the client to the server. See the pyzmq discussion for more details. The client and server computers are both MacBook Pro laptops running macOS 15.3 with 32 GB of memory. I noticed that if the NumPy array is larger than 2 GB in size then it fails to send. Since pyzmq does not set a size limit, does libzmq impose a size limit on the TCP socket messages?

I have found another issue with a similar problem. But that issue seems abandoned and doesn't specifically deal with NumPy arrays.

  • https://github.com/zeromq/libzmq/issues/4135

Here is my Python code for serializing the NumPy array and sending it. This works fine as long as the NumPy array is less than 2 GB in size.

# client.py

import sys
import numpy as np
import zmq

class Client:
    """Client for sending/receiving messages."""

    def __init__(self, address="tcp://localhost:5555"):
        context = zmq.Context()
        socket = context.socket(zmq.REQ)
        socket.connect(address)
        self.socket = socket

    def send_array(self, array: np.ndarray):
        md = {"dtype": str(array.dtype), "shape": array.shape}
        self.socket.send_json(md, zmq.SNDMORE)  # send metadata
        self.socket.send(array, copy=False)     # send NumPy array data

    def recv_message(self):
        reply = self.socket.recv_string()
        print("Received reply:", reply)

def main():
    # Create array
    n = 16000  # 8000 is 500 MB, 11500 is 1 GB, 16000 is 2 GB, 17000 fails to send
    x = np.random.rand(n, n)
    print(f"Array shape:           {x.shape}")
    print(f"First three elements:  {x[0, 0:3]}")
    print(f"Size of array data:    {x.nbytes} bytes, {x.nbytes / 1000**2} MB")
    print(f"Size of array object:  {sys.getsizeof(x)} bytes, {sys.getsizeof(x) / 1000**2} MB")

    # Create client and send array
    client = Client()
    client.send_array(x)
    client.recv_message()

if __name__ == "__main__":
    main()

# server.py

from typing import Any
import zmq
import numpy as np

class Server:
    """Server for receiving/sending messages."""

    def __init__(self, address="tcp://localhost:5555"):
        context = zmq.Context()
        socket = context.socket(zmq.REP)
        socket.bind(address)
        self.socket = socket
        print("Server started, waiting for array...")

    def _recv_array(self):
        md: Any = self.socket.recv_json()               # receive metadata
        msg: Any = self.socket.recv(copy=False)         # receive NumPy array data
        array = np.frombuffer(msg, dtype=md["dtype"])   # reconstruct the NumPy array
        return array.reshape(md["shape"])

    def run(self):
        """Run the server."""
        while True:
            # Receive the NumPy array
            array = self._recv_array()
            print("Received array with shape:", array.shape)
            print(f"First three elements:  {array[0, 0:3]}")

            # Send a confirmation reply
            self.socket.send_string("Array received")

def main():
    server = Server()
    server.run()

if __name__ == "__main__":
    main()

wigging, Feb 11 '25 14:02

I offer no direct solution, but have you considered splitting the large array somehow and sending the parts successively, i.e. with zmq.SNDMORE?

kitmonisit, Apr 04 '25 10:04

@kitmonisit Yes, splitting the array into smaller chunks is the only solution that I have found. But I still want to understand why the NumPy array must be less than 2 GB in size. Hopefully someone will provide an answer to the size limit.
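
For reference, a minimal sketch of how the chunking workaround might look. It assumes each frame only needs to stay below 2^31 bytes, which this thread does not confirm. The helper names (`iter_chunks`, `send_chunked`, `recv_chunked`) and the 1 GiB chunk size are hypothetical choices for illustration; `send_multipart` and `recv_multipart` are the pyzmq socket methods.

```python
import json

# Assumption: 1 GiB per frame, chosen only to stay well below 2**31 bytes.
CHUNK_BYTES = 1 << 30


def iter_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Yield zero-copy byte views of `data`, each at most `chunk_bytes` long."""
    view = memoryview(data).cast("B")  # flat byte view; needs a C-contiguous buffer
    for start in range(0, view.nbytes, chunk_bytes):
        yield view[start:start + chunk_bytes]


def send_chunked(socket, metadata, data, chunk_bytes=CHUNK_BYTES):
    """Send a JSON metadata frame followed by the payload split into frames."""
    frames = [json.dumps(metadata).encode("utf-8")]
    frames.extend(iter_chunks(data, chunk_bytes))
    socket.send_multipart(frames, copy=False)  # one multipart message
    return len(frames)


def recv_chunked(socket):
    """Receive the frames sent by send_chunked and rejoin the payload."""
    frames = socket.recv_multipart()  # default copy=True: frames arrive as bytes
    metadata = json.loads(frames[0])
    payload = b"".join(frames[1:])    # costs one extra copy of the payload
    return metadata, payload
```

On the client this would be called as `send_chunked(self.socket, md, array)` with the same `md` dictionary as in `send_array`. On the server, `np.frombuffer(payload, dtype=md["dtype"]).reshape(md["shape"])` rebuilds the array. A non-contiguous array would need `np.ascontiguousarray` first, and the join on the receiving side temporarily doubles memory use.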

wigging, Apr 08 '25 12:04

I use pyzmq to send raw bytes as a zmq.Frame and was surprised to see the message truncated after 2 GB. I suspect the issue is related to using 'int' for memory offsets, which can index at most $2^{31}$ bytes, or 2 GiB.
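
A quick arithmetic check of that suspicion against the array sizes reported in the original post. This only shows the numbers are consistent with a signed 32-bit limit; it does not confirm where, or whether, such a limit exists in libzmq or pyzmq. The function name is hypothetical.

```python
import math

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def float64_square_nbytes(n):
    """Bytes in an n-by-n float64 array, as produced by np.random.rand(n, n)."""
    return n * n * 8


for n in (8000, 11500, 16000, 17000):
    nbytes = float64_square_nbytes(n)
    print(f"n={n:>6}  nbytes={nbytes:>13,}  fits in int32: {nbytes <= INT32_MAX}")

# Largest n whose n-by-n float64 array still fits under the limit.
largest_n = math.isqrt(INT32_MAX // 8)
print("largest n that fits:", largest_n)
```

Under this assumption, n = 16000 (2,048,000,000 bytes) fits and n = 17000 (2,312,000,000 bytes) does not, matching what was reported. The cutoff would fall between n = 16383 and n = 16384.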

Clouder0, Aug 12 '25 23:08