Unique type definitions can consume all memory
At the time this issue is being opened Zed is at commit 1ec7052.
@mattnibs recently pointed out that the kinds of changes in #4555 make it particularly easy to make Zed consume all available memory by creating lots of unique type definitions. For an easy repro, consider the following script.
$ cat manytypes.py
#!/usr/bin/env python3
num=1
while True:
    print('{"' + str(num) + '": ' + str(num) + '}')
    num += 1
When run on an AWS t2.xlarge (16 GB of memory), it gets past 19 million values before all memory is consumed and the system hangs.
$ ./manytypes.py | zq -z -
...
{"19198499":19198499}
{"19198500":19198500}
{"19198501":19198501}
We can certainly document this as a known limitation to encourage users to structure their data in ways that won't bump into this (e.g., use a Zed "map" type). However, this kind of data is legal in formats like JSON, and I think Zed currently needs to be able to read such JSON data in full to turn it into a map. Also, jq doesn't have this same limitation (in a test I observed its memory usage at a flat 872 KB to reach this same point), which is unsurprising given its approach to "stateless dataflow". Therefore it might be worth finding a way to tolerate this kind of input and/or fail more gracefully.
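To make the growth pattern concrete, here is a toy model in Python. It is only a sketch and not how Zed is implemented: `TypeTable`, `as_entry`, and `count_types` are hypothetical names invented for illustration, and the "type" of a record is assumed to be just its ordered field names and value types. It shows that the repro input produces one new type per record, while reshaping each record into a fixed key/value form (roughly what a map-style layout would achieve) keeps the table at a single entry.

```python
#!/usr/bin/env python3
# Toy model only: this is NOT Zed's implementation, just an illustration of
# why unique field names per record make a type table grow without bound.
import json


class TypeTable:
    """Hypothetical interning table keyed by a record's shape."""

    def __init__(self):
        self._ids = {}

    def lookup(self, record):
        # Assumption: a record's "type" is its ordered (field name, value type) pairs.
        shape = tuple((name, type(value).__name__) for name, value in record.items())
        # Assign the next free ID the first time a shape is seen.
        return self._ids.setdefault(shape, len(self._ids))

    def __len__(self):
        return len(self._ids)


def as_entry(record):
    """Reshape a single-field {"<k>": v} record into {"key": k, "value": v}."""
    (key, value), = record.items()
    return {"key": key, "value": value}


def count_types(lines, transform=None):
    """Return how many distinct shapes the table holds after reading all lines."""
    table = TypeTable()
    for line in lines:
        record = json.loads(line)
        if transform is not None:
            record = transform(record)
        table.lookup(record)
    return len(table)


if __name__ == "__main__":
    # Same shape of input as manytypes.py, but bounded so it terminates.
    lines = ['{"%d": %d}' % (n, n) for n in range(1, 1001)]
    print("unique field names:", count_types(lines), "types")
    print("fixed key/value shape:", count_types(lines, as_entry), "types")
```

In this model the first count equals the number of input records and the second stays at one, which is the difference a user would be relying on when restructuring data to avoid the problem.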