
spiders_reader on long sentences crashes

Open kinianlo opened this issue 5 months ago • 3 comments

Symptom

Using spiders_reader on long sentences leads to excessive time and memory use.

Example

The following example applies spiders_reader to a sentence of 14 words, with each word assigned a 4-dimensional vector. The spider computes the element-wise product of these vectors, which on a modern machine should take milliseconds. Instead, the evaluation below took more than 2 seconds.

from lambeq import spiders_reader, TensorAnsatz, AtomicType
from lambeq import PytorchModel
from lambeq.backend.tensor import Dim
from time import time

# A 14-word "sentence"; spiders_reader joins all the words with a single spider.
sent = ' '.join(str(i) for i in range(14))
diag = spiders_reader.sentence2diagram(sent)

# Assign a 4-dimensional space to the sentence type.
ansatz = TensorAnsatz({AtomicType.SENTENCE: Dim(4)})
circ = ansatz(diag)

model = PytorchModel.from_diagrams([circ])
model.initialise_weights()

# Contracting the diagram should take milliseconds, but takes seconds.
start = time()
model.get_diagram_output([circ])
end = time()
print(end - start)  # 2.4105310440063477

On my machine it crashes with an out-of-memory error if I increase the dimension from 4 to 5.

This is my guess of what's going on: the spider with 14 + 1 legs gets converted to a rank-15 CopyNode in tensornetwork during .to_tn(), whose data is materialised as a dense array with 4^15 entries (5^15 at dimension 5).
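A quick back-of-the-envelope check of that guess (plain Python, assuming float32 entries at 4 bytes each):

# Dense storage for a rank-15 copy tensor grows as dim**15.
for d in (2, 4, 5):
    entries = d ** 15
    print(f"dim {d}: {entries:,} entries, ~{entries * 4 / 1e9:.1f} GB")
# dim 2: 32,768 entries, ~0.0 GB
# dim 4: 1,073,741,824 entries, ~4.3 GB
# dim 5: 30,517,578,125 entries, ~122.1 GB

That matches the symptoms: roughly 4 GB at dimension 4 (slow but feasible) and over 100 GB at dimension 5 (out of memory).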

Perhaps there should be a safeguard that splits the large CopyNode into smaller ones, similar to this in discopy: https://github.com/discopy/discopy/blob/006c3966a1906edfbd4b1ac1bfe943e1a709c0e0/discopy/frobenius.py#L380.
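For illustration, a minimal sketch of what such a safeguard could look like, written directly against tensornetwork (split_copy below is a hypothetical helper, not an existing lambeq or tensornetwork function):

import tensornetwork as tn

def split_copy(n_legs: int, dim: int) -> list:
    """Hypothetical: emulate an n-leg copy spider with a chain of
    3-leg CopyNodes, so the largest dense tensor holds dim**3 entries
    instead of dim**n_legs. Returns the n free edges of the chain."""
    if n_legs <= 3:
        return tn.CopyNode(rank=n_legs, dimension=dim).get_all_edges()
    nodes = [tn.CopyNode(rank=3, dimension=dim) for _ in range(n_legs - 2)]
    # Fuse the chain: the last leg of each node feeds the first of the next.
    for left, right in zip(nodes, nodes[1:]):
        tn.connect(left[2], right[0])
    # Collect the free legs: two at each end, one per middle node.
    free = [nodes[0][0], nodes[0][1]]
    free += [node[1] for node in nodes[1:-1]]
    free += [nodes[-1][1], nodes[-1][2]]
    return free

By spider fusion, the chain contracts to the same tensor as a single n-leg CopyNode, but a pairwise contraction never materialises anything bigger than dim**3, which is the same idea as the discopy safeguard linked above.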

kinianlo · Sep 27 '24 02:09