`spiders_reader` crashes on long sentences
Symptom

Using `spiders_reader` on long sentences leads to excessive time and memory use.
Example

The following example applies `spiders_reader` to a sentence of 14 words, each assigned a 4-dimensional vector. The spider computes the element-wise product of these vectors, which should take milliseconds on a modern computer. Instead, the run below took more than 2 seconds.
```python
from time import time

from lambeq import spiders_reader, TensorAnsatz, AtomicType
from lambeq import PytorchModel
from lambeq.backend.tensor import Dim

# A dummy 14-word sentence.
sent = ' '.join(str(i) for i in range(14))
diag = spiders_reader.sentence2diagram(sent)

# Assign a 4-dimensional space to the sentence type.
ansatz = TensorAnsatz({AtomicType.SENTENCE: Dim(4)})
circ = ansatz(diag)

model = PytorchModel.from_diagrams([circ])
model.initialise_weights()

start = time()
model.get_diagram_output([circ])
end = time()
print(end - start)  # 2.4105310440063477
```
On my machine it crashes with an out-of-memory error if I increase the dimension from 4 to 5.
This is my guess of what's going on: the spider with 14 + 1 legs is converted to a rank-15 `CopyNode` in tensornetwork, whose data is a dense array with 4^15 ≈ 10^9 entries (5^15 ≈ 3 × 10^10 at dimension 5) during `.to_tn()`.
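Rough arithmetic consistent with that guess (a sketch assuming 4-byte float32 entries; float64 would double the figures):

```python
# Memory needed to store one dense rank-15 tensor whose legs all
# have dimension d, at 4 bytes per float32 entry.
for d in (4, 5):
    entries = d ** 15
    print(f'd={d}: {entries:,} entries, {entries * 4 / 2**30:.1f} GiB')

# d=4: 1,073,741,824 entries, 4.0 GiB
# d=5: 30,517,578,125 entries, 113.7 GiB
```

Dimension 4 barely fits in RAM (hence the seconds of runtime), while dimension 5 cannot, matching the crash above.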
Perhaps there should be a safeguard that splits the large `CopyNode` into smaller ones, similar to what discopy does here: https://github.com/discopy/discopy/blob/006c3966a1906edfbd4b1ac1bfe943e1a709c0e0/discopy/frobenius.py#L380.
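For illustration only (not lambeq's actual implementation), here is a minimal tensornetwork sketch of that idea: the rank-15 `CopyNode` is replaced by a chain of rank-3 `CopyNode`s, which fuses to the same spider but never materialises anything larger than d^3 entries. The `chain_of_spiders` helper is hypothetical.

```python
import numpy as np
import tensornetwork as tn

def chain_of_spiders(num_legs, dim):
    """Hypothetical helper: a chain of rank-3 CopyNodes that fuses to
    a single rank-`num_legs` spider. Returns the nodes and the
    `num_legs` dangling edges."""
    nodes = [tn.CopyNode(rank=3, dimension=dim) for _ in range(num_legs - 2)]
    for a, b in zip(nodes, nodes[1:]):
        a[2] ^ b[0]  # fuse adjacent spiders along one shared leg
    free = [nodes[0][0]] + [n[1] for n in nodes] + [nodes[-1][2]]
    return nodes, free

# Element-wise product of 14 random 4-dimensional vectors: every
# intermediate tensor has rank <= 3, so memory use stays tiny.
dim, n_words = 4, 14
nodes, free = chain_of_spiders(n_words + 1, dim)
vectors = [tn.Node(np.random.rand(dim)) for _ in range(n_words)]
for vec, edge in zip(vectors, free[:-1]):
    vec[0] ^ edge
result = tn.contractors.greedy(nodes + vectors,
                               output_edge_order=[free[-1]])
print(result.tensor)  # shape (4,)
```

A balanced tree of rank-3 nodes would work just as well; the point is only to cap the rank of any single node.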