Results 22 comments of Wizyoung

This issue is not fixed yet.

Adding a dedicated CUDA stream for device separation may fix this.

```python
import torch

stream = torch.cuda.Stream()  # you can place it into __init__
with torch.cuda.stream(stream):
    output = self.model.generate(xxx)
stream.synchronize()
```

This...