
Q/A visual for coding

TrungThanhTran opened this issue 4 years ago · 2 comments

Hi @baraldilorenzo,

I'm trying to improve the speed of beam_search. While doing so, I found this call in the iter function of beam_search.py: visual = self._expand_visual(visual, cur_beam_size, selected_beam)

Could you tell me what this does?

T.T.T

TrungThanhTran avatar Mar 26 '20 06:03 TrungThanhTran

@TranTony I found that the function expands visual (i.e., repeats the tensor beam_size times) at the first step; for every subsequent step, the visual fed as input is the same as the output of the function call.

Example: if I feed in visual as a FloatTensor of size (4, 50, 2048) (b_s, seq_len, d_input) with beam_size=5, then self._expand_visual returns a FloatTensor of size (20, 50, 2048) (b_s * beam_size, seq_len, d_input) at the first step of beam search.
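
In isolation, the first-step expansion has the following effect (a sketch of the observed behavior in plain PyTorch, not the repo's exact implementation):

    import torch

    # Sketch: each of the b_s visual inputs is repeated beam_size
    # times along the batch dimension at the first beam-search step.
    b_s, seq_len, d_input, beam_size = 4, 50, 2048, 5
    visual = torch.randn(b_s, seq_len, d_input)

    expanded = (visual.unsqueeze(1)                        # (4, 1, 50, 2048)
                .expand(b_s, beam_size, seq_len, d_input)  # (4, 5, 50, 2048)
                .contiguous()
                .view(b_s * beam_size, seq_len, d_input))  # (20, 50, 2048)
    print(expanded.shape)  # torch.Size([20, 50, 2048])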

For subsequent steps of beam search, visual of shape (20, 50, 2048) is fed as an argument to self._expand_visual and, as expected, the output tensor is the same as the input:

# inside iter(), at any step after the first
old_visual = visual
visual = self._expand_visual(visual, cur_beam_size, selected_beam)
print(torch.equal(old_visual, visual))  # True

Did I miss anything? Also, how did you intend to speed up beam search?
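
One micro-optimization this observation suggests would be to guard the call on the timestep (a sketch only; it assumes the identity behavior holds at every step after the first, and that the step counter inside iter is named t):

    # Sketch of a possible micro-optimization, not code from the repo.
    # Assumes _expand_visual is the identity for every step t > 0,
    # as observed empirically above.
    if t == 0:
        # only the first step actually changes the tensor's shape
        visual = self._expand_visual(visual, cur_beam_size, selected_beam)
    # for t > 0, visual is left unchanged, skipping a redundant gather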

svp19 avatar Sep 14 '20 15:09 svp19

I reduced beam_size to 1 or 2 and found that it achieves the same result. However, I applied this to an auto-annotation problem that generates about 50 words at a time, so I don't think you need to reduce it in your case. I also reduced the number of connections and layers in the encoder and decoder. About the visual: yes, its outcome is the same as your output.
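
For reference, decoding with a reduced beam might look like this (a sketch; the beam_search call signature is assumed from the repo's test script, and model, images, and text_field are assumed to be set up as in that script):

    # Sketch: decoding with beam_size=2 instead of 5.
    # model, images, and text_field are assumed from the repo's setup;
    # the beam_search signature is an assumption, not verified here.
    out, _ = model.beam_search(images, 20, text_field.vocab.stoi['<eos>'],
                               2, out_size=1)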

TrungThanhTran avatar Sep 14 '20 16:09 TrungThanhTran