Kevin Ko
@GJ98 could you explain this? (Note: this mask implementation was not written by me.)
> I have another question, how could we visualise the attention heatmap at the decoder heads, similar to

I was planning to implement it, but I didn't do it because...
What do you mean?
This is a simple example of the stop texts feature. Let's take a dialogue with a GPT model as an example. ```python prompt = """This is a chat between Bot and User....
Thanks for answering :)
Nope. I just used `time.time()`. I was wondering whether my experiment was wrong, so I wanted to check it against any benchmark results.
Thank you for the reply. What does "retrieve the result at the end" mean? Thanks.
So is there a simple way to benchmark the time spent on the GPU?
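For reference, here is one common pattern, sketched under the assumption of a PyTorch/CUDA-style setup (the thread does not say which framework is in use). GPU work is usually launched asynchronously, so a plain wall-clock timer only measures the real computation if the device is synchronized before each clock read. The helper name `benchmark` and its `sync` hook are hypothetical; on PyTorch one would likely pass something like `torch.cuda.synchronize` as `sync`.

```python
import time
from statistics import mean


def benchmark(fn, sync=None, warmup=3, iters=10):
    """Return the mean wall-clock seconds per call of fn().

    sync is an optional zero-argument callable that blocks until all
    queued device work has finished (assumption: e.g.
    torch.cuda.synchronize on a PyTorch/CUDA setup). Without it, only
    the asynchronous launch would be timed, not the computation.
    """
    sync = sync or (lambda: None)

    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    sync()

    timings = []
    for _ in range(iters):
        sync()  # drain pending work before starting the clock
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn() to finish
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # CPU-only stand-in workload so the sketch runs anywhere.
    print(benchmark(lambda: sum(range(100_000))))
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, higher-resolution clock intended for measuring intervals.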
nope. training
I was going to use it for GPT2 pretraining. I set it to 2048.