Kevin Ko

Results 72 comments of Kevin Ko

@GJ98 could you explain this? (Note: this mask implementation was not written by me.)

> I have another question, how could we visualise the attention heatmap at the decoder heads, similar to...

I was planning to implement it, but I didn't do it because...

This is a simple example of the stop texts feature. Let's take a dialogue with a GPT model as an example.

```python
prompt = """This is a chat between Bot and User....
```
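Since the snippet above is cut off, here is a rough sketch of the general idea behind stop texts: the generated text is cut just before the first occurrence of any stop string, so the model does not keep writing the other speaker's turn. The helper `truncate_at_stop_texts` and the sample output are hypothetical and only illustrate the technique; they are not the library's actual API.

```python
from typing import Iterable


def truncate_at_stop_texts(generated: str, stop_texts: Iterable[str]) -> str:
    """Cut `generated` just before the earliest occurrence of any stop text.

    Hypothetical helper for illustration, not part of any library.
    """
    cut = len(generated)
    for stop in stop_texts:
        if not stop:
            continue  # an empty stop text would match at position 0
        index = generated.find(stop)
        if index != -1:
            cut = min(cut, index)
    return generated[:cut]


# Made-up raw model output that runs past the bot's own turn.
raw_output = " Hello! How can I help you?\nUser: What is the weather?\nBot: Sunny."
reply = truncate_at_stop_texts(raw_output, stop_texts=["\nUser:", "\nBot:"])
print(repr(reply))
```

A real implementation would more likely check for stop texts during decoding and halt generation early, instead of trimming the string afterwards.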

Nope. I just used `time.time()`. I was wondering whether my experiment was wrong, so I wanted to check it against any benchmark results.

Thank you for the reply. What does "retrieve the result at the end" mean? Thanks.

So is there a simple way to benchmark the time spent on the GPU?
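One common approach, sketched here under assumptions: GPU work is usually launched asynchronously, so a plain timer around a call can measure only the launch. Running a few warmup iterations and waiting for the device before reading the clock gives a more honest number. The `benchmark` helper and its `sync` parameter are hypothetical names. The thread does not say which framework is in use; with PyTorch on CUDA you could pass `torch.cuda.synchronize` as `sync`, and with JAX you could call `.block_until_ready()` on the result inside `fn`.

```python
import time
from typing import Callable, Optional


def benchmark(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 3,
              repeats: int = 10) -> float:
    """Return the mean wall-clock seconds per call of `fn`.

    `sync` is a hypothetical hook that blocks until the device is idle.
    """
    wait = sync if sync is not None else (lambda: None)
    for _ in range(warmup):
        fn()  # warmup runs absorb one-off costs such as compilation
    wait()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    wait()  # make sure all launched work has finished before stopping the clock
    return (time.perf_counter() - start) / repeats


# CPU-only stand-in workload so the sketch runs anywhere.
mean_seconds = benchmark(lambda: sum(range(100_000)))
print(f"{mean_seconds * 1e3:.3f} ms per call")
```

This still measures wall-clock time including launch overhead. For time spent purely on the device, a framework profiler or device-side timing events would be the more precise tool.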

I was going to use it for GPT2 pretraining. I set it to 2048.