Michael J Tanana

28 comments by Michael J Tanana

This is equivalent to the implementation before adding the language model. If you wanted to replicate the paper, you would spit out the top 1000 beams from the DS model...

1. This should be easier than other problems because you can just take the outcome probabilities and walk through them saving the top 1000 at each step. (This will be...
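A rough sketch of that walk, for illustration only: step through the per-step outcome probabilities and keep only the best-scoring prefixes at each step. The function name `top_k_beams`, the `step_probs` layout (one list of symbol probabilities per time step), and the `beam_width` parameter are all assumptions made for this example, not names from the project. It also skips the CTC-specific merging of blanks and repeated symbols, so treat it as a plain top-k search rather than the decoder from the paper.

```python
import heapq
import math


def top_k_beams(step_probs, beam_width=1000):
    """Keep the beam_width highest-scoring symbol sequences.

    step_probs: list of time steps, each a list of per-symbol probabilities
                (hypothetical layout, chosen for this sketch).
    Returns a list of (log_prob, symbol_indices), best first.
    Does NOT merge CTC blanks or repeated symbols.
    """
    beams = [(0.0, ())]  # (cumulative log probability, prefix so far)
    for probs in step_probs:
        candidates = []
        for score, prefix in beams:
            for symbol, p in enumerate(probs):
                if p > 0.0:  # skip impossible symbols to avoid log(0)
                    candidates.append((score + math.log(p), prefix + (symbol,)))
        # Prune: keep only the top beam_width candidates before the next step.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams
```

With `beam_width=1000` this produces the kind of candidate list that could then be handed to a language model for rescoring.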

Also note that the paper mentioned some weighting between the score from the DS model and the score from the LM. It wasn't clear whether this weight was estimated or set like a hyperparameter.
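One simple way to picture that weighting is a linear combination of the two log scores, with the weight treated as a hyperparameter. This is only a sketch under that assumption: the name `rescore`, the weight `alpha`, and the `lm_log_prob` callable are hypothetical, and the exact form used in the paper may differ.

```python
def rescore(beams, lm_log_prob, alpha=1.0):
    """Re-rank candidate transcripts by a weighted sum of two log scores.

    beams:       list of (ds_log_prob, text) pairs from the acoustic model.
    lm_log_prob: callable returning a language-model log probability for text
                 (hypothetical interface for this sketch).
    alpha:       weight on the LM score; assumed here to be hand-tuned.
    Returns (combined_score, text) pairs, best first.
    """
    rescored = [(ds + alpha * lm_log_prob(text), text) for ds, text in beams]
    return sorted(rescored, key=lambda r: r[0], reverse=True)
```

Setting `alpha=0` falls back to the ranking from the DS model alone, which makes it easy to check how much the LM changes the output.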

I'm still playing with the base model code, but once I get better results, I'd be happy to help with this part...but I'm a month or two from where I'll...

If it was a pre-trained GPU model, you'd just have to de-cuda the model on a platform with a GPU (otherwise it won't load) and then re-save it... it's a...

Does the 1080 have 6GB? I'm not sure if that will be able to fit the full model. If you look back at my responses to the thread on running...

See #71. There's a comment from me near the bottom that helps with memory.

Don't forget to downsize the minibatch for testing too.

I had this issue as well. Is it confirmed that the memory leak happens on the CPU as well? I remember having some memory leaking for this project in CUDA...

CollectGarbage isn't doing the trick. I remember this was the case with my bug as well. I'll keep looking.