Henry Mao

57 comments by Henry Mao

Try initializing the embedding matrix from a uniform distribution over the range ±`1 / d`.
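
For illustration, here is a minimal, framework-agnostic sketch of that initialization. It assumes `d` is the embedding dimension; the function name `init_embedding` and the `vocab_size` and `seed` parameters are hypothetical, not taken from any particular library.

```python
import random


def init_embedding(vocab_size, d, seed=None):
    """Return a vocab_size x d matrix with entries drawn uniformly from [-1/d, 1/d].

    Assumption: d is the embedding dimension. All names here are illustrative.
    """
    rng = random.Random(seed)
    bound = 1.0 / d
    return [
        [rng.uniform(-bound, bound) for _ in range(d)]
        for _ in range(vocab_size)
    ]


# Example: a 1000-token vocabulary with 64-dimensional embeddings.
emb = init_embedding(vocab_size=1000, d=64, seed=0)
```

In a real model you would copy these values into (or draw them directly with) your framework's embedding layer rather than keeping a list of lists.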

@sooheon It depends on the particular implementation of your Transformer. Some implementations (Huggingface) scale the embedding by 1 / d before passing it into higher layers while initializing the embedding...

Yes, it would seem reasonable to not decay resweights since other parameters are already being decayed.
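
One way to do this, sketched under assumptions: split the parameters into two groups by name and give the resweight group a weight decay of zero. The list-of-dicts layout mirrors the per-parameter-group convention that PyTorch optimizers accept; the helper name `split_weight_decay`, the `"resweight"` name filter, and the example parameter names are hypothetical.

```python
def split_weight_decay(named_params, weight_decay=0.01, no_decay_key="resweight"):
    """Group parameters so that those whose name contains no_decay_key are not decayed.

    named_params: iterable of (name, param) pairs, e.g. model.named_parameters().
    Assumption: resweights can be identified by a substring of their name.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if no_decay_key in name:
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Placeholder strings stand in for real parameter tensors.
groups = split_weight_decay([
    ("layers.0.linear.weight", "W0"),
    ("layers.0.resweight", "a0"),
])
```

With PyTorch, the returned list could be passed as the first argument to an optimizer constructor in place of `model.parameters()`.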

@rom1504 @tmbdev I'm running into a similar issue with `gsutil cat` or `gsutil cp`. Mid-training, with large shards (10GB+ per shard), some network errors occur which the data...

@tmbdev For shards of size 10GB, this would require waiting for the entire file to download prior to loading the data? I guess that's a tradeoff but it will avoid...

@tmbdev Thanks for the info - I'm running this on a server outside of GCloud (but the region is nearby). Downloading the entire object works well - the issue happens...

@RX14 I don't remember owning the domains?

I added a CNAME ci.novaapi.net -> current.rx14.co.uk

If someone else wishes to lead the project, feel free to take over and set up all dependencies.