fastmoe
How to run Transformer-XL with parallel experts on a single GPU?
It seems that FastMoE still cannot run multiple experts in parallel on a single GPU card?
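To make the question concrete, the setup I have in mind is something like the following minimal sketch, which places all experts of one MoE FFN layer on a single GPU. The parameter names (`num_expert`, `d_model`, `d_hidden`, `world_size`, `top_k`) are my assumptions based on the `fmoe.transformer.FMoETransformerMLP` layer and may need to be checked against the installed FastMoE version:

```python
# Minimal single-GPU sketch (not the official Transformer-XL example):
# one MoE FFN layer whose experts all live on the same GPU.
import torch
from fmoe.transformer import FMoETransformerMLP  # assumed import path

d_model, d_hidden = 512, 2048
moe_ffn = FMoETransformerMLP(
    num_expert=4,    # four experts, all on this single GPU
    d_model=d_model,
    d_hidden=d_hidden,
    world_size=1,    # single process / single GPU, no expert parallelism across devices
    top_k=2,         # each token is routed to its top-2 experts
).cuda()

x = torch.randn(8, 16, d_model, device="cuda")  # (batch, seq_len, d_model)
y = moe_ffn(x)
print(y.shape)  # expected: torch.Size([8, 16, 512])
```

With this kind of setup, do the four experts actually execute concurrently on the one GPU, or are they processed one after another?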