Reduce disk usage for mixtral tests

Open rdyro opened this issue 1 year ago • 4 comments

This PR modifies the Mixtral parameter-conversion tests to read through gcsfuse instead of local disk, which lowers VM disk usage.

rdyro avatar Oct 07 '24 17:10 rdyro

Also just a high-level question - @rdyro did you notice any slowdown using gcsfuse compared to a local copy of the weights?

jonb377 avatar Oct 07 '24 22:10 jonb377

I did not check quantitatively, but qualitatively, there wasn't much of a difference. The loading of the existing checkpoint to be converted takes less than 10% of the total run, even after moving to gcsfuse.

If it's important that we measure this (e.g., for use elsewhere), let me know.

rdyro avatar Oct 07 '24 22:10 rdyro

Correction: the new strategy via gcsfuse is equivalent to a single-threaded download, which on a high-end VM caps at around 80-100 MB/s on average. That implies a 45-60 min download for a 280 GB checkpoint (8x22b).
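For reference, the arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope check: the helper name `download_minutes` is hypothetical, and it assumes decimal units (1 GB = 1000 MB) and a constant sustained rate.

```python
def download_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Minutes needed to transfer size_gb at a constant throughput_mb_s.

    Assumes decimal units (1 GB = 1000 MB) and no per-file overhead.
    """
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 60


checkpoint_gb = 280  # 8x22b checkpoint size quoted above
for rate_mb_s in (80, 100):  # single-threaded throughput range quoted above
    minutes = download_minutes(checkpoint_gb, rate_mb_s)
    print(f"{rate_mb_s} MB/s -> {minutes:.0f} min")
```

At 100 MB/s this gives a bit under 47 min and at 80 MB/s a bit over 58 min, consistent with the 45-60 min range.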

rdyro avatar Oct 08 '24 00:10 rdyro

This test was also sometimes (?) running out of RAM without weight_dtype=bfloat16, so I'm adding that change to this PR.
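A rough sketch of why the dtype matters for RAM: bfloat16 stores each weight in 2 bytes versus 4 bytes for float32, so the in-memory weights are roughly half the size. The parameter count below is an illustrative assumption (chosen so that the bfloat16 size matches the 280 GB figure mentioned earlier), not a measured value, and the helper `weights_gb` is hypothetical.

```python
# Bytes used per parameter for each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weights_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in decimal GB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption for illustration only: about 140e9 parameters.
num_params = 140e9
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weights_gb(num_params, dtype):.0f} GB")
```

This ignores any temporary copies made during conversion, so actual peak usage may be higher.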

rdyro avatar Oct 08 '24 16:10 rdyro