Wes Vaske

9 comments by Wes Vaske

In general, I'd like to agree, but I'm looking at DLRM, and it's sort of a unique case. One popular data format for DLRM data is Parquet. Since it's columnar...
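Because Parquet is columnar, a reader can fetch only the columns it needs instead of whole rows, which changes the I/O pattern compared with row-oriented formats. Below is a toy, pure-Python sketch of that difference. It does not use a Parquet library, the column names are made up rather than taken from a DLRM schema, and "values read" is a simplified stand-in for bytes read from storage.

```python
# Toy model of row-oriented vs column-oriented storage (pure Python, no Parquet library).
# The column names below are made up for illustration; they are not a DLRM schema.
from typing import Dict, List

Row = Dict[str, float]

def to_columnar(records: List[Row]) -> Dict[str, List[float]]:
    """Pivot row records into one list per column, as a columnar file lays them out."""
    columns: Dict[str, List[float]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def values_read_row_wise(records: List[Row], wanted: List[str]) -> int:
    """In this toy model a row-oriented reader scans every field of every row,
    regardless of which columns are wanted."""
    return sum(len(record) for record in records)

def values_read_columnar(columns: Dict[str, List[float]], wanted: List[str]) -> int:
    """A columnar reader only touches the requested columns (column projection)."""
    return sum(len(columns[name]) for name in wanted)

rows = [
    {"label": 1, "dense_0": 0.5, "sparse_0": 17},
    {"label": 0, "dense_0": 1.5, "sparse_0": 42},
    {"label": 1, "dense_0": 2.5, "sparse_0": 7},
]
columns = to_columnar(rows)
print(values_read_row_wise(rows, ["label", "sparse_0"]))     # 9
print(values_read_columnar(columns, ["label", "sparse_0"]))  # 6
```

The gap grows with the number of columns a workload skips, which is one reason a columnar dataset may not behave like a generic file-per-sample benchmark.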

@zhenghh04, is the version of DLIO on PyPI up to date with your changes? If not, can you rev the version to 2.1 or 2.0.1 and push a new version?

> The attributes "per_host_mem_kB" and "total_mem_kB" are in kilo-bytes but the CLI args are in GB and the raw memory capacity pulled from the nodes info is in B. Would...
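With three different units in play (kB attributes, GB CLI arguments, raw bytes from the node info), one way to avoid mismatched comparisons is to normalize everything at the boundary. A minimal sketch follows; the helper names and `cli_client_mem_gb` are hypothetical, and the choice of binary multiples (1 kB = 1024 B) is an assumption. Set `BASE = 1000` if the code uses decimal SI units.

```python
# Hypothetical helpers for normalizing memory sizes to a single unit.
# Assumption: binary multiples (1 kB = 1024 B, 1 GB = 1024**3 B).
# Change BASE to 1000 if the project uses decimal SI units instead.
BASE = 1024

def bytes_to_kb(n_bytes: int) -> float:
    """Convert raw bytes (e.g. from node info) to kB."""
    return n_bytes / BASE

def gb_to_kb(n_gb: float) -> float:
    """Convert a GB value (e.g. from a CLI argument) to kB."""
    return n_gb * BASE * BASE

def kb_to_gb(n_kb: float) -> float:
    """Convert kB back to GB for reporting."""
    return n_kb / (BASE * BASE)

raw_node_mem_bytes = 256 * BASE**3  # example: a node reporting 256 GiB, in bytes
cli_client_mem_gb = 128             # example: a CLI argument given in GB

per_host_mem_kB = bytes_to_kb(raw_node_mem_bytes)
print(per_host_mem_kB)                                  # 268435456.0
print(kb_to_gb(per_host_mem_kB) >= cli_client_mem_gb)   # True
```

Converting once, into whichever unit the attributes are stored in, keeps every later comparison unit-free.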

> As this PR appears to be updating the rules for v2.0, there was a recent discussion in the checkpointing subgroup about model sizes. The table below shows the memory...
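The memory table itself is truncated here, but the underlying arithmetic for such a table is usually parameter count times bytes stored per parameter. A rough sketch under stated assumptions: the 8B and 70B parameter counts are illustrative and not taken from the subgroup's table, 2 bytes/parameter assumes 16-bit weights only, and 16 bytes/parameter is the commonly cited figure for mixed-precision Adam training state (fp16 weights and gradients plus fp32 master weights, momentum, and variance). What an actual checkpoint contains depends on the framework.

```python
# Rough checkpoint-size estimate: parameters x bytes stored per parameter.
# All byte budgets and model sizes below are illustrative assumptions,
# not values from the checkpointing subgroup's table.
GB = 1000**3  # decimal gigabytes

WEIGHTS_16BIT = 2  # weights only, bf16/fp16
ADAM_MIXED = 16    # fp16 weights + fp16 grads + fp32 master weights, momentum, variance

def checkpoint_bytes(num_params: int, bytes_per_param: float) -> float:
    """Estimate checkpoint size in bytes."""
    return num_params * bytes_per_param

for name, params in [("8B", 8 * 10**9), ("70B", 70 * 10**9)]:
    weights_gb = checkpoint_bytes(params, WEIGHTS_16BIT) / GB
    full_gb = checkpoint_bytes(params, ADAM_MIXED) / GB
    print(f"{name}: weights ~{weights_gb:.0f} GB, full training state ~{full_gb:.0f} GB")
```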

Checkpointing works. Check out the history function: `mlpstorage history show`. I also added a report generator: `mlpstorage reports reportgen`. Please test and provide feedback. The report generated has a lot of extra...

In the working group, we decided to do the page cache flush in the benchmark script. The execution would look like: call DLIO to write 2 checkpoints, run callback to clear...
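A minimal sketch of that sequencing in Python. `drop_page_cache` uses the standard Linux mechanism (`sync`, then writing `3` to `/proc/sys/vm/drop_caches`, which requires root). The orchestration function and its callback names are hypothetical, and because the comment is truncated after the cache-clear step, whatever runs next is left as a generic `after_clear` callback rather than guessed.

```python
import os
from typing import Callable, List

def drop_page_cache() -> None:
    """Flush dirty pages, then drop the Linux page cache, dentries, and inodes (requires root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def run_checkpoint_phase(
    write_checkpoints: Callable[[int], None],  # hypothetical: e.g. launch DLIO to write N checkpoints
    clear_cache: Callable[[], None],           # e.g. drop_page_cache
    after_clear: Callable[[int], None],        # hypothetical: the step that follows the flush
    num_checkpoints: int = 2,
) -> List[str]:
    """Run write -> clear cache -> next step, in that order, and return the step log."""
    steps: List[str] = []
    write_checkpoints(num_checkpoints)
    steps.append("write")
    clear_cache()
    steps.append("clear_cache")
    after_clear(num_checkpoints)
    steps.append("after_clear")
    return steps
```

Passing the callbacks in keeps the ordering logic testable without root: in a real run, `clear_cache` would be `drop_page_cache`, while a dry run can pass stubs.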

I hit one issue while testing this. If the checkpoint files did not exist, I would see writes after doing the checkpoint and a comm.barrier(). If the checkpoint files DID...
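The root cause is cut off in the comment. One plausible explanation, offered only as a hypothesis, is that the checkpoint data (and, for newly created files, the directory entries) were still dirty in the page cache when `comm.barrier()` returned, so the kernel wrote them back afterwards. A barrier only synchronizes ranks; it does not flush I/O. A sketch of forcing durability before the barrier on Linux/POSIX, with the barrier passed in as a callable (for example `comm.barrier` from mpi4py); `write_checkpoint_durably` is a hypothetical helper, not DLIO code.

```python
import os
import tempfile
from typing import Callable

def write_checkpoint_durably(path: str, payload: bytes, barrier: Callable[[], None]) -> None:
    """Write a checkpoint shard and force it to storage before synchronizing ranks."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write this file's dirty pages to the device
    # A newly created file also needs its directory entry persisted (POSIX/Linux behavior).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    barrier()  # e.g. comm.barrier() with mpi4py; a no-op stub works for a single process

if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as d:
        write_checkpoint_durably(os.path.join(d, "ckpt.bin"), b"weights", lambda: calls.append("barrier"))
    print(calls)
```

If the late writes disappear with the `fsync` calls in place, that would support the writeback hypothesis; if they persist, the cause lies elsewhere.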

> > I hit one issue while testing this. If the checkpoint files did not exist, I would see writes after doing the checkpoint and a comm.barrier(). If the checkpoint...