bschifferer
A detailed view is available here: https://docs.google.com/document/d/1g5FUrdhZQzef1OWwiQLfNNGdHr4a71Cr-jqndl-SoQg/edit#
Collecting results in a Google spreadsheet (details), plus some slides as a summary
Hello @ssubbayya, thanks for reporting the bug. You are correct. I found a workaround that lets it train. You need to: - add the parameters `global_size=1, global_rank=0` when initialising the...
`train = Dataset(os.path.join(args.path, "train", "part_" + str(MPI_RANK) + ".parquet"))` `valid = Dataset(os.path.join(args.path, "valid", "part_" + str(MPI_RANK) + ".parquet"))` Can you try adding the `part_size` parameter to the `Dataset` calls above? Dataset(os.path.join(args.path,...
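For reference, the per-rank file layout used in the snippet above can be sketched with the standard library alone. This is only an illustration: `part_path` is a hypothetical helper that is not part of the original script, and the rank is passed in as a plain integer instead of being read from MPI.

```python
import os


def part_path(root, split, rank):
    """Build the parquet path that a given worker rank would read.

    Hypothetical helper: it mirrors the string concatenation in the
    snippet above ("part_" + str(MPI_RANK) + ".parquet").
    """
    return os.path.join(root, split, "part_" + str(rank) + ".parquet")


# Each rank reads exactly one file per split.
for rank in range(2):
    print(part_path("data", "train", rank), part_path("data", "valid", rank))
```

With this layout every worker only sees its own `part_<rank>.parquet` file, so the number of part files has to match the number of ranks.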
@ssubbayya `ValueError: None values not supported.` suggests that the dataset contains NaN / None values, is that correct? You should be able to test it like this: `Dataset().to_ddf().isna().sum().compute()` Can...
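As a rough sketch of what that check reports, here is the same `isna().sum()` pattern on a small pandas DataFrame. The column names and values are made up for illustration, and pandas is used as a stand-in; on the real data the call would run on the Dask DataFrame returned by `to_ddf()`, followed by `.compute()`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame(
    {
        "user_id": [1, 2, None, 4],
        "item_id": [10, None, None, 40],
        "click": [0, 1, 0, 1],
    }
)


def count_missing(frame):
    """Return a dict mapping each column name to its count of NaN/None entries."""
    return {col: int(n) for col, n in frame.isna().sum().items()}


missing = count_missing(df)
print(missing)
```

Any column with a non-zero count is a candidate for the `None values not supported` error and would need to be filled or dropped before training.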
The distributed-embeddings example uses a custom train step function: https://github.com/NVIDIA-Merlin/distributed-embeddings/blob/main/examples/dlrm/main.py#L201-L215 In my understanding, distributed embeddings do NOT work with the Keras `model.fit` function: https://github.com/NVIDIA-Merlin/models/pull/974/files#diff-1e42e5c4771f01c26b3c78c545eb341590a4406b2c5af8da0491ab4b7ea51464R80 I think we need the distributed...
@rnyak do we have unit tests for all notebooks in Transformers4Rec?
Radek provided an example for the new transformer architecture
@viswa-nvidia I think I can close the ticket. There was no progress for >1 year and only one subtask was left.
@viswa-nvidia I closed the ticket, as there has been no progress. Please reopen it if we need it.