bschifferer

Results 108 comments of bschifferer

I reran the test with natively installed TensorFlow: TF2.6 (pip) - Nothing: 31/32 GB - Set cuda_malloc_async: kernel dies - Set TF_MEMORY_ALLOCATION=0.5: 31/32 GB TF2.7 (pip) - Nothing: 31/32 GB -...

> rename to allocate_tensorflow_memory add kw `type=dynamic | fixed | None` if default None it will use best based on tf version if fixed force use of tf_memory_allocation if dynamic...

@jperez999 have you had a chance to update `configure_tensorflow` to `allocate_tensorflow_memory`?

The HugeCTR team suggested that it could be related to having multiple GPUs and not using `MirroredStrategy`. They shared an example: ``` import os import tensorflow as tf import sparse_operation_kit...

I don't know if this bug is still valid - it is from April 6th. I haven't worked on SOK + dataloader since then. But if we want to provide...

I am following up on the discussion. I apologize for missing the discussion @sohn21c - The minimum we want to present is the example you linked. `Provide a clear example...

When we move to the new container structure, do we still need a shared docker volume?

I guess this is related to the old examples. We haven't had time to work on them yet.

I took a look, and Merlin Models has only a small set of metrics. Merlin Systems and Merlin/Merlin are based on the Merlin Models examples and use the same metrics. Can...

After our last CI meeting, we want to track only a few metrics for some specific notebooks. @jperez999 is our CI a single- or multi-GPU environment? Currently, we use the following datasets...