HeKa

Results 16 issues of HeKa

I had a bunch of data that I segmented with multiple reactors, and there was no data interaction between the segments. But now the problem is that the data after...

I tried

```
export CUDA_VISIBLE_DEVICES=0
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
nvidia-cuda-mps-control -d
```

on the host machine and `docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0 -e CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 --gpus=0 --runtime=nvidia --ipc=host 25859ecc2950...

Using the S3 file system with the C++ API in TensorFlow.

# Description Now CheckpointManager is available in Deepray when training with TFRA Dynamic Embedding. [fix] In deepray/core/base_trainer.py, gpu_affinity didn't take effect when NVML Shared Library Not Found. [fix] In deepray/core/base_trainer.py...

**Please describe the bug**
Training with ShardParallel

**Please describe the expected behavior**
unexpected system error

**System information and environment**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker): ubuntu...

The work-stealing method in **NonBlockingWorkQueue** is not a good way to automatically reduce LLC cache misses. Depth-first execution of computation graphs is generally more cache-friendly. If I want to apply...

# Description In my previous tests, there was only one TFRA Embedding object passed into the trackables in the save loading function, even though there were multiple TFRA Embedding objects...

Is this [link](https://github.com/NVIDIA/cudnn-frontend/blob/9f82dda5c029d15a5f371f0fe003dc0c74a0c987/samples/legacy_samples/f16_flash_mha_sample.cpp#L714C14-L714C14) a flash attention implementation?

H100 cards have been released for several months, but there is little kernel support for their Float8 Tensor Cores.

The JAX custom call fails when training a model with a sequence length larger than 2048. Nowadays, seqlen in most models is larger than 2048, at least 4096. 2048...

jax