jglaser

24 issues by jglaser

The purpose of this PR is to enable different attention masks per mini-batch in the sparse attention module. Generally, sentences are of different lengths, so it doesn't really make...
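For context, per-sample masking in attention usually looks like the following sketch (shown for dense attention; the function and tensor names are hypothetical, and the PR itself targets the sparse attention module):

```
import torch

def masked_attention(q, k, v, mask):
    # q, k, v: [batch, heads, seq, dim]; mask: [batch, seq] with 1 for real
    # tokens and 0 for padding -- a separate mask per mini-batch sample.
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    # Broadcast each sample's mask over heads and query positions.
    scores = scores.masked_fill(mask[:, None, None, :] == 0, float("-inf"))
    return torch.matmul(torch.softmax(scores, dim=-1), v)
```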

To enable compilation of code which includes the Random123 RNG library, we need the `std::make_signed` templates, which I am adding in this PR.

When using `WarpScanShfl` from `warp_scan_shfl.cuh` inside a `while()` loop and in conjunction with a sub-warp `LOGICAL_WARP_THREADS` argument, i.e. `LOGICAL_WARP_THREADS=2^n` with `n`...

type: bug: functional
info needed
P3: backlog
repro: missing

Fixes issue NVIDIA/cccl#854

type: bug: functional
info needed
P1: should have
repro: missing

This PR implements normalization of gradients (by the norm of all gradients in the model), as discussed in https://developer.nvidia.com/blog/pretraining-bert-with-layer-wise-adaptive-learning-rates/, by adding the `prenorm` Boolean option to `torch_optimizer.lamb`.
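Conceptually, the `prenorm` option divides every gradient by the global gradient norm before the optimizer step. A minimal sketch of that idea (hypothetical helper, not the actual `torch_optimizer` code):

```
import torch

def prenormalize_gradients(model, eps=1e-6):
    # Global L2 norm over all parameter gradients in the model.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    global_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    # Divide every gradient by the global norm before the update.
    for g in grads:
        g.div_(global_norm + eps)
```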

**Describe the bug** With 90x16GB workers, query 2 of the NVIDIA GPU benchmark leads to this log entry and a subsequent crash:
```
2021-06-06 19:47:27.279|13|info|498234689|||MemoryMonitor about to free memory from tasks|||||
2021-06-06 19:47:27.279|13|info|498234689|||MemoryMonitor...
```

bug

**Describe the bug** Running on 90 workers, I get the following error:
```
Could not create directory: /gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213
[Errno 17] File exists: '/gpfs/alpine/proj-shared/gen119/bsql_shared/logs_ucx_1060213'
distributed.worker - WARNING - Compute Failed
Function: initialize_server_directory...
```
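`[Errno 17]` here suggests a race in which many workers try to create the same shared log directory at once. A common guard, assuming `initialize_server_directory` does little more than create the path (the body below is an assumption, not the actual fix):

```
import os

def initialize_server_directory(path):
    # exist_ok=True makes concurrent creation by many workers idempotent,
    # avoiding the "[Errno 17] File exists" race on shared filesystems.
    os.makedirs(path, exist_ok=True)
```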

bug

**What happened**: Currently, the jobqueue component of `dask-gateway` relies on `sudo` to do user authentication on its own, rather than integrating with the authentication mechanisms provided by the respective systems...

I am fixing a few apparent bugs in the code. The upshot is that the attention now supports a block size of the next-largest power of two of the...
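For reference, rounding a length up to the next-largest power of two can be done with a small bit trick; a sketch with a hypothetical helper name:

```
def next_power_of_two(n: int) -> int:
    # Smallest power of two >= n (for n >= 1), e.g. 100 -> 128.
    return 1 << (n - 1).bit_length()
```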

I wrote a simple test to check the output of the hierarchical transformer self-attention against the BERT self-attention from huggingface transformers.
```
import torch
import torch.nn as nn
...
```
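The snippet is truncated, but such tests typically run both attention implementations on the same inputs and assert near-equality. A self-contained sketch of that pattern, using PyTorch's built-in `scaled_dot_product_attention` as a stand-in reference rather than the actual hierarchical/BERT modules:

```
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(2, 4, 16, 8)   # [batch, heads, seq, head_dim]
k = torch.randn(2, 4, 16, 8)
v = torch.randn(2, 4, 16, 8)

# Reference implementation (requires PyTorch >= 2.0).
reference = F.scaled_dot_product_attention(q, k, v)

# Manual attention under test.
scores = torch.matmul(q, k.transpose(-2, -1)) / q.shape[-1] ** 0.5
manual = torch.matmul(torch.softmax(scores, dim=-1), v)

# The test passes if both implementations agree to numerical tolerance.
assert torch.allclose(reference, manual, atol=1e-5)
```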