
Results 109 comments of One

The same issue happened on the latest version, `v0.114.2`, when the host has the home directory mounted as a remote filesystem. It was also solved by setting `"remote.SSH.useExecServer": false`.

We've tested the following installation instructions and pip package. What error message did you encounter? Could you post it here?
```
conda create -y --name openchat
conda activate openchat
conda...
```

Thanks! I have tested the kernel and it does work. However, the padding elements may be uninitialized, resulting in NaN/inf in the forward and backward passes. Can we include a...

BTW, here is the code used for testing:
```python
from typing import Any

import torch
from tqdm import tqdm
from flash_attn import flash_attn_varlen_func


def test_flash_attn_padding(
    seed: int = 0,
    test_rounds:...
```
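Since the snippet above is cut off in this listing, here is a minimal self-contained sketch of that kind of finiteness check: pack a few variable-length sequences, run `flash_attn_varlen_func` forward and backward, and assert that nothing non-finite appears. The sequence lengths, head count, and dtype are illustrative assumptions, and this sketch does not reproduce the uninitialized-padding scenario itself:

```python
import torch
from flash_attn import flash_attn_varlen_func

torch.manual_seed(0)
device, dtype = "cuda", torch.bfloat16
n_heads, head_dim = 8, 64
seqlens = [5, 17, 3]  # a small ragged batch, packed without padding
total = sum(seqlens)

# cumulative sequence boundaries, as expected by the varlen kernel
cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32, device=device)
cu_seqlens[1:] = torch.cumsum(torch.tensor(seqlens, device=device), dim=0)
max_seqlen = max(seqlens)

def packed():
    return torch.randn(total, n_heads, head_dim, device=device, dtype=dtype,
                       requires_grad=True)

q, k, v = packed(), packed(), packed()
out = flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens,
                             max_seqlen, max_seqlen, causal=True)
out.sum().backward()

# any NaN/inf contamination would show up as non-finite values here
for name, t in [("out", out), ("dq", q.grad), ("dk", k.grad), ("dv", v.grad)]:
    assert torch.isfinite(t).all(), f"non-finite values in {name}"
print("forward and backward outputs are finite")
```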

@Nerogar Thanks for your experiments! I'll try your implementation.

BTW, one possible alternative: the 16+8 optimizer (https://arxiv.org/pdf/2309.12381.pdf). It stores 8 extra mantissa bits, achieving the same model accuracy as an FP32 optimizer at the low cost of 16% more VRAM.

@AmericanPresidentJimmyCarter Thanks for your implementation! I saw the comment about unstable weight decay. Could you please try adding the weight decay to the update and then stochastically add the update...
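Something along these lines is what I have in mind; the helper names, hyperparameters, and the rounding helper below are illustrative placeholders, not the referenced implementation:

```python
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    """Round fp32 -> bf16 by adding random bits below the bf16 mantissa, then truncating."""
    bits = x.float().contiguous().view(torch.int32)
    noise = torch.randint(0, 1 << 16, bits.shape, dtype=torch.int32, device=x.device)
    # masking off the low 16 bits after the noisy add rounds each value to one of its
    # two bf16 neighbors with probability proportional to distance (ignores fp32 overflow edge cases)
    return ((bits + noise) & -65536).view(torch.float32).to(torch.bfloat16)

@torch.no_grad()
def apply_update_(p_bf16: torch.Tensor, update32: torch.Tensor, lr: float, weight_decay: float):
    # fold (decoupled) weight decay into the update in fp32 first ...
    w32 = p_bf16.float()
    step = update32 + weight_decay * w32
    # ... then stochastically round the combined result into the bf16 weight in one write
    p_bf16.copy_(stochastic_round_to_bf16(w32 - lr * step))

# toy usage
p = torch.randn(1000, dtype=torch.bfloat16)
apply_update_(p, update32=torch.randn(1000), lr=1e-3, weight_decay=0.1)
```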

Update: I've written a fused CUDA version of a 16+16 AdamW optimizer: https://github.com/imoneoi/bf16_fused_adam. With an extra 16-bit mantissa term, it is equivalent to fp32 master weights.
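To illustrate why the extra 16-bit mantissa term recovers full precision, here is a toy bit-level sketch in plain PyTorch (not the fused kernel itself; it assumes a little-endian platform):

```python
import torch

w32 = torch.randn(8, dtype=torch.float32)

halves = w32.view(torch.int16).view(-1, 2)      # per weight: [low 16 bits, high 16 bits]
mantissa_extra = halves[:, 0].contiguous()      # the extra 16-bit mantissa term
w_bf16 = halves[:, 1].contiguous().view(torch.bfloat16)  # the truncated-bf16 view of the weight

# stitching the two 16-bit halves back together reproduces the fp32 master weight exactly
recon = torch.stack([mantissa_extra, w_bf16.view(torch.int16)], dim=1).view(torch.float32).squeeze(1)
assert torch.equal(recon, w32)
```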

Hi @timoffex Here is the info about my wandb training code:
1. wandb version 0.18.1
2. 5-6 scalars per step every training iteration, same on each iteration, and 1-2 scalars...

BTW, what does `wandb.log` do? Is there any blocking operation? Is it related to disk read/write latency? I only observed periodic blocking in one environment, but it was fine in...
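In case it helps narrow this down, here is the kind of probe one could use to check whether the stalls coincide with `wandb.log` calls; the project name, logged values, and the 50 ms threshold are arbitrary placeholders:

```python
import time
import wandb

run = wandb.init(project="log-latency-probe")
for step in range(1000):
    t0 = time.perf_counter()
    wandb.log({"loss": 0.1, "lr": 1e-4}, step=step)
    dt = time.perf_counter() - t0
    if dt > 0.05:  # flag wandb.log calls that block for more than ~50 ms
        print(f"step {step}: wandb.log blocked for {dt * 1000:.1f} ms")
run.finish()
```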