ekuznetsov139

Results 8 issues of ekuznetsov139

I am trying to use COLMAP to reconstruct a scene from a set of fisheye lens images, 2048x2048 with a 180-degree FOV (camera: RADIAL_FISHEYE 2048 2048 600 1024 1024 0...

layers/common_hparams.py mentions a hyperparameter "pretrained_model_dir": "Directory containing a checkpoint for a pretrained model. This will only be used if a new run is being started. Parameters not found in the...

All pages on https://rocm-documentation.readthedocs.io/en/latest/index.html are limited to approximately 800 pixels in width and there's no apparent way to make the text wider (I can zoom in but that just makes...

I'm trying to reproduce BEV training results, and I have a few questions. 1. The source tree references a "posetrack" dataset. It is not mentioned in the paper, its official...

The following fails to compile on AMD (gfx942; have not tried on NVIDIA, the issue may be present there too):
```
@triton.jit
def dot_test(p, out, N:tl.constexpr, K:tl.constexpr, M:tl.constexpr):
    p_block =...
```

This splits the _fwd_grouped_kernel_stage1 operation into two parts, using a temporary buffer. The rationale is as follows. _fwd_grouped_kernel_stage1 becomes a significant bottleneck in DeepSeek-V3 with large context lengths. It is slow,...

This PR uses Triton to fuse the DeepseekScalingRotaryEmbedding operation with is_neox_style=False (observed in DeepSeek-V3). It substantially reduces the number of distinct kernels in the DeepSeek-V2/V3 profile (the number of kernel launches per...

This reintroduces the bug fix from #14987 that was lost during refactoring.