Praateek Mahajan

Results 9 issues of Praateek Mahajan

```
RuntimeError: Traceback (most recent call last):
  File "/home/praateek/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "", line 32, in __getitem__
    vid = skvideo.io.vread(self.__xs[index])
  File...
```

## Describe the proposal
MLFlow currently seems to deploy each model to a new SageMaker instance. Since November 2019, SageMaker has offered multi-model endpoints, which allows...

enhancement
Acknowledged
integrations/sagemaker

## Description
Reading 6000 files of ~25 MB each, i.e. ~145 GB, over 8 GPUs

| add_filename | partition_size | input_meta | Using `dask.read_json` #285 | Providing meta in `dask.from_map` #291 |
|--------|--------|--------|--------|---------|
...

## Description

## Usage
```python
# Add snippet demonstrating usage
```

## Checklist
- [ ] I am familiar with the [Contributing Guide](https://github.com/NVIDIA/NeMo-Curator/blob/main/CONTRIBUTING.md).
- [ ] New or Existing tests...

**Is your feature request related to a problem? Please describe.** Currently [numpy is restricted to < 2](https://github.com/NVIDIA/NeMo-Curator/blob/fa4befcad0a804d9b8ad4a9870b2fd87196d2d26/requirements/requirements.txt#L17). But the cudf 24.10 release [allows numpy 2.0](https://github.com/rapidsai/cudf/blob/branch-24.10/python/cudf/pyproject.toml#L28). However, we tried just...
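As a rough illustration of what the `< 2` pin means in practice, the sketch below checks a version string against a "major version below 2" bound using only the standard library. The helper names `major_version` and `allowed_by_pin` are hypothetical and not part of NeMo-Curator or cudf, and the check is a simplification: it compares major versions only and ignores epochs and other PEP 440 details.

```python
import re


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def allowed_by_pin(version: str, upper_major: int = 2) -> bool:
    """True if `version` satisfies a '< upper_major' pin (major version only)."""
    return major_version(version) < upper_major


if __name__ == "__main__":
    # Report on the locally installed numpy, if there is one.
    from importlib.metadata import PackageNotFoundError
    from importlib.metadata import version as installed_version

    try:
        found = installed_version("numpy")
    except PackageNotFoundError:
        print("numpy is not installed")
    else:
        print(f"numpy {found}: allowed by '<2' pin -> {allowed_by_pin(found)}")
```

Running the file directly reports whether the numpy installed in the current environment would be accepted by the existing pin; a real dependency resolver applies the full version-specifier rules rather than this shortcut.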

enhancement

**Describe the bug** Semantic Dedup often gets stuck when we call `semantic_cluster_dedup.extract_dedup_data`. **Steps/Code to reproduce bug** Run semantic dedup with `client = get_client(device_type='gpu', protocol='ucx')`. **Environment overview**...

bug

This pull request is a Proof of Concept for `GPT2Tokenizer` in the file `python/cudf/cudf/core/gpt2_tokenizer.py`. The `GPT2Tokenizer` class is designed to tokenize a cuDF strings column using CUDA GPT2 subword tokenizer...

libcudf
Python
CMake
Java
ci
conda

Creates a provider in `exchange`. In goose, adds a config for the NVIDIA provider (using `llama-3.1-405b`) to `default_model_configuration`

enhancement
help wanted
work-in-progress

Makes development easier in VS Code. See https://code.visualstudio.com/docs/devcontainers/containers. Allows folks to contribute more easily: as long as they have the `Dev Containers` extension, they should be able to open the...