dataloader
The Merlin dataloader lets you rapidly load tabular data for training deep learning models with TensorFlow, PyTorch or JAX.
This PR fixes #163. It helps resolve the isaligned check issue that occurs with ragged columns in TensorFlow.
I tried the PyTorch loader, but it gives me the following error: `BufferError: DLPack only supports signed/unsigned integers, float and complex dtypes.` If I switch to TensorFlow, it works....
### Bug description I am trying to extract embeddings, but the following options do not work. Option 1: I tried these scripts but none of them work: ``` model_transformer.query_embeddings(train, index='session_id') or model_transformer.query_embeddings(train,...
**Describe the issue**: The dataloader accumulates GPU memory across batches unless `gc.collect()` is called manually after each batch or after every few batches (e.g. every 5th batch). See example below, manually calling garbage...
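The workaround described in the issue above (calling `gc.collect()` every few batches) can be sketched in plain Python. This is a minimal illustration, not code from the issue or from the Merlin dataloader: the helper name `iterate_with_periodic_gc` and its `every_n` and `collect` parameters are hypothetical, and it works on any iterable of batches. Whether collecting garbage actually releases GPU memory depends on the framework and on what still holds references to the batch.

```python
import gc


def iterate_with_periodic_gc(batches, every_n=5, collect=gc.collect):
    """Yield items from `batches`, running `collect()` after every `every_n`-th one.

    Hypothetical helper for illustration; `batches` can be any iterable.
    """
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    for index, batch in enumerate(batches, start=1):
        yield batch
        # Execution resumes here when the consumer requests the next batch,
        # i.e. after the caller has finished working with the current one.
        if index % every_n == 0:
            collect()
```

Under these assumptions, a training loop would wrap its loader as `for batch in iterate_with_periodic_gc(loader, every_n=5): ...`. Passing `every_n=1` corresponds to collecting after each batch.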
### Bug description In data-parallel training, we start multiple workers, each with a different initialization of the dataloader, and train with Horovod. After each batch update, the parameters are synced. Merlin...
Fixes #54. Updates the conda install instructions to specify the correct minimum version of Python (3.8) and separates the conda install from conda environment creation.
Adds a fixture to clean up the dataloader after each test runs. This ensures that if a test using the Merlin Dataloader only partially consumes a dataloader instance (and isn't using it...
Specify Merlin dependencies in setup.py to create a release specifier that matches the current release for Merlin dependencies. Development: the tag will be something like `23.12.dev0+1.ge73d8ba`, and `merlin_dependency("merlin-core")` returns `merlin-core` (unpinned). Release...
As of 12372f4c6562f296c510f6734e748ef54c375c33, device assignment in the PyTorch dataloader does not work correctly with multiple GPUs. ```python import os import pandas as pd from merlin.dataloader.torch import Loader from merlin.io.dataset import...