ART issues

Trying mcp-14b-alpha-001, blocked by the error below. How can I find out what triggers it?

1

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'SSLContext' object

johnson7788
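One generic way to narrow down an error like the one above is to try pickling the object that is being sent between processes and then recurse into its members until the smallest failing piece is found. The sketch below uses only the standard library; `find_unpicklable` is a hypothetical helper written for this illustration, not part of ART or of `multiprocessing`, and it only follows dicts, sequences, and `__dict__` attributes.

```python
import pickle
from typing import Any, Iterator, Optional, Set, Tuple


def find_unpicklable(
    obj: Any, path: str = "obj", _seen: Optional[Set[int]] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (path, error) for the deepest members of obj that fail to pickle."""
    seen = _seen if _seen is not None else set()
    if id(obj) in seen:  # guard against reference cycles
        return
    seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return  # this whole branch pickles fine, nothing to report
    except Exception as exc:
        message = f"{type(exc).__name__}: {exc}"

    # Collect the children we know how to walk into.
    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    found_deeper = False
    for child_path, child in children:
        for hit in find_unpicklable(child, child_path, seen):
            found_deeper = True
            yield hit

    # No child explains the failure, so this object itself is the culprit.
    if not found_deeper:
        yield path, message
```

Calling `list(find_unpicklable(payload, "payload"))` on whatever is put on the queue returns attribute paths such as `payload['client'].lock`, which points at the member holding the unpicklable object (an `SSLContext` in the traceback above).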

Support `exclude` parameter for SkyPilotBackend `_experimental_pull_from_s3`

It is currently possible to exclude large objects like trajectories when pulling models from s3 through the `LocalBackend`. We should do the same for the `SkyPilotBackend`.

arcticfly

RuntimeError: CUDA error: device-side assert triggered when running 2048.ipynb

1

A RuntimeError occurs when running 2048.ipynb at this link: https://colab.research.google.com/github/openpipe/art/blob/main/examples/2048/2048.ipynb ``` loading model from .art/2048-multi-turn/models/agent-002/0010 ==((====))== Unsloth 2025.5.1: Fast Qwen2 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1. \\ /| Tesla T4. Num...

watemailunpi

About asynchronous generation and training like AReal

2

A question about the ART framework: is there a plan to support asynchronous generation/rollout and training, like https://github.com/inclusionAI/AReaL?tab=readme-ov-file (paper: https://arxiv.org/pdf/2505.24298)? Essentially, it is a non-blocking rollout mechanism so that the ready-to-use rollout...

llv22
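To make the idea of a non-blocking rollout concrete, here is a minimal producer/consumer sketch with `asyncio`. It is not how AReaL or ART implement anything; `rollout_worker`, `trainer`, and `run` are hypothetical names, and `asyncio.sleep(0)` stands in for generation latency. The point is only that the trainer consumes rollouts as soon as they are ready instead of waiting for every worker to finish.

```python
import asyncio
from typing import List, Tuple

Rollout = Tuple[int, int]  # (worker id, rollout index), a stand-in for a trajectory


async def rollout_worker(worker_id: int, num_rollouts: int, queue: asyncio.Queue) -> None:
    """Produce rollouts and hand each one over as soon as it is finished."""
    for index in range(num_rollouts):
        await asyncio.sleep(0)  # placeholder for model generation
        await queue.put((worker_id, index))


async def trainer(queue: asyncio.Queue, batch_size: int, total: int) -> List[List[Rollout]]:
    """Group whatever rollouts are ready into batches, without a global barrier."""
    batches: List[List[Rollout]] = []
    batch: List[Rollout] = []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would run a training step here
            batch = []
    if batch:
        batches.append(batch)
    return batches


async def run(num_workers: int = 2, rollouts_per_worker: int = 4, batch_size: int = 2) -> List[List[Rollout]]:
    queue: asyncio.Queue = asyncio.Queue()
    total = num_workers * rollouts_per_worker
    workers = [
        asyncio.create_task(rollout_worker(w, rollouts_per_worker, queue))
        for w in range(num_workers)
    ]
    batches = await trainer(queue, batch_size, total)
    await asyncio.gather(*workers)
    return batches
```

A real asynchronous setup also has to deal with rollouts generated by stale policy weights, which this sketch ignores.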

RuntimeError: Engine core initialization failed. Failed core proc(s): {}

3

I'm currently working with the ART library on Kaggle and trying to utilize both of the available T4 GPUs. Specifically, I’m experimenting with the Tic-Tac-Toe example and have attempted to...

Tarunrao0

Gradient reliability with sample-by-sample vs batch processing

2

I'm examining the training implementation in src/art/unsloth/service.py and have a question about the gradient computation approach. Currently, the code processes samples individually: for offset in range(0, packed_tensors["tokens"].shape[0]): # Process single...

zfflxx
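As background for this question, the general principle can be checked on a toy problem: if each per-sample gradient is scaled by 1/N and accumulated before a single optimizer step, the result equals the gradient of the mean loss over the batch. The sketch below is plain Python on a one-parameter squared-error model; it makes no claim about what `src/art/unsloth/service.py` actually does, and all function names are hypothetical.

```python
import math
from typing import Sequence


def sample_grad(w: float, x: float, y: float) -> float:
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


def batch_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of the mean loss over the whole batch, computed in one pass."""
    n = len(xs)
    return sum(sample_grad(w, x, y) for x, y in zip(xs, ys)) / n


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Process samples one at a time, accumulating gradients scaled by 1/N."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        total += sample_grad(w, x, y) / n  # one "backward pass" per sample
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.9, 6.2, 8.1]
same = math.isclose(batch_grad(0.5, xs, ys), accumulated_grad(0.5, xs, ys))
```

The two only diverge if an optimizer step is taken between samples or if the per-sample scaling is omitted, so those are the details worth checking in the real loop.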

Feature request - load best checkpoint

2

Currently there is a function `TrainableModel.delete_checkpoints(best_checkpoint_metric)` which removes all checkpoints except the best and the latest. Unfortunately, there is no straightforward way to load the weights according to the best...

giladfrid009

closing Local backend - bug

2

Version: openpipe-art 0.4.4. Notice that in the base class `art.backend.Backend` the function `close()` is async. On the other hand, for the `art.local.backend.LocalBackend` backend, the function `close()` is **not async**. As...

giladfrid009
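Until the two signatures agree, caller code can be written so that it works with either kind of `close()`. The sketch below is a possible workaround, not a fix in ART itself: `close_backend` is a hypothetical helper, and the two classes are stand-ins that only mimic a sync and an async `close()`.

```python
import asyncio
import inspect
from typing import Any


async def close_backend(backend: Any) -> None:
    """Call backend.close() and await the result only if it is awaitable."""
    result = backend.close()
    if inspect.isawaitable(result):
        await result


# Hypothetical stand-ins for a backend with a sync and an async close().
class SyncCloseBackend:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class AsyncCloseBackend:
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True
```

With this helper, `await close_backend(backend)` behaves the same for both variants, at the cost of hiding the inconsistency rather than resolving it.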

langgraph create_react_agent

4

Hi, do you happen to have any examples/notebooks of using ART to train a qwen model that can be dropped into langgraph's create_react_agent?

austinmw

Docs on decision-making process for choosing batch size, learning rate, etc?

1

At a first approximation, it's not obvious how to think about choosing a batch size and learning rate. Small batches reduce inference overhead on the GPUs and generally reduce iteration...

arcticfly

documentation
