Reading atari files directly.

Open DuaneNielsen opened this issue 3 years ago • 3 comments

Hi, I'm contributing this example of how to read the Atari files directly, in case anyone wants to do that.

Note that the data is stored in the same temporal sequence it was logged, as you can see by watching the replay.

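import gzip
import cv2
import numpy as np

# Filename prefix used for the saved replay buffer arrays.
STORE_FILENAME_PREFIX = '$store$_'

# One gzipped .npy file is stored per element.
ELEMS = ['observation', 'action', 'reward', 'terminal']

if __name__ == '__main__':

    data = {}

    # Adjust to where your copy of the dataset lives.
    data_dir = '/home/duane/data/dqn/Breakout/1/replay_logs/'
    suffix = 0  # checkpoint index in the filename
    for elem in ELEMS:
        filename = f'{data_dir}{STORE_FILENAME_PREFIX}{elem}_ckpt.{suffix}.gz'
        with open(filename, 'rb') as f:
            with gzip.GzipFile(fileobj=f) as infile:
                data[elem] = np.load(infile)

    # Replay the frames in the order they were logged.
    for obs in data['observation']:
        cv2.imshow('obs', obs)
        cv2.waitKey(20)  # wait 20 ms between frames
    cv2.destroyAllWindows()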
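
The loading loop above can also be wrapped in a small function so it can be reused for other checkpoint suffixes. This is only a sketch: `load_replay_elements` is a hypothetical helper name, and it assumes the same `$store$_<elem>_ckpt.<suffix>.gz` file naming as the script above.

```python
import gzip
import os

import numpy as np

STORE_FILENAME_PREFIX = '$store$_'
ELEMS = ('observation', 'action', 'reward', 'terminal')


def load_replay_elements(data_dir, suffix=0, elems=ELEMS):
    """Return {element name: array} for one checkpoint suffix.

    Hypothetical helper; assumes one gzipped .npy file per element,
    named as in the script above.
    """
    data = {}
    for elem in elems:
        filename = os.path.join(
            data_dir, f'{STORE_FILENAME_PREFIX}{elem}_ckpt.{suffix}.gz')
        # gzip.open decompresses on the fly; np.load reads the .npy payload.
        with gzip.open(filename, 'rb') as infile:
            data[elem] = np.load(infile)
    return data
```

Calling `load_replay_elements(data_dir, suffix=0)` should give the same `data` dict as the script, without the OpenCV dependency.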

DuaneNielsen, May 13 '21 18:05

Here is another implementation.

import os.path as osp

from dopamine.discrete_domains import atari_lib
from dopamine.replay_memory import circular_replay_buffer


class DummyWrappedBuffer(circular_replay_buffer.OutOfGraphReplayBuffer):
    """Thin subclass used only to load saved replay logs."""

    def __init__(self,
                 observation_shape,
                 stack_size,
                 replay_capacity=1000000,
                 batch_size=32):
        super().__init__(observation_shape, stack_size, replay_capacity, batch_size)


def load_logs(logs_dir, suffix):
    buffer = DummyWrappedBuffer(
        observation_shape=atari_lib.NATURE_DQN_OBSERVATION_SHAPE,
        stack_size=atari_lib.NATURE_DQN_STACK_SIZE)
    buffer.load(checkpoint_dir=logs_dir, suffix=suffix)
    # A dict with keys: observation, action, reward, terminal
    return buffer._store


if __name__ == '__main__':
    # log_base, game and log_split were not defined in the original snippet.
    # The values below are an assumption that mirrors the path in the first post.
    log_base = '/home/duane/data/dqn'
    game = 'Breakout'
    log_split = '1'
    log_path = osp.join(log_base, game, log_split, "replay_logs")
    store = load_logs(log_path, "0")
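
To sanity-check the dict returned by `load_logs`, it can help to list the shape and dtype of each array. The sketch below uses a hypothetical helper, `describe_store`, that works on any dict of arrays and does not depend on Dopamine.

```python
import numpy as np


def describe_store(store):
    """Map each key to the (shape, dtype name) of its array.

    Hypothetical helper for inspection only.
    """
    summary = {}
    for key, value in store.items():
        arr = np.asarray(value)
        summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `print(describe_store(store))` shows at a glance whether all four arrays have the same leading dimension.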

Altriaex, May 17 '21 14:05

Thank you for the great work. I wonder whether the dataset is stored sequentially. That is, following Duane's implementation, is data["observation"][1] the state reached after taking data["action"][0] in data["observation"][0]?

Additionally, do you know whether there is a similar dataset implementation in PyTorch rather than Dopamine?

jsw7460, Nov 24 '21 00:11

Yes, the dataset is composed of (s, a, r, s') tuples in the way you described (you can visualize the data in a Colab/Jupyter notebook). For a PyTorch version of the dataset, you can use the code from this NeurIPS'21 paper: https://github.com/mila-iqia/SGI/blob/master/src/offline_dataset.py.
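
As a rough illustration of that indexing, the sketch below pairs each index t with t + 1 to form (s, a, r, s', done) tuples from the `data` dict in the first post. `iter_transitions` is a hypothetical helper. It assumes consecutive indices are consecutive time steps, treats each observation as a single frame (no frame stacking), and does not handle any padding entries the replay buffer may have inserted, so treat it as a starting point only.

```python
import numpy as np


def iter_transitions(data):
    """Yield (s, a, r, s_next, done) from time-ordered arrays.

    Assumption: index t + 1 is the step that follows index t.
    s_next is None when step t is terminal, because the next
    stored frame would belong to a different episode.
    """
    obs = data['observation']
    act = data['action']
    rew = data['reward']
    term = data['terminal']
    # The last index has no successor, so stop one step early.
    for t in range(len(obs) - 1):
        done = bool(term[t])
        s_next = None if done else obs[t + 1]
        yield obs[t], act[t], rew[t], s_next, done
```

A training loop would typically stack several consecutive frames into one state before using these tuples, as the Dopamine replay buffer does.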

agarwl, Nov 24 '21 01:11