Data Iteration Issues
When iterating through the Navigate, NavigateExtreme, and NavigateDense data using data.batch_iter(), I get one of the following errors. They do not occur at the beginning of the batch iteration loop, but only after more than 100 iterations of training:
Traceback (most recent call last):
File "train-extreme-loaded.py", line 35, in <module>
for s, a, r, sp, d in data.batch_iter(
File "/home/jack/.local/lib/python3.8/site-packages/minerl/data/data_pipeline.py", line 405, in batch_iter
for seg_batch in minibatch_gen(traj_iter(), batch_size=batch_size, nsteps=seq_len):
File "/home/jack/.local/lib/python3.8/site-packages/minerl/data/util/__init__.py", line 269, in minibatch_gen
trajs[i] = t = multimap(cat, *[t, next(traj_iter)])
File "/home/jack/.local/lib/python3.8/site-packages/minerl/data/data_pipeline.py", line 385, in traj_iter
s, a, r, sp1, d = trajectory_queue.get()
TypeError: cannot unpack non-iterable NoneType object
it=171 Loss: 10.137125015258789
it=172 Loss: 8.685365676879883
Exception in thread QueueManagerThread:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/concurrent/futures/process.py", line 441, in _queue_management_worker
shutdown_worker()
File "/usr/lib/python3.8/concurrent/futures/process.py", line 334, in shutdown_worker
call_queue.put_nowait(None)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 132, in put_nowait
return self.put(obj, False)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 82, in put
raise ValueError(f"Queue {self!r} is closed")
ValueError: Queue <concurrent.futures.process._SafeQueue object at 0x7f2d5967ce80> is closed
it=159 Loss: 8.279386520385742
it=160 Loss: 11.42676830291748
Exception in thread QueueManagerThread:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.8/concurrent/futures/process.py", line 376, in _queue_management_worker
thread_wakeup.clear()
File "/usr/lib/python3.8/concurrent/futures/process.py", line 94, in clear
self._reader.recv_bytes()
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
OSError: [Errno 9] Bad file descriptor
I'm running Arch Linux with PyTorch on an Nvidia GTX 1050M, using Python 3.8.6. My main training script is below (with the utilities and model implementation omitted):
import sys
import gym
import minerl
import torch
from torch import nn
from model import Model
from utils import *

torch.cuda.set_device(0)
if torch.cuda.is_available():
    dev = "cuda:0"
else:
    dev = "cpu"

LR = 0.0001
SEQ_LEN = 16
BATCH_SIZE = 64

# Sample some data from the dataset!
PATH = "model_state_dict"
data = minerl.data.make("MineRLNavigateDense-v0")
model = Model(2, 200).cuda()
model.load_state_dict(torch.load(PATH))

cross_ent = nn.CrossEntropyLoss().cuda()
mse = nn.MSELoss().cuda()
# optimizer = torch.optim.SGD(model.parameters(), lr=LR)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
old_loss = float("Inf")

# Iterate through a single epoch using sequences of at most SEQ_LEN steps
it = 0
for s, a, r, sp, d in data.batch_iter(num_epochs=1, seq_len=SEQ_LEN, batch_size=BATCH_SIZE):
    # optimizer changes params, loss computes the gradients
    # dicts are arrays of samples
    pov_tensor, feat_tensor = Navigatev0_obs_to_tensor(s)
    pov_tensor = pov_tensor.cuda()
    feat_tensor = feat_tensor.cuda()
    pov_tensor, feat_tensor = (
        torch.transpose(expand(pov_tensor), 1, 3),
        expand(feat_tensor),
    )
    action_tensor = Navigatev0_action_to_tensor(a)
    action_tensor = {a: expand(t).cuda() for a, t in action_tensor.items()}

    # Training step
    optimizer.zero_grad()
    outputs = model(pov_tensor, feat_tensor).cuda()
    loss = cross_ent(outputs[:, 0:2], action_tensor["attack"])
    loss += cross_ent(outputs[:, 2:4], action_tensor["back"])
    loss += mse(outputs[:, 4:6], action_tensor["camera"])
    loss += cross_ent(outputs[:, 6:8], action_tensor["forward"])
    loss += cross_ent(outputs[:, 8:10], action_tensor["jump"])
    loss += cross_ent(outputs[:, 10:12], action_tensor["left"])
    loss += cross_ent(outputs[:, 12:14], action_tensor["right"])
    loss += cross_ent(outputs[:, 14:16], action_tensor["place"])
    loss += cross_ent(outputs[:, 16:18], action_tensor["sneak"])
    loss += cross_ent(outputs[:, 18:20], action_tensor["sprint"])
    loss.backward()
    optimizer.step()

    it += 1
    if it % 1 == 0:
        print(f"{it=} Loss: {loss.item()}")
    if it >= 1000:
        # if loss.item() > old_loss + 0.5 or it >= 30:
        print(f"Converged at iter {it} with loss {loss.item()}")
        break
    old_loss = loss.item()
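(For context, since the utils module is omitted: below is a rough, hypothetical sketch of what a helper like Navigatev0_obs_to_tensor could look like, assuming the Navigate observations expose "pov" frames and a scalar "compassAngle"; the key names and shapes here are assumptions, and the real helper may differ.)

import numpy as np
import torch

def Navigatev0_obs_to_tensor_sketch(obs):
    # Hypothetical stand-in for the omitted Navigatev0_obs_to_tensor helper.
    # Assumes obs["pov"] is a uint8 array of frames and obs["compassAngle"]
    # is a scalar angle per step (both key names are assumptions).
    pov = torch.from_numpy(obs["pov"].astype(np.float32) / 255.0)
    compass = torch.from_numpy(np.asarray(obs["compassAngle"], dtype=np.float32))
    return pov, compass.unsqueeze(-1)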
I have tried re-downloading parts of the dataset in case it was corrupted in some way, but no dice. Any help is greatly appreciated.
Update: After talking with Miffyli on the #support channel, two of the errors turn out not to be MineRL issues: the OSError: [Errno 9] Bad file descriptor is apparently intended behavior, and ValueError: Queue <concurrent.futures.process._SafeQueue object at 0x7f2d5967ce80> is closed is apparently a multiprocessing issue with PyTorch. However, I still don't understand the NoneType error:
File "/home/jack/.local/lib/python3.8/site-packages/minerl/data/data_pipeline.py", line 385, in traj_iter
s, a, r, sp1, d = trajectory_queue.get()
TypeError: cannot unpack non-iterable NoneType object
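From the traceback, the internal trajectory queue appears to hand back None (presumably an end-of-stream or failed-load marker), which batch_iter then tries to unpack. As a stop-gap while debugging, not a fix for the underlying loader problem, one could wrap batch_iter so that this TypeError ends the iteration instead of crashing. A minimal sketch:

def safe_batch_iter(data, **kwargs):
    # Defensive wrapper: stop iterating when batch_iter's internal queue
    # yields None instead of a (s, a, r, sp, d) tuple.
    it = data.batch_iter(**kwargs)
    while True:
        try:
            yield next(it)
        except TypeError:
            # "cannot unpack non-iterable NoneType object" from the pipeline
            return
        except StopIteration:
            return

It would be used the same way as batch_iter, e.g. for s, a, r, sp, d in safe_batch_iter(data, num_epochs=1, seq_len=SEQ_LEN, batch_size=BATCH_SIZE).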
Code to reproduce the aforementioned error:
import gym
import minerl

it = 0
data = minerl.data.make("MineRLNavigateExtreme-v0")
for s, a, r, sp, d in data.batch_iter(num_epochs=10, seq_len=16, batch_size=64):
    print(it)
    it += 1
I can confirm the above error happens with the code above and with the "MineRLNavigateExtremeDense-v0" data (I do not have MineRLNavigateExtreme-v0, even though we have the same VERSION=3). It seems to happen at some specific data file after a few iterations of batch_iter.
Try data.load_data instead of data.batch_iter
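For what it's worth, a minimal sketch of iterating per trajectory with load_data, assuming the 0.3.x API where get_trajectory_names() lists the streams and load_data(name) yields per-step (obs, action, reward, next_obs, done) tuples; batching and sequence chunking would then have to be done manually:

import minerl

data = minerl.data.make("MineRLNavigateExtremeDense-v0")
for name in data.get_trajectory_names():
    # Each call loads one full trajectory and yields it step by step.
    for s, a, r, sp, d in data.load_data(name):
        pass  # feed each step into your own batching logic here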