Tony Hutter

Results: 241 comments by Tony Hutter

Update: I've now added an `-r` option to `scr_flush_file` to AXL_Resume() a checkpoint transfer for a particular dataset. I'm getting closer to actually writing the `scr_poststage` script that calls...

Side note - since users may want to call `scr_poststage` as their stage-in script (to wait for the old checkpoint transfer to complete), we may want to consider giving it...

> Based on my understanding of the IBM BB software, I don't think that second job allocation can either cancel or wait on a transfer that was started in an...

@adammoody thanks, I see what you're saying now. Yeah, having `scr_poststage` mark `VALID=YES` only if the dataset is complete would be the best way to do it. It'd be nice...
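
To make the "mark `VALID=YES` only if the dataset is complete" idea concrete, here is a minimal sketch. It is not SCR's implementation: the function names, the per-file dictionaries, and the `"DONE"` status value are all hypothetical, chosen only to illustrate the all-or-nothing check.

```python
# Hypothetical sketch: a dataset is valid only when every file finished.
# The "STATUS" key and the "DONE" value are assumptions for illustration,
# not SCR's or AXL's actual state-file format.

def dataset_is_complete(file_states):
    """Return True only if there is at least one file and all are done."""
    return bool(file_states) and all(
        state.get("STATUS") == "DONE" for state in file_states
    )


def mark_dataset(index, dataset_id, file_states):
    """Record VALID=YES for the dataset only when it is fully transferred."""
    valid = "YES" if dataset_is_complete(file_states) else "NO"
    index[dataset_id] = {"VALID": valid}
    return index


if __name__ == "__main__":
    states = [{"STATUS": "DONE"}, {"STATUS": "PENDING"}]
    print(mark_dataset({}, 1, states))
```

Treating an empty file list as incomplete is a deliberate choice in this sketch, so that a dataset with no recorded state is never marked valid by accident.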

More updates/braindump: 1. When our scr_poststage script runs, it will only have the information in the state file to finish the transfer, like: ```bash $ ~/kvtree_print_file ./.scr/scr.dataset.1/rank_11.state_file FILE /tmp/bblv_hutter2_132955/tmp/hutter2/scr.defjobid/scr.dataset.1/rank_11.ckpt STATUS...

minor typos ```diff - but is wasteful particularly for a low-memory system. Instead, + but it's wasteful particularly for a low-memory system. Instead, - which can be used override the...

Couple of thoughts before I take a look at the code: We've debated in the past whether or not to remove a vdev from a pool on a udev remove...

@amotin > From that perspective I think it would be reasonable for ZED to just not make too quick extra movements like kicking in spares for few minutes after disk...

I tested this a little using a draid1:4d:9c:1s pool of 9 NVMe drives. I tested powering off/on an NVMe drive using `/sys/bus/pci/slots//power` and all seemed to work as expected. I...
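
For anyone repeating this kind of power-cycle test, the sketch below shows one way to toggle a slot's `power` file from a script. It assumes the usual Linux PCI hotplug convention of writing `0` or `1` to that file; the function name, the slot name, and the `sysfs_root` parameter are hypothetical (the slot used in the test above is not given), and writing to the real sysfs tree requires root.

```python
from pathlib import Path


def set_slot_power(slot, on, sysfs_root="/sys/bus/pci/slots"):
    """Write "1" (power on) or "0" (power off) to a PCI slot's power file.

    `slot` is the slot directory name under `sysfs_root`; it is an
    assumption here and must match a slot that exists on the system.
    `sysfs_root` is overridable so the function can be tried against a
    scratch directory instead of the real sysfs tree.
    """
    power_file = Path(sysfs_root) / slot / "power"
    power_file.write_text("1" if on else "0")
    return power_file
```

Pointing `sysfs_root` at a temporary directory is a safe way to dry-run the script before touching real hardware.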

I just did another test today using multipathed spinning disks in a SAS enclosure. I first created a three-disk mirror pool, then I removed a disk while the pool...