datachain
Parallel setting does not work on Windows
from ultralytics import YOLO
from datachain import C, DataChain, File
from datachain.model.ultralytics import YoloBBoxes
def process_bboxes(yolo: YOLO, file: File) -> YoloBBoxes:
    results = yolo(file.as_image_file().read(), verbose=False)
    return YoloBBoxes.from_results(results)
(
    DataChain.from_storage("gs://datachain-demo/openimages-v6-test-jsonpairs/")
    .filter(C("file.path").glob("*.jpg"))
    .limit(20)
    .settings(parallel=4, prefetch=4)
    .setup(yolo=lambda: YOLO("yolo11n.pt"))
    .map(boxes=process_bboxes)
    .show()
)
The script above fails on Windows with the error "PytorchStreamReader failed reading file data/407: file read failed" (see this CI run).
It works fine without the parallel setting (settings(parallel=4, prefetch=4)). With the setting enabled, it also works fine on Linux and OS X.
It looks like, on Windows, when the parallel setting is enabled, the same "yolo11n.pt" file is downloaded several times, once per worker process. The UDF then fails to read the file because it has been corrupted by a concurrent download from another process.
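If that diagnosis is right, one possible mitigation is to make sure no process ever sees a half-written weights file. The sketch below is untested against this bug and is not part of datachain or ultralytics: `ensure_weights` and its `download` callback are hypothetical names. It writes the download to a unique temporary file in the target directory and only then renames it into place.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_weights(path: str, download: Callable[[str], None]) -> str:
    """Return `path`, fetching it via `download(dst)` only if it is missing.

    `download` is a hypothetical callback that writes the weights to `dst`.
    """
    target = Path(path)
    if target.exists():
        return path

    target.parent.mkdir(parents=True, exist_ok=True)
    # Each process downloads into its own temp file, so concurrent workers
    # never write to the same path at the same time.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    try:
        download(tmp)
        try:
            # Rename into place; atomic when source and target share a filesystem.
            os.replace(tmp, target)
        except OSError:
            # On Windows the rename can fail if another process already
            # placed (and opened) the file; that is fine if it exists.
            if not target.exists():
                raise
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
    return path
```

A simpler variant of the same idea would be to load the model once in the parent process before building the chain, so that the file already exists when the parallel workers start. Whether either approach resolves the Windows failure is an assumption that would need to be checked in CI.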