datachain
Try dataset pull on a large dataset and measure performance
We need to make sure datachain pull works in real user scenarios.
We also need to measure performance, i.e. how much overhead pull adds compared with a user doing a plain export and import themselves.
Datasets of 10M and larger should be tested
Some timings for pulling without instantiating from Studio production (team name: demo-1):
ds://laion_wds_1m (1M objects, 14 custom signals): ~6k rows/sec
ds://laion_wds (11.5M objects, 14 custom signals): ~6.7k rows/sec
ds://laion (11.5M objects, 48 custom signals): ? rows/sec
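For comparing these numbers against a plain export and import baseline, a small timing helper along the following lines might be useful. This is only a sketch: `measure_throughput`, `overhead_ratio` and `seconds_for` are hypothetical names introduced here, not part of datachain, and the `job` callable is assumed to wrap whatever is being timed (the pull or the manual export and import) and return the number of rows it processed.

```python
import time
from typing import Callable, Tuple


def measure_throughput(job: Callable[[], int]) -> Tuple[int, float, float]:
    """Run `job` once and return (rows, elapsed_seconds, rows_per_sec).

    `job` is a hypothetical callable that performs the operation being
    timed and returns how many rows it processed.
    """
    start = time.perf_counter()
    rows = job()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast jobs.
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, elapsed, rate


def overhead_ratio(pull_seconds: float, baseline_seconds: float) -> float:
    """Return how many times slower the pull is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return pull_seconds / baseline_seconds


def seconds_for(rows: int, rows_per_sec: float) -> float:
    """Estimate total wall time for `rows` at a given throughput."""
    if rows_per_sec <= 0:
        raise ValueError("rows_per_sec must be positive")
    return rows / rows_per_sec
```

As a rough sanity check on the figures above, `seconds_for(11_500_000, 6700)` gives about 1716 seconds, so the 11.5M-object pull at ~6.7k rows/sec takes a little under half an hour.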
--- IN PROGRESS ---