thomas chaton comments

Results: 218 comments by thomas chaton

Low success rate on donwloading laion400m

Thanks @rom1504. The machine has 32 CPUs, so I thought it should be fine. I am running inside a docker container, so I am having some issues installing knot resolver. I...

Low success rate on donwloading laion400m

Hey @rom1504, any idea what I should be looking for on the docker or cloud provider side as a possible source of issues? Also, should I use knot or bind9?

Low success rate on donwloading laion400m

Thanks, @rom1504 I will check this out. I managed to install knot on the host but it isn't visible inside the container and networking seems broken. Have you ever tried?
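One way to narrow this down is to check which nameservers the container itself is configured to use. Below is a minimal sketch, assuming a Linux container with a standard `/etc/resolv.conf`; the helper names `parse_nameservers` and `is_loopback` are hypothetical and not part of knot, docker, or img2dataset. A resolver that listens only on the host's loopback address is generally not reachable from a container on the default bridge network, so a loopback entry here is a hint worth following up.

```python
from pathlib import Path


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def is_loopback(address):
    """Rough check for a loopback address (an assumption: IPv4 127.x or IPv6 ::1)."""
    return address.startswith("127.") or address == "::1"


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    if path.exists():
        for server in parse_nameservers(path.read_text()):
            note = " (loopback)" if is_loopback(server) else ""
            print(f"nameserver {server}{note}")
```

Running this both on the host and inside the container shows whether the two actually point at the same resolver.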

Low success rate on donwloading laion400m

I am also curious what kind of numbers you get without using knot resolver.

Low success rate on donwloading laion400m

Hey @rom1504 I am trying to get it working on https://lightning.ai/, so it runs in docker. Yes, my success rate is far from this. So something is wrong.

Low success rate on donwloading laion400m

@rom1504 Here is the PR I am working on: https://github.com/Lightning-AI/pytorch-lightning/pull/19400 and the API: I am trying to make data processing efficient while easy to hack around. Here is the example...

Low success rate on donwloading laion400m

It seemed image downloading speeds were quite similar between optimize and img2dataset. But I need to be more principled and collect the same metrics to make a more educated comparison....
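To compare two downloaders on equal terms, both runs need to report the same quantities. Here is a minimal sketch of such a record, assuming each run can count attempted URLs, successful downloads, and wall-clock time; `DownloadStats` is a hypothetical name and not something either tool provides.

```python
from dataclasses import dataclass


@dataclass
class DownloadStats:
    """Counters collected from one download run (hypothetical helper)."""

    attempted: int          # number of URLs tried
    succeeded: int          # number of images actually saved
    elapsed_seconds: float  # wall-clock duration of the run

    @property
    def success_rate(self):
        # Fraction of attempted URLs that produced an image.
        return self.succeeded / self.attempted if self.attempted else 0.0

    @property
    def images_per_second(self):
        # Throughput counted on successful downloads only.
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.succeeded / self.elapsed_seconds


if __name__ == "__main__":
    run = DownloadStats(attempted=1000, succeeded=900, elapsed_seconds=2.0)
    print(f"success rate: {run.success_rate:.1%}")
    print(f"throughput: {run.images_per_second:.0f} images/sec")
```

Counting throughput on successes only keeps a run with many fast failures from looking quicker than it is.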

Low success rate on donwloading laion400m

The distribution is already fully handled by the `optimize` and `map` operators. Check this example: https://lightning.ai/lightning-ai/studios/prepare-the-tinyllama-1t-token-dataset?view=public&section=data+processing Example to tokenize SlimPajama. ```python import json from pathlib import Path import zstandard as...

Low success rate on donwloading laion400m

Hey @rom1504, I am able to get 1.1k images/sec. I think I have a version of knot resolver that works. I am also using HTTP/2 via the httpx client and I...

Low success rate on donwloading laion400m

Hey @rom1504,

> Be careful with sorting the urls as you risk to dos the hosts. I had randomly shuffled them in laion datasets to mitigate this.

Interesting. Yes, I...
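The concern in the quoted advice can be illustrated with a small sketch: a sorted URL list groups requests by host, while a shuffled list spreads them out. This assumes the URLs fit in memory; `shuffle_urls` and `longest_same_host_run` are hypothetical helpers, and the example hosts are made up.

```python
import random
from urllib.parse import urlparse


def shuffle_urls(urls, seed=0):
    """Return a shuffled copy of urls; the seed keeps runs reproducible."""
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def longest_same_host_run(urls):
    """Length of the longest run of consecutive URLs that share a host."""
    longest = 0
    current = 0
    previous_host = None
    for url in urls:
        host = urlparse(url).netloc
        current = current + 1 if host == previous_host else 1
        longest = max(longest, current)
        previous_host = host
    return longest


if __name__ == "__main__":
    # Made-up hosts, 50 URLs each, in sorted order.
    hosts = ["a.example", "b.example", "c.example"]
    urls = sorted(f"https://{h}/img{i:03d}.jpg" for h in hosts for i in range(50))
    print("sorted:  ", longest_same_host_run(urls))
    print("shuffled:", longest_same_host_run(shuffle_urls(urls)))
```

A shorter run of consecutive requests to one host means less burst load on that host at any moment.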