img2dataset

How can I download the original photos in their different formats (jpg, png, webp, gif)?

Open anhnch30820 opened this issue 10 months ago • 7 comments

With encode_format="jpg", img2dataset saves every image as jpg. I want to download the original photos in their own formats (jpg, png, webp, gif). How can I configure this?

anhnch30820 avatar Apr 21 '25 09:04 anhnch30820

disable_all_reencoding = True

rom1504 avatar Apr 21 '25 11:04 rom1504

@rom1504 I configured it as follows, but I still only get jpg images:

from img2dataset import download

# shard_index and output_dir are defined earlier in the notebook.
download(
    processes_count=8,
    thread_count=128,
    url_list=f"/content/drive/MyDrive/image2dataset/coyo700m_shard_0000{shard_index}.parquet",
    image_size=256,
    output_folder=output_dir,
    # output_format="files",
    output_format="webdataset",
    input_format="parquet",
    url_col="url",
    caption_col="text",
    enable_wandb=False,
    # enable_wandb=True,
    resize_mode="no",
    save_additional_columns=["id", "text_length", "num_faces"],
    number_sample_per_shard=10000,
    distributor="multiprocessing",
    user_agent_token="Mozilla/5.0",
    skip_reencode=True,
    encode_quality=100,
    disable_all_reencoding=True,
)

anhnch30820 avatar Apr 22 '25 02:04 anhnch30820

The extension is kept as jpg in all cases, but the file itself is actually the original.
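Since the extension no longer tells you the format, one way to check what was really downloaded is to look at the first bytes of each stored image. The following is a minimal sketch using only the standard library. It assumes a webdataset shard whose image members end in `.jpg`, as described above. The helper names `sniff_format` and `formats_in_shard` are hypothetical and not part of img2dataset.

```python
import tarfile


def sniff_format(data: bytes) -> str:
    """Guess the image format from the leading magic bytes."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def formats_in_shard(tar_path: str) -> dict:
    """Count the real formats of the `.jpg` members in one tar shard."""
    counts = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".jpg"):
                # Twelve bytes are enough for all four signatures.
                head = tar.extractfile(member).read(12)
                fmt = sniff_format(head)
                counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

Calling `formats_in_shard("00000.tar")` (the shard name here is only an example) returns a dict mapping each detected format to its count, which shows whether the shard holds anything other than real JPEG data.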

rom1504 avatar Apr 22 '25 02:04 rom1504

thanks

anhnch30820 avatar Apr 22 '25 02:04 anhnch30820

@rom1504 I have a further question about the skip_reencode parameter. With skip_reencode=True the image size is 20.5 KB, but with skip_reencode=False and encode_quality=95 it is 38.6 KB. Why is that?

anhnch30820 avatar Apr 22 '25 07:04 anhnch30820

https://g.co/gemini/share/8251171709b2 here's why

rom1504 avatar Apr 22 '25 07:04 rom1504

Hi @rom1504, I ran into the same issue as https://github.com/rom1504/img2dataset/issues/437 when downloading the CoYo data. How can I fix it?

anhnch30820 avatar Apr 25 '25 02:04 anhnch30820