How can I download the original photos in their different formats (jpg, png, webp, gif)?
With encode_format=jpg, img2dataset downloads everything in jpg format only. I want to download the original photos in their own formats: jpg, png, webp, gif. How can I configure this?
Set disable_all_reencoding=True.
@rom1504 I have configured it as follows, but I only get jpg images:
download(
    processes_count=8,
    thread_count=128,
    url_list=f"/content/drive/MyDrive/image2dataset/coyo700m_shard_0000{shard_index}.parquet",
    image_size=256,
    output_folder=output_dir,
    # output_format="files",
    output_format="webdataset",
    input_format="parquet",
    url_col="url",
    caption_col="text",
    enable_wandb=False,
    # enable_wandb=True,
    resize_mode="no",
    save_additional_columns=['id', 'text_length', 'num_faces'],
    number_sample_per_shard=10000,
    distributor="multiprocessing",
    user_agent_token="Mozilla/5.0",
    skip_reencode=True,
    encode_quality=100,
    disable_all_reencoding=True,
)
The extension is kept as jpg in all cases, but the content is actually the original file.
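Since the extension no longer says what the format is, one way to recover it is to look at the first bytes of each saved file. The following is only a minimal sketch and not part of img2dataset: `sniff_image_format` and `sniff_file` are hypothetical helper names, and the check relies on the standard signatures of JPEG, PNG, GIF and WebP files.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from the leading magic bytes (hypothetical helper)."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None  # unknown or unsupported format


def sniff_file(path: str) -> Optional[str]:
    """Read only the first 12 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(12))
```

With output_format="files" this could be called on each downloaded .jpg path; with webdataset shards the same check would apply to the image bytes read out of each tar member. Pillow's `Image.open(path).format` is an alternative if decoding the header with a library is preferred.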
Thanks.
@rom1504 I have another question about the skip_reencode parameter. With skip_reencode=True the image size is 20.5 KB, but with skip_reencode=False and encode_quality=95 it is 38.6 KB. Why is that?
Here's why: https://g.co/gemini/share/8251171709b2
Hi @rom1504, I ran into the same issue as https://github.com/rom1504/img2dataset/issues/437 when downloading the COYO data. How can I fix it?