TripoSR

Evaluate dataset licensing

Open • fire opened this issue 1 year ago • 3 comments

Can you release the curated CC-BY dataset?

fire · Mar 08 '24 18:03

Put in more effort, will you? It's clearly Objaverse. Just look at "Dataset used to train" on Hugging Face.

mr-lab · Mar 09 '24 19:03
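The "Dataset used to train" list on a Hugging Face model page is populated from the `datasets` field in the model card's YAML metadata. A minimal sketch of reading that field programmatically, assuming the `huggingface_hub` package is installed and that the model lives at the repo ID `stabilityai/TripoSR` (both are assumptions, not stated in this thread). `training_datasets` and `fetch_training_datasets` are hypothetical helper names:

```python
from typing import Any, Dict, List


def training_datasets(card_metadata: Dict[str, Any]) -> List[str]:
    """Return the dataset IDs declared under `datasets` in model card metadata."""
    datasets = card_metadata.get("datasets") or []
    # The field may hold a single dataset ID or a list of IDs.
    if isinstance(datasets, str):
        return [datasets]
    return [str(d) for d in datasets]


def fetch_training_datasets(repo_id: str = "stabilityai/TripoSR") -> List[str]:
    """Download a model card from the Hub and list its declared training datasets.

    The default repo_id is an assumption; this call needs network access.
    """
    # Imported lazily so training_datasets() works without huggingface_hub installed.
    from huggingface_hub import ModelCard

    card = ModelCard.load(repo_id)
    return training_datasets(card.data.to_dict())
```

Calling `print(fetch_training_datasets())` lists whatever the authors declared. Note that this metadata only names the source dataset; it says nothing about the license of individual objects inside it.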

Some of the CC-BY licensed artwork in Objaverse is incorrectly licensed, so I wanted to check.

fire · Mar 09 '24 19:03
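Because mislabelled licenses can only be caught by looking at the original hosting pages, one practical check is to hand-review a small, reproducible sample of the CC-BY rows from each source. A minimal sketch, assuming the annotations are a pandas DataFrame with a `source` column (as in the script later in this thread); `sample_for_review`, `per_source`, and `seed` are hypothetical names:

```python
import pandas as pd


def sample_for_review(
    annotations: pd.DataFrame,
    per_source: int = 20,
    seed: int = 0,
) -> pd.DataFrame:
    """Draw a reproducible sample of rows from each source for manual license review.

    Sources with fewer than `per_source` rows are included in full.
    Rows whose `source` is missing (NaN) are skipped by groupby.
    """
    parts = []
    for _, group in annotations.groupby("source", sort=True):
        # Cap the sample size so small sources do not raise an error.
        n = min(per_source, len(group))
        parts.append(group.sample(n=n, random_state=seed))
    if not parts:
        # No rows at all: return an empty frame with the same columns.
        return annotations.iloc[0:0]
    return pd.concat(parts, ignore_index=True)
```

A fixed `seed` means two people auditing the same CSV review the same rows, which makes findings easier to compare.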

I have to go for now, but I'll be working on a script (drafted with ChatGPT) to produce a CSV of the CC-BY objects.

```python
# Work in progress
import objaverse.xl as oxl


def save_cc_by_licenses_as_csv(download_dir="~/.objaverse", output_file="cc_by_licenses.csv"):
    """
    Download annotations from Objaverse-XL and save entries with CC-BY licenses to a CSV file,
    using fileIdentifier as the unique identifier for each 3D object.

    Parameters:
    download_dir (str): Directory to cache the downloaded annotations. Defaults to "~/.objaverse".
    output_file (str): The name of the output CSV file. Defaults to "cc_by_licenses.csv".
    """
    # Download annotations (returned as a pandas DataFrame)
    annotations = oxl.get_annotations(download_dir=download_dir)

    # Keep rows labelled exactly 'CC-BY'. This excludes CC-BY-SA, CC-BY-NC, etc., and will
    # also miss rows whose label is spelled differently; check
    # annotations['license'].value_counts() to see the labels actually used.
    cc_by_annotations = annotations[annotations['license'] == 'CC-BY']

    # fileIdentifier is already a column in the annotations; selecting the columns
    # explicitly just fixes the CSV layout, with the identifier first
    cc_by_annotations = cc_by_annotations[['fileIdentifier', 'source', 'license', 'fileType', 'sha256', 'metadata']]

    # Save to CSV
    cc_by_annotations.to_csv(output_file, index=False)
    print(f"Saved {len(cc_by_annotations)} CC-BY licensed objects to {output_file}.")


if __name__ == "__main__":
    save_cc_by_licenses_as_csv()
```

fire · Mar 09 '24 20:03
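The script above keeps only rows whose `license` is exactly `'CC-BY'`, so variant spellings of the same license (different case, spaces, a version suffix like "4.0") would be silently dropped. A minimal sketch of a more tolerant filter, assuming such variants may occur; the helper names and the `CC_BY_ALIASES` set (including the bare "BY" spelling) are hypothetical and should be adjusted after inspecting `annotations['license'].value_counts()`:

```python
import re
from typing import Optional

import pandas as pd

# Hypothetical set of normalized spellings treated as plain CC-BY.
# Inspect annotations["license"].value_counts() and adjust before relying on it.
CC_BY_ALIASES = {"CC-BY", "BY"}


def normalize_license(label: object) -> Optional[str]:
    """Upper-case a license label, unify separators and drop a trailing version number."""
    if not isinstance(label, str):
        return None  # missing (None/NaN) or non-string labels
    # Collapse runs of whitespace/underscores into a single hyphen: "cc by" -> "CC-BY".
    text = re.sub(r"[\s_]+", "-", label.strip().upper())
    # Drop a hyphen-separated trailing version: "CC-BY-4.0" -> "CC-BY" (but "CC0" stays).
    text = re.sub(r"-\d+(\.\d+)*$", "", text)
    return text.rstrip("-") or None


def filter_cc_by(annotations: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose normalized license is plain CC-BY (not BY-SA, BY-NC, ...)."""
    normalized = annotations["license"].map(normalize_license)
    return annotations[normalized.isin(CC_BY_ALIASES)]
```

In the script, `filter_cc_by(annotations)` could stand in for the exact-match line. Share-alike and non-commercial variants are deliberately excluded, since they carry different terms than plain CC-BY.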