Zyda2 tutorial - key error when running compute_counts script

Open ronjer30 opened this issue 1 year ago • 1 comment

Describe the bug

Running the 2_compute_counts.py script fails with the error Exception: 'KeyError("[\'size\'] not in index")'

Steps/Code to reproduce bug

  1. Follow steps in tutorial
  2. Run python3 2_dupes_removal/2_compute_counts.py
  3. The script fails with the following error:
NeMo-Curator/tutorials/zyda2-tutorial/2_dupes_removal/2_compute_counts.py", line 55, in group_partition
    return result[
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['size'] not in index"

Expected behavior

The script runs successfully and the size column is calculated correctly.

Environment overview (please complete the following information)

Environment location: Slurm
Method of NeMo-Curator install: docker container, dev image from nvcr.io/nvidia/nemo:dev

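Additional context

Adding the line sizes = sizes.rename(columns={0: 'size'}) immediately after sizes = partition.groupby("group").size().reset_index() appears to rename the count column correctly and fixes the error. Without the rename, the count column seems to be labeled 0 rather than 'size', so the later column selection in group_partition cannot find 'size'.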
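For anyone hitting the same error, here is a minimal sketch of the workaround in plain pandas. The toy DataFrame and the "id" column are made up for illustration; only the "group" column, the groupby call, and the rename come from the report above. The reset_index(name="size") variant is an alternative I would assume works equally well, not something taken from the tutorial script.

```python
import pandas as pd

# Hypothetical toy partition; the real script operates on tutorial data.
partition = pd.DataFrame({"group": ["a", "a", "b"], "id": [1, 2, 3]})

# As in the report: groupby(...).size() returns an unnamed Series, so the
# count column produced by reset_index() is not labeled "size".
sizes = partition.groupby("group").size().reset_index()

# Workaround from the report: rename the unnamed count column.
sizes_renamed = sizes.rename(columns={0: "size"})

# Assumed alternative: name the count column while resetting the index.
sizes_named = partition.groupby("group").size().reset_index(name="size")

print(sizes_renamed)
print(sizes_named)
```

Both variants should give a frame with "group" and "size" columns, so a later selection such as result[["group", "size"]] no longer raises the KeyError.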

ronjer30 • Nov 05 '24 21:11

@ronjer30 Please share the latest updates

sithape2025 • Jan 22 '25 19:01

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Jul 26 '25 02:07

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] • Aug 03 '25 02:08