
Inefficient Process for Adding New Entities in ReFinED

Status: Open · Shoumik-Gandre opened this issue 9 months ago · 0 comments

Adding even a dozen new entities by running preprocess_all.py requires downloading over 100 GB of data, which is highly inefficient for such a small addition.

This model cannot be considered to have zero-shot capabilities until there is a streamlined, bloat-free script for adding new entities into the system.

Steps to Reproduce:

  1. Clone the repository and set up the environment as per the documentation.
  2. Attempt to add a dozen new entities by running preprocess_all.py (a rough reconstruction of this step is sketched after this list).
  3. Observe that the script triggers the full data download (over 100 GB) before the handful of new entities can be incorporated.
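For reference, a minimal sketch of step 2. The script path reflects the current repository layout as far as I can tell, but the command-line flags shown are hypothetical, as is the additional-entities file; check the script's --help output for what your ReFinED version actually accepts.

```python
# Rough reconstruction of the reproduction step (hypothetical arguments).
import subprocess
import sys

subprocess.run(
    [
        sys.executable,
        "src/refined/offline_data_generation/preprocess_all.py",
        # Hypothetical flag: a small JSONL file describing the ~12 new
        # entities to add. Even for such a tiny addition, the script still
        # kicks off the full (>100 GB) Wikidata/Wikipedia download and
        # preprocessing pipeline before the entities become usable.
        "--additional_entities_file", "my_dozen_entities.jsonl",
        "--output_dir", "data/",
    ],
    check=True,
)
```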

Expected Behavior:

There should be a lightweight and efficient process for adding new entities without requiring extensive data downloads.
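Purely as an illustration of what such a lightweight path could look like, here is a hypothetical sketch; neither add_entities nor NewEntity exists in ReFinED today, and the field names are assumptions, not the library's API.

```python
# Hypothetical interface only: nothing below exists in ReFinED today. It
# illustrates the kind of incremental API this issue asks for, where a few
# entities can be registered against an already-downloaded model without
# re-running the full offline preprocessing pipeline.
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewEntity:
    entity_id: str                  # custom identifier for the new entity
    label: str                      # canonical name
    description: str = ""           # short gloss for candidate scoring
    aliases: List[str] = field(default_factory=list)


def add_entities(model_dir: str, entities: List[NewEntity]) -> None:
    """Sketch: append new entities to the model's local entity index
    (lookup tables, alias map, description embeddings) in place, instead of
    rebuilding everything from 100+ GB of Wikidata/Wikipedia dumps."""
    raise NotImplementedError("Illustrative only; no such API exists yet.")


# Desired usage for the "dozen new entities" case described above:
# add_entities("models/wikipedia_model", [NewEntity("Q_CUSTOM_1", "My Entity")])
```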

Actual Behavior:

Adding new entities requires downloading over 100 GB of data, making the process highly inefficient and cumbersome.

Environment:

  - Platform: Google Colab
  - Operating System: Linux
  - Python Version: 3.10

Severity:

High - This issue severely impacts the usability and efficiency of adding new entities to the system and needs immediate attention.

Shoumik-Gandre · May 18 '24, 03:05