Create LanceDB index after table is created in import

Open dhruv-anand-aintech opened this issue 1 year ago • 1 comments

Checklist
  • [X] Modify src/vdf_io/import_vdf/lancedb_import.py ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/f168003cd3994a1082afd1126b665682b0d852f8
  • [X] Modify src/vdf_io/import_vdf/lancedb_import.py ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/f168003cd3994a1082afd1126b665682b0d852f8

dhruv-anand-aintech avatar Apr 26 '24 10:04 dhruv-anand-aintech

🚀 Here's the PR! #87

See Sweep's progress at the progress dashboard!
💎 Sweep Pro: I'm using GPT-4. You have unlimited GPT-4 tickets. (tracking ID: a4abad1443)

Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant in decreasing order of relevance (click to expand). If some file is missing from here, you can mention the path in the ticket description.

https://github.com/AI-Northstar-Tech/vector-io/blob/9cec7fece241357cabdb153511b13c9c9236fb0a/src/vdf_io/import_vdf/lancedb_import.py#L1-L163

https://github.com/AI-Northstar-Tech/vector-io/blob/9cec7fece241357cabdb153511b13c9c9236fb0a/src/vdf_io/util.py#L1-L503


Step 2: ⌨️ Coding

  • [X] Modify src/vdf_io/import_vdf/lancedb_import.py ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/f168003cd3994a1082afd1126b665682b0d852f8
Modify src/vdf_io/import_vdf/lancedb_import.py with contents: Import the LanceDB `create_index` method at the top of the file:
from lancedb import create_index
  • [X] Modify src/vdf_io/import_vdf/lancedb_import.py ✓ https://github.com/AI-Northstar-Tech/vector-io/commit/f168003cd3994a1082afd1126b665682b0d852f8
Modify src/vdf_io/import_vdf/lancedb_import.py with contents: In the `upsert_data` method of the `ImportLanceDB` class, after the code block that creates a new table or opens an existing one, add the following to create an index on the table:
# Get the ID column from the parquet file schema
parquet_schema = pq.read_schema(parquet_files[0])
id_column = "id" # Default 
for field in parquet_schema:
    if field.name == ID_COLUMN:
        id_column = field.name
        break

# Create index on the table  
create_index(table, id_column)
tqdm.write(f"Created index on {id_column} for table {new_index_name}")

This code reads the schema of the first parquet file to determine the name of the ID column (defaulting to "id" if `ID_COLUMN` is not among the fields). It then calls `create_index` with the table object and the ID column name to create an index on that column.
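A note for anyone adapting this plan: I could not confirm that `lancedb` exports a top-level `create_index` function. As far as I know, index creation is exposed as methods on the table object, with `create_scalar_index(column)` for a non-vector column such as an ID and `create_index(...)` for the vector column. The sketch below is a minimal variant under that assumption. `ID_COLUMN` is a placeholder for the constant used in `vdf_io`, and `pick_id_column`, `index_id_column`, and `SupportsScalarIndex` are hypothetical names introduced only for illustration. It also skips indexing when the ID column is absent, rather than falling back to "id" and indexing a column that may not exist.

```python
from typing import Optional, Protocol, Sequence

# Assumption: placeholder for the ID_COLUMN constant used in vdf_io.
ID_COLUMN = "id"


class SupportsScalarIndex(Protocol):
    """Anything exposing a table-level create_scalar_index(column) method."""

    def create_scalar_index(self, column: str) -> None: ...


def pick_id_column(
    field_names: Sequence[str], preferred: str = ID_COLUMN
) -> Optional[str]:
    """Return the ID column name if the schema contains it, else None."""
    return preferred if preferred in field_names else None


def index_id_column(
    table: SupportsScalarIndex, field_names: Sequence[str]
) -> Optional[str]:
    """Create a scalar index on the ID column, skipping tables without one."""
    id_column = pick_id_column(field_names)
    if id_column is None:
        # Nothing to index; the caller can decide whether to log this.
        return None
    table.create_scalar_index(id_column)
    return id_column
```

In `upsert_data`, `field_names` could come from `pq.read_schema(parquet_files[0]).names`, and the returned name could feed the existing `tqdm.write` message. Whether a scalar index on the ID column or a vector index on the embedding column is what the issue intends is a judgment call the thread does not settle.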


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/create_lancedb_index_after_table_is_crea.


This is an automated message generated by Sweep AI.

sweep-ai[bot] avatar Apr 30 '24 09:04 sweep-ai[bot]