Initial proof of concept for .info() and .head() for collections - for discussion

Open databyjp opened this issue 1 year ago • 0 comments

Idea/proof of concept for .info() and .head() methods on collections.

This is not meant to resemble a final format; the goal is to discuss what we might want and how to implement it.

Example:

c.head()
c.info()
c.info(include_properties=True)

Sample output (from the three calls above, in order):

Displaying the first 5 objects in collection 'SomeCollection'
--------------------------------------------------------------------------------------
rating | title     | description   | synopsis      | gross   | categories           | 
--------------------------------------------------------------------------------------
4.5    | star wars | a space opera | a space opera | 1000000 | ['sci-fi', 'action'] | 
3.5    | cats      | a musical     | a musical     | 500000  | ['musical', 'come... | 


========================================
Collection summary: 'SomeCollection'
========================================
Length: 2
Multi-tenancy: False
Index type: hnsw
Properties: 6

========================================
Collection summary: 'SomeCollection'
========================================
Length: 2
Multi-tenancy: False
Index type: hnsw
Properties: 6
--------------------------------------------------------------------------------------------------------
Property Name | DataType | Tokenization | Skip vectorization | Searchable | Filterable | Range filter | 
--------------------------------------------------------------------------------------------------------
title         | text     | word         | False              | True       | True       | False        | 
description   | text     | word         | False              | True       | True       | False        | 
synopsis      | text     | word         | False              | True       | True       | False        | 
categories    | text[]   | word         | False              | True       | True       | False        | 
rating        | number   | None         | False              | False      | True       | False        | 
gross         | int      | None         | False              | False      | True       | False        | 

Full script that produces this output:

import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

cname = "SomeCollection"

# Remove any existing collection with this name so the script can be re-run
client.collections.delete(cname)

client.collections.create(
    name=cname,
    vectorizer_config=Configure.Vectorizer.text2vec_ollama(
        api_endpoint="http://host.docker.internal:11434",  # If using Docker, use this to contact your local Ollama instance
        model="snowflake-arctic-embed:22m",  # The model to use, e.g. "nomic-embed-text"
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT),
        Property(name="synopsis", data_type=DataType.TEXT),
        Property(name="categories", data_type=DataType.TEXT_ARRAY),
        Property(name="rating", data_type=DataType.NUMBER),
        Property(name="gross", data_type=DataType.INT),
    ],
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.bq()
    )
)

c = client.collections.get(cname)

c.data.insert({
    "title": "star wars",
    "description": "a space opera",
    "synopsis": "a space opera",
    "categories": ["sci-fi", "action"],
    "rating": 4.5,
    "gross": 1000000
})
c.data.insert({
    "title": "cats",
    "description": "a musical",
    "synopsis": "a musical",
    "categories": ["musical", "comedy"],
    "rating": 3.5,
    "gross": 500000
})

# Proposed methods from this proof of concept
c.head()
c.info()
c.info(include_properties=True)

client.close()
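
To make the "how to implement it" part of the discussion more concrete, here is a minimal sketch of the table rendering that .head() could use. It is not the code behind the sample output above; `format_table` and its `max_width` parameter are hypothetical names, and the 20-character truncation is only inferred from the `['musical', 'come...` cell in the sample. It works on plain dicts, so it runs without a Weaviate instance.

```python
from typing import Any, Dict, List


def format_table(rows: List[Dict[str, Any]], max_width: int = 20) -> str:
    """Render a list of property dicts as a fixed-width text table.

    Hypothetical helper for discussion; not part of the weaviate client.
    """
    if not rows:
        return "(no objects)"

    def cell(value: Any) -> str:
        # Truncate long values so one wide column cannot blow up the table.
        text = str(value)
        return text if len(text) <= max_width else text[: max_width - 3] + "..."

    # Column order follows the first row; later rows may lack some keys.
    columns = list(rows[0].keys())
    widths = {
        col: max([len(col)] + [len(cell(row.get(col, ""))) for row in rows])
        for col in columns
    }

    def line(values: List[str]) -> str:
        # Pad each value to its column width and join with pipe separators.
        return " | ".join(v.ljust(widths[c]) for c, v in zip(columns, values)) + " |"

    header = line(columns)
    rule = "-" * len(header)
    body = [line([cell(row.get(col, "")) for col in columns]) for row in rows]
    return "\n".join([rule, header, rule, *body])


rows = [
    {"rating": 4.5, "title": "star wars", "categories": ["sci-fi", "action"]},
    {"rating": 3.5, "title": "cats", "categories": ["musical", "comedy"]},
]
print(format_table(rows))
```

In an actual .head(), the rows would presumably come from the collection itself, for example something like `[o.properties for o in c.query.fetch_objects(limit=5).objects]`. Keeping the formatter separate from the query would also let .info(include_properties=True) reuse it for the property table.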

databyjp · Jul 17 '24 04:07