chroma
chroma copied to clipboard
JS/TS + Misc Feedback
Hey @jeffchuber, I'm just going to throw a lot of random feedback at you mainly from a JS/TS dev's perspective focused on DX.
Feel free to ignore my suggestions – I just want to try and give you an honest brain dump that will hopefully be helpful. The more important stuff is bolded.
JS/TS SDK
- remove
axiosas a dependency;fetchis a web standard and is the default for node.js >= 18 and all serverless/edge environments. here is a good resource that I've linked several other repos to for how to handlefetchas a library author that is future proof for where the node.js community is headed - add support for keyword-based arguments
- this isn't just a subjective thing; most quality JS/TS packages avoid positional arguments except for really simple use cases because the added verbosity and typing tends to make function invocations much clearer)
- if I have to pass an
undefinedas a positional argument, there's something wrong w/ the SDK 😉 - (super nit) pretty sure
metadatais already plural, sometadatasis a lil awkward
e.g.,
# python
collection.add(
embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
ids=["id1", "id2", "id3", ...]
)
# TS
await collection.add(
["id1", "id2", "id3", ...],
[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
)
- make it clear that the
addfunction upserts documents instead of strictly creating new documents - I would've expected
addto take in a single typed array, with an optional second argument for optional params - allow
getCollectionto support an optional typescript generic so all collection methods are properly typed
Some example TS pseudocode that shows an idealized version of the package in action:
import { ChromaClient } from 'chromadb'
interface MyMetadata { foo: string; bar: number }
const client = new ChromaClient()
const collection = await client.getCollection<MyMetadata>({
indexName: "my_collection",
embeddingSize: 1536,
embeddingFunction: () => { ... }
})
// now collection is properly typed for my particular documents
await collection.upsert([
{
id: 'foo-1',
embedding: [1.1, 2.3, 3.2],
metadata: { foo: 'foo', bar: 5 }
},
// ...
])
// or if you want to pass documents instead of embeddings
await collection.upsertDocuments([
{
id: 'foo-1',
document: "example text",
metadata: { foo: 'foo', bar: 5 }
},
// ...
])
// ideally these upsert / add methods would automatically batch things for you with a configurable batchSize
Here's a solid Pinecone TS wrapper by @rileytomasek which does a lot of similar things: https://github.com/rileytomasek/pinecone-client
Docs
- some of the JS examples still use python (e.g.,
let collection2 = await client.getCollection("my_collection", embedding_function=emb_fn)) - in the JS examples, prefer ESM
importvs commonjsrequirewhich are more or less deprecated going forwards (e.g., the embeddings page) - some of the JS examples use
snake_casewhich is very pythonic and 99% of the JS/TS community usescamelCase(api_keyfor instance) - it's currently really unclear how auth is handled connecting to the db; I would expect to be able to pass an api key to the
ChromaClientconstructor, but I'm guessing this feature doesn't exist yet? maybe add a note to the docs for now? - in general, I find the handling of documents vs embeddings to be a little clunky right now. e.g., trying to support both with the same functions really doesn't make sense to me since afaict they seem to be mutually exclusive.
- I'd consider making
collection.query({ embeddings: [...] })separate fromcollection.queryDocuments({ documents: [...] })or something to make the difference explicit, since right now it's confusing that I could pass both and unclear which one would actually be used - same with
wherevswhere_document– I like that it's explicit here about the difference - but in general, the concept of a document versus metadata seems a bit conflated. in mongodb, for instance (as I'm sure you're well aware), the document is the entire record, not just the part it's indexed based on. whereas in chroma, the document is like a shorthand for an implicit pre-embedding but the metadata not being part of the document is a bit weird. sorry I know this isn't the best explanation, but getting your core public abstractions to make sense at this early stage is imho really critical
- I'd consider making
- add cmd+k search to docs 😄
- is there an AWS AMI for chroma? I hate cloudformation, but I'd gladly self-host on AWS if I had a one-click install via the AMI marketplace
Hope this feedback is helpful && again, feel free to ignore & prioritize as you wish 🙏
@transitive-bullshit this is really great, thanks for writing this up!
Here is a quick set of reactions
remove axios
🫡 great!
keyword based arguments
we have been discussing either moving to, or also supporting row-based insert in addition to column-based insert. This because the ergonomics are much more natural for a non-data-science and app-dev focused use case. This +1s that!
nits on the docs
really great, thank you, we will update
auth
right now chroma does not handle auth. What do you think is best to do here? Have "db-like" auth similar to how you would do something with postgres?
query being overloaded
that's really interesting feedback... i hadnt heard that before! going to need to think about that. :)
AMI
none yet! but after the refactor, https://github.com/chroma-core/chroma/pull/214, lands - something we could def do!
@transitive-bullshit do you think its better to support named and positional args (non-breaking change) or hard break change over to named? Not sure what the std convention is here in the JS world.
@transitive-bullshit do you think its better to support named and positional args (non-breaking change) or hard break change over to named? Not sure what the std convention is here in the JS world.
Ideally, the only version you'd support would be named args. It's simpler, and at imho at this stage of your roadmap, this level of breaking change will be welcomed by JS/TS devs as long as it's communicated properly.
Positional args for anything more than one or maybe 2 positions in JS/TS are pretty frowned upon imho. Feel free to double check w/ @sw-yx
I agree with @transitive-bullshit. Named args will also make it easier to avoid future breaking changes or awkward function calls.
@transitive-bullshit @rileytomasek @sw-yx named args it is! thanks everyone :)
ONE ARG TO RULE THEM
AND IN DARKNESS BIND THEM
Implemented basically all of this! Thanks again @transitive-bullshit for the awesome issue 🎉