Simon Willison
Just found myself needing this exact feature - I want ULIDs created within the same ms to still sort after each other. Looks like https://github.com/itsrainingmani/py-ulid has a version of this:...
Here's the official spec for how this should work: https://github.com/ulid/spec/blob/master/README.md#monotonicity
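A minimal sketch of the monotonic behaviour described in that spec section: when a ULID is requested in the same millisecond as the previous one, the 80-bit random component is incremented by 1 instead of being regenerated, so the new value still sorts after the old one. The `MonotonicULID` class and `encode_ulid` helper are hypothetical names for illustration, not the API of py-ulid or any other library, and the sketch does not handle a clock that moves backwards.

```python
import os
import time

# Crockford base32 alphabet used by ULID (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
MAX_RANDOM = (1 << 80) - 1  # largest value of the 80-bit random component


def encode_ulid(timestamp_ms: int, randomness: int) -> str:
    """Encode a 48-bit timestamp and 80-bit randomness as 26 base32 characters."""
    value = (timestamp_ms << 80) | randomness
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


class MonotonicULID:
    """Hypothetical generator: ULIDs from the same millisecond sort in creation order."""

    def __init__(self):
        self._last_ms = -1
        self._last_random = 0

    def generate(self, timestamp_ms=None):
        if timestamp_ms is None:
            timestamp_ms = time.time_ns() // 1_000_000
        if timestamp_ms == self._last_ms:
            # Same millisecond: increment the previous random component.
            if self._last_random == MAX_RANDOM:
                raise OverflowError("random component overflowed within one millisecond")
            self._last_random += 1
        else:
            # Different millisecond: draw fresh randomness.
            self._last_ms = timestamp_ms
            self._last_random = int.from_bytes(os.urandom(10), "big")
        return encode_ulid(timestamp_ms, self._last_random)
```

Because the Crockford alphabet is in ascending ASCII order and every ULID has the same length, a plain string sort of the generated values matches creation order.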
This is a really interesting idea - why shouldn't `llm embed` accept URLs? One option could be to directly port over the existing "fragments" concept to `llm-embed` - so anything...
I'd welcome a prototype of `llm embed -f ... -f ...` that emulates the existing prompt fragments mechanism, complete with plugin support etc. It may be as simple as copying...
The purpose of the `id` is so you can say e.g. "find related content to ID 5" and get back a list of IDs that you can then lookup in...
That existing `content_hash` column exists to help us avoid re-embedding content that we have already stored a vector for, since API calls to embed content have a cost.
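The idea behind that column can be sketched as a hash-keyed cache in front of the embedding call. This is an illustration under stated assumptions, not the code in `llm`: the `CachingEmbedder` class, the in-memory dict standing in for the database table, and the choice of MD5 as the hash are all hypothetical.

```python
import hashlib


def content_hash(content) -> bytes:
    """Hash text or bytes so identical content maps to the same key.

    MD5 is an assumption made for this sketch; any stable hash would do.
    """
    if isinstance(content, str):
        content = content.encode("utf-8")
    return hashlib.md5(content).digest()


class CachingEmbedder:
    """Hypothetical wrapper that skips the paid embedding call for known content."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # function that calls the embedding model
        self.by_hash = {}         # stands in for a table keyed on content_hash
        self.calls = 0            # number of times embed_fn was actually invoked

    def embed(self, content):
        key = content_hash(content)
        if key not in self.by_hash:
            # Only unseen content reaches the embedding function.
            self.calls += 1
            self.by_hash[key] = self.embed_fn(content)
        return self.by_hash[key]
```

Embedding the same string twice through `CachingEmbedder` invokes `embed_fn` once; the second call returns the stored vector.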
I'm not convinced it's worth having a `fragment_id` column on `embeddings` to reference a fragment rather than dumping that content in the existing `content` or `content_blob` columns. The embeddings database...
I think the simplest version of this just adds support for a _single_ `-f/--fragment` option to `llm embed`, uses the existing `resolve_fragments()` function to resolve that (which handles URLs and...
@daturkel absolutely go ahead and have a go at this. And since you're spending a bunch of time in embeddings at the moment, I'd love to get your thoughts on...
There's an old issue for this: #228. I agree this needs to be clearer in the documentation.