Architectural/Inheritance Issues on serialize/deserialize methods
Hello, first of all. I really enjoy this library and love using it. I totally understand this library is still in development. I really look forward to seeing where this project goes. Sincerely, keep up the great work. That said, I think its worth making a couple of breaking changes. I'm totally willing to donate my time to make these changes, but before I make breaking changes, I wanted to open a discussion for feedback.
My use case, I was attempting to write my own SOQL chain to fetch data from Salesforce. I was going to implement the same pattern as SQL, and have a database driver to run the SOQL query using the SF API. Given the nature of SF, and there is no schema (in the same way SQL has it), so a lot of this is totally custom implementation. Perhaps this driver doesn't even belong in the core Langchain repo (I hadn't planned on making a pull request for it). But therein lies my challenges.
Problem 1.
To create a new chain I assume would be to create a new class like SoqlDatabaseChain extends BaseChain and implement the abstract methods. However, serialize(): SerializedBaseChain is defined in BaseChain. SerializedBaseChain is a union type, and the only way to add to it would be to modify the core code.
export type SerializedBaseChain =
| SerializedLLMChain
| SerializedSimpleSequentialChain
| SerializedVectorDBQAChain
| SerializedStuffDocumentsChain
| SerializedSqlDatabaseChain
| SerializedChatVectorDBQAChain
| SerializedMapReduceDocumentsChain
| SerializedAnalyzeDocumentChain
| SerializedRefineDocumentsChain;
So my first thought was to just convert the types to interfaces and have them all extend the SerializedBaseChain. This largely works, except 4 tests fail due to the ensuing type errors. As far as I can tell, the only thing in common is the _type prop. I don't even like this approach tough, this really isn't properly typed either. But for the sake of just demonstrating the problem, and providing a workable solution without re-writing a bunch of code, here it is:
// Note, all the SerializedBaseChain types could follow this format:
export interface SerializedAnalyzeDocumentChain extends SerializedBaseChain {
_type: "analyze_document_chain";
combine_document_chain?: SerializedBaseChain;
}
export interface SerializedBaseChain {
_type: string;
}
Problem 2.
The BaseChain class has the deserialize() method, which has a switch statement. This will throw the following type error for all the data variables with incorrect types:
Argument of type 'SerializedBaseChain' is not assignable to parameter of type 'SerializedLLMChain'.
Types of property '_type' are incompatible.
Type 'string' is not assignable to type '"llm_chain"'.
The following code will fix the type errors, at the cost of type casting. This is the path to least resistance, but it's not ideal in my opinion. The tests do pass though...
/**
* Load a chain from a json-like object describing it.
*/
static async deserialize(
data: SerializedBaseChain,
values: LoadValues = {}
): Promise<BaseChain> {
switch (data._type) {
case "llm_chain": {
const { LLMChain } = await import("./llm_chain.js");
return LLMChain.deserialize(data as SerializedLLMChain);
}
case "simple_sequential_chain": {
const { SimpleSequentialChain } = await import(
"./simple_sequential_chain.js"
);
return SimpleSequentialChain.deserialize(data as SerializedSimpleSequentialChain);
}
case "stuff_documents_chain": {
const { StuffDocumentsChain } = await import("./combine_docs_chain.js");
return StuffDocumentsChain.deserialize(data as SerializedStuffDocumentsChain);
}
case "map_reduce_documents_chain": {
const { MapReduceDocumentsChain } = await import(
"./combine_docs_chain.js"
);
return MapReduceDocumentsChain.deserialize(data as SerializedMapReduceDocumentsChain);
}
case "refine_documents_chain": {
const { RefineDocumentsChain } = await import(
"./combine_docs_chain.js"
);
return RefineDocumentsChain.deserialize(data as SerializedRefineDocumentsChain);
}
case "vector_db_qa": {
const { VectorDBQAChain } = await import("./vector_db_qa.js");
return VectorDBQAChain.deserialize(data as SerializedVectorDBQAChain, values);
}
default:
throw new Error(
`Invalid prompt type in config: ${
(data as SerializedBaseChain)._type
}`
);
}
}
Another Solution:
I think there are two fundamental issues with deserialize(). The BaseChain has a different definition that how many other classes extend and implement it. Furthermore, BaseChain is abstract, so its not like you can call BaseChain.deserialize(). Give how some classes use LoadValues and other do not, I really think it should be up to each subclass to implement deserialize().
// the LLMChain definition
static async deserialize(data: SerializedLLMChain);
// the BaseClass definition
static async deserialize(data: SerializedLLMChain, values: LoadValues = {});
I propose the BaseChain defines this class as follows (if it must be defined at all), and throw a runtime exception if the subclass has not implemented it yet. There is actually a strong case to be made for not even defining it in BaseChain, because RetrievalQAChain doesn't even implement it, and will throw a runtime error. Another benefit to removing this method, it also will remove the switch statement that doesn't really add any value here.
/**
* Load a chain from a json-like object describing it.
*/
static async deserialize(
data: SerializedBaseChain
): Promise<BaseChain> {
throw new Error('deserialize() has not been implemented for this class.');
}
In summary, I think in order to use this library properly at scale, it needs to be properly extensible. And defining a union type, which is then forced through the core library is not a good approach. Hopefully this all makes sense. I'm interested to hear the community's feedback.
@actengage thanks a lot for this. totally agree with this. we plan to remove serialise/deserialise methods from the base classes for exactly this reason. The plan is to offer a single function in the library that can serialise / deserialise any langchain object (a similar api to JSON.parse / JSON.stringify).
For now, anything I can do to unblock you here?
Thank you for the quick reply. We are actively working on a production app. I feel like the lainchain library is so valuable, I'm willing to accept any breaking changes that may arise as a result of using a pre-production library. That said, until these changes in the abstraction are made, I have to either use my fork and hack the union type to add my SoqlDatabaseChain, or just wait to implement SOQL and work on other aspects of the app (which is fine). Do you have an ETA on when these core changes could be made?
As a side note, i've been having so much fun with this library, I think I've worked on it every day for the last week until the AM. To the point I'm so stoked about it, i can't even sleep. I'd love to contribute if/when possible. Let me know if I can be any assistance. I'm the senior dev at my company, 20 years of experience.
Thanks a lot for the feedback here @actengage very helpful. I've merged a change that makes the serialise method no longer an abstract method on the base class, but instead with a default implementation that just throws a not implemented error (as a stopgap solution until we finish the work on this).
Great to hear you're enjoying langchain! We're always open to any contributions from the community, be it adding new features or fixing bugs, or just triaging issues, whatever you feel like contributing would be very welcome, let me know if you want any specific pointers
Hi, @actengage! I'm here to help the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.
Based on my understanding of the current state of the issue, you suggested breaking changes to address architectural and inheritance issues with the serialize/deserialize methods in the library. The maintainer, @nfcampos, agrees with the proposed changes and plans to remove the serialize/deserialize methods from the base classes. They have also made a temporary change to the serialize method. They mentioned that they are open to contributions from the community, and you expressed your willingness to accept any breaking changes and offer to contribute to the library.
Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!