Langchain-Chatchat: can I specify a text2vector model I trained myself?

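Hi, two questions:
1. Can I use a text-embedding model I trained myself within this framework, in place of the default text2vec-large-chinese? That would greatly improve the precision of knowledge-base search in my specific domain.
2. Can I define my own similarity formula to replace the existing one, e.g. inner product?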

Thanks

Apr 25 '23 06:04 dingle0422

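1. Yes. The framework currently loads embeddings through LangChain's HuggingFaceEmbeddings class, so you can save your own embedding model in the format that class expects and it will load the same way. A custom-trained embedding model can indeed improve retrieval considerably.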
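If your trained model cannot be loaded by `HuggingFaceEmbeddings` (which wraps sentence-transformers), one option is to wrap it yourself. LangChain vector stores call an embedding object through two methods, `embed_documents(texts)` and `embed_query(text)`. The minimal sketch below is a hypothetical adapter (`CustomEmbeddings`, not project code) that exposes just those two methods around any encode function. The toy character-bigram hashing `toy_encoder` only stands in for a real model so the example runs offline; in real use you would subclass LangChain's `Embeddings` base class and call your model inside.

```python
import math
import zlib
from typing import Callable, List

Vector = List[float]


def toy_encoder(text: str, dim: int = 64) -> Vector:
    """Stand-in for a trained model: hash character bigrams into `dim` buckets."""
    vec = [0.0] * dim
    for a, b in zip(text, text[1:]):
        # crc32 is deterministic across runs, unlike Python's salted hash()
        bucket = zlib.crc32((a + b).encode("utf-8")) % dim
        vec[bucket] += 1.0
    # L2-normalize; an all-zero vector (text shorter than 2 chars) stays zero
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class CustomEmbeddings:
    """Hypothetical adapter exposing the two methods LangChain vector stores call."""

    def __init__(self, encode: Callable[[str], Vector]):
        self.encode = encode

    def embed_documents(self, texts: List[str]) -> List[Vector]:
        return [self.encode(t) for t in texts]

    def embed_query(self, text: str) -> Vector:
        return self.encode(text)


emb = CustomEmbeddings(toy_encoder)
doc_vecs = emb.embed_documents(["知识库检索", "向量数据库"])
query_vec = emb.embed_query("知识库")
print(len(doc_vecs), len(query_vec))  # 2 64
```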

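2. You would need to redefine the similarity-search functions in the FAISS vector store; once redefined, they will compute scores with your formula.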
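A minimal sketch of point 2, assuming vectors are plain Python lists: the search is written against a pluggable `similarity` function, so swapping inner product for another formula is a one-argument change. `top_k_by_similarity`, `inner_product`, and `cosine` are hypothetical names for illustration, not the project's or FAISS's API; in the project itself the equivalent change would go into the vector store's similarity-search method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Inner product divided by both norms; 0.0 if either vector is all zeros.
    denom = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / denom if denom else 0.0


def top_k_by_similarity(query: Vector, docs: List[Tuple[str, Vector]],
                        k: int, similarity: Similarity) -> List[Tuple[str, float]]:
    """Score every document with `similarity` and return the k best (higher = closer)."""
    scored = [(text, similarity(query, vec)) for text, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = [("long doc", [3.0, 3.0]), ("aligned doc", [1.0, 0.0])]
query = [1.0, 0.0]
print(top_k_by_similarity(query, docs, 1, inner_product))  # [('long doc', 3.0)]
print(top_k_by_similarity(query, docs, 1, cosine))         # [('aligned doc', 1.0)]
```

The two calls show why the choice of formula matters: with unnormalized vectors, inner product rewards long vectors, while cosine only looks at direction.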

Apr 25 '23 07:04 imClumsyPanda

Thanks for the answer!

Apr 25 '23 07:04 dingle0422

Feel free to discuss any questions here. And if you get a working implementation, PRs are welcome.

Apr 25 '23 07:04 imClumsyPanda

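@imClumsyPanda Me again. About searching with FAISS (my second question above): can I replace FAISS with an index model I wrote myself? My similarity calculation is rather unusual. I tried implementing it by modifying FAISS's index.search function, but search became very slow, slower than my own implementation. Thanks for any help.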

Apr 27 '23 02:04 dingle0422

Of course. If you find a more efficient approach, PRs are welcome too.
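For replacing FAISS with a hand-written index, here is a minimal sketch, assuming NumPy and that exact (brute-force) search is acceptable for the knowledge-base size. `NumpyIndex` and its methods are hypothetical names, not project code. The speed comes from scoring all documents in one matrix operation and using `np.argpartition` to pick the top k without a full sort; a custom formula should likewise be written over the whole matrix rather than one document at a time.

```python
from typing import Callable, List, Tuple

import numpy as np

# A score function maps (query of shape (d,), matrix of shape (n, d)) to n scores.
ScoreFn = Callable[[np.ndarray, np.ndarray], np.ndarray]


def dot_scores(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    return matrix @ query


class NumpyIndex:
    """Hypothetical exact-search index with a pluggable, vectorized score function."""

    def __init__(self, dim: int, score_fn: ScoreFn = dot_scores):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.texts: List[str] = []
        self.score_fn = score_fn

    def add(self, texts: List[str], vectors: np.ndarray) -> None:
        vectors = np.asarray(vectors, dtype=np.float32)
        self.vectors = np.vstack([self.vectors, vectors])
        self.texts.extend(texts)

    def search(self, query: np.ndarray, k: int) -> List[Tuple[str, float]]:
        scores = self.score_fn(np.asarray(query, dtype=np.float32), self.vectors)
        k = min(k, len(scores))
        if k == 0:
            return []
        # argpartition finds the k largest in O(n); only those k are then sorted.
        top = np.argpartition(-scores, k - 1)[:k]
        top = top[np.argsort(-scores[top])]
        return [(self.texts[i], float(scores[i])) for i in top]


index = NumpyIndex(dim=2)
index.add(["a", "b", "c"], np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))
print(index.search(np.array([1.0, 0.0]), k=2))  # [('a', 1.0), ('c', 0.5)]
```

Because it scans every vector, this search is exact but linear in the number of chunks; for very large knowledge bases an approximate index may still be needed.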

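The latest version of the project has also dropped the RetrievalQA approach; the step from the retrieved related docs to the response is now our own hand-written code.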
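Dropping RetrievalQA means the step from retrieved documents to the LLM input is ordinary code. A minimal sketch of that step, assuming a simple prompt template with `{context}` and `{question}` placeholders; `PROMPT_TEMPLATE`, `Doc`, and `build_prompt` are hypothetical names, and the project's actual template and function may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical template; the project's real prompt wording may differ.
PROMPT_TEMPLATE = (
    "Answer the question using only the known information below.\n"
    "Known information:\n{context}\n\n"
    "Question:\n{question}"
)


@dataclass
class Doc:
    page_content: str  # same attribute name as LangChain's Document


def build_prompt(related_docs: List[Doc], question: str,
                 template: str = PROMPT_TEMPLATE) -> str:
    """Join the retrieved passages and fill them into the template."""
    context = "\n".join(doc.page_content for doc in related_docs)
    # str.format parses only the template, so braces inside documents are inserted literally
    return template.format(context=context, question=question)


docs = [Doc("FAISS stores the vectors."), Doc("Scores come from the index.")]
print(build_prompt(docs, "Where do scores come from?"))
```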

Apr 27 '23 03:04 imClumsyPanda

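Hi, to swap in my own embedding model, is it enough to just add it to embedding_model_dict in the config? And second, to customize the similarity calculation, which part do I need to modify?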

Jul 19 '23 08:07 wjwzju

+1, same question @imClumsyPanda

Jul 28 '23 02:07 tomFoxxxx

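Any model that can be loaded the way HuggingFaceEmbeddings loads models can be swapped in directly. The similarity algorithm lives in the vector store's class definition.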
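For the config route asked about above, a sketch, assuming the layout of the project's model config at the time: `embedding_model_dict` maps a short name to a Hugging Face model ID or local path, and a separate `EMBEDDING_MODEL` setting selects the active entry. The entry name `my-text2vec` and its path are placeholders; check the setting names against your version's config file.

```python
# Sketch of the relevant part of the model config (names assumed, see lead-in).
embedding_model_dict = {
    # existing default entry
    "text2vec": "GanymedeNil/text2vec-large-chinese",
    # hypothetical entry: a directory saved in sentence-transformers/Hugging Face format
    "my-text2vec": "/path/to/my-text2vec",
}

# Select the custom entry instead of the default.
EMBEDDING_MODEL = "my-text2vec"

model_path = embedding_model_dict[EMBEDDING_MODEL]
print(model_path)  # /path/to/my-text2vec
```

Switching models changes the vector space, so existing knowledge bases need to be rebuilt with the new model; vectors produced by the old model cannot be meaningfully compared with queries embedded by the new one.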

Jul 28 '23 02:07 imClumsyPanda

Thanks!

Jul 28 '23 03:07 tomFoxxxx

Which models that can be loaded via HuggingFaceEmbeddings are suitable for fine-tuning? Could you suggest a few for reference?

Jul 28 '23 14:07 Yyy11181

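I've swapped in my own custom model. After the swap, init_knowledge_base works and retrieval returns results that the LLM uses to generate answers, but no real retrieval scores are produced: every score is 0. Could you help explain this? Thanks.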
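One thing worth checking, offered as a hypothesis rather than a diagnosis confirmed in this thread: with unit-normalized embeddings, the squared L2 distance that a flat FAISS L2 index reports lies between 0 and 4, and for close matches it is below 1. If the code that attaches scores to documents converts them with `int()`, every such score truncates to 0. The sketch below reproduces only that arithmetic with NumPy; it does not call FAISS or the project's code.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)


query = unit(np.array([1.0, 0.2, 0.0]))
docs = np.stack([unit(np.array([1.0, 0.0, 0.0])),
                 unit(np.array([0.8, 0.6, 0.0]))])

# Squared L2 distance, the quantity a flat L2 index returns (lower = closer).
sq_l2 = ((docs - query) ** 2).sum(axis=1)
# For unit vectors this equals 2 - 2 * cosine similarity.
assert np.allclose(sq_l2, 2 - 2 * docs @ query)

print(sq_l2)                                 # real-valued distances, both below 1
print([int(d) for d in sq_l2])               # [0, 0]: integer truncation erases them
print([round(float(d), 4) for d in sq_l2])   # keeping floats preserves the ranking signal
```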

Jul 29 '23 02:07 tomFoxxxx

@imClumsyPanda Sorry to bother you, could you take a look?

Jul 29 '23 02:07 tomFoxxxx
