datalee issues

Results: 14 issues of datalee

pattern too large (compile failed)

`re2_match_all(text, pattern_name, parallel = T)`, where `pattern_name` is a single pattern built by joining about 60,000 names with "|" via `paste`, such as `嘉兴市鑫港房地产|某某某|你开呀`. I want to find out whether `text` contains any of the names, and which of the names it contains. Thanks...
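RE2 refuses to compile a pattern whose compiled program exceeds its memory budget, and a 60,000-way alternation can plausibly hit that limit. A common alternative for "which of these fixed names occur in the text" is an Aho-Corasick automaton, which scans the text once and reports every matching name. The issue concerns the R `re2` package; the Python sketch below only illustrates the technique, and `build_automaton` and `find_names` are hypothetical helpers, not part of `re2` or any other library.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[Set[str]]]

def build_automaton(words: Iterable[str]) -> Automaton:
    """Build an Aho-Corasick automaton (trie + failure links + output sets)."""
    goto: List[Dict[str, int]] = [{}]  # goto[node][char] -> child node
    fail: List[int] = [0]              # failure link for each node
    out: List[Set[str]] = [set()]      # names that end at (or via links from) each node
    for w in words:
        if not w:
            continue  # an empty name would match every text
        node = 0
        for ch in w:
            if ch not in goto[node]:
                goto[node][ch] = len(goto)
                goto.append({})
                fail.append(0)
                out.append(set())
            node = goto[node][ch]
        out[node].add(w)
    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit names that are suffixes of this path
    return goto, fail, out

def find_names(text: str, automaton: Automaton) -> Set[str]:
    """Return the set of names that occur anywhere in `text` (one pass over the text)."""
    goto, fail, out = automaton
    node = 0
    found: Set[str] = set()
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found |= out[node]
    return found
```

Build the automaton once from the name list and reuse it for every text; `bool(find_names(text, ac))` answers "does it contain any name", and the returned set answers "which ones".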

How do I hot-update the model?

Without having to restart the service.
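The issue does not say which framework or serving stack is involved, so the following is only a generic sketch under stated assumptions: the model lives in a single file, a hypothetical `loader` callable turns that file into a model object, and request handlers call `HotReloadingModel.get()` instead of holding the model directly. `HotReloadingModel` is a hypothetical class, not part of any library.

```python
import os
import threading
from typing import Any, Callable

class HotReloadingModel:
    """Serve a model and reload it when its file changes, without restarting the process."""

    def __init__(self, path: str, loader: Callable[[str], Any]):
        self._path = path
        self._loader = loader                    # hypothetical: builds a model from the file
        self._lock = threading.Lock()
        self._mtime = os.path.getmtime(path)
        self._model = loader(path)

    def get(self) -> Any:
        mtime = os.path.getmtime(self._path)
        if mtime != self._mtime:
            with self._lock:
                if mtime != self._mtime:         # another thread may already have reloaded
                    new_model = self._loader(self._path)  # load fully before swapping
                    self._model = new_model      # requests see either the old or the new model
                    self._mtime = mtime
        return self._model
```

To publish a new model, write it to a temporary file and move it into place with `os.replace`, so the loader never reads a half-written file. Checking the modification time on every call is cheap but not free; a background thread or an explicit reload endpoint are common alternatives.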

Can you support functionality like Python's "lime_text"?

Hi, in Python it looks like `from lime.lime_text import LimeTextExplainer`. Thanks.
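LIME explains a single text prediction by perturbing the input (dropping words), querying the classifier on the perturbed texts, and fitting a locally weighted linear model whose coefficients serve as word importances. The sketch below is not LIME itself but a much simpler relative: leave-one-word-out occlusion, which scores each word by how much the predicted probability drops when that word alone is removed. `occlusion_importance` is hypothetical, and `predict_proba` is assumed to return the probability of the class being explained.

```python
from typing import Callable, List, Tuple

def occlusion_importance(
    text: str, predict_proba: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each word by the probability drop when it is removed (largest effect first)."""
    words = text.split()
    base = predict_proba(text)
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])  # drop exactly one word
        scores.append((word, base - predict_proba(perturbed)))
    scores.sort(key=lambda pair: abs(pair[1]), reverse=True)  # stable for ties
    return scores
```

Unlike LIME, this ignores interactions between words and needs one model call per word; it is mainly useful as a quick baseline.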

[Question]: Why wasn't ERNIE-Search included?

### Please state your question

![image](https://user-images.githubusercontent.com/13651873/193243922-d17806da-1e13-4ac1-8efb-b94fb438bd08.png)

question

Support for chatglm-6b

It would be great if you could support chatglm-6b; it's a popular Chinese model. https://huggingface.co/THUDM/chatglm-6b

new model

Error in textConnection(message) : invalid 'text' argument

When I use `dbWriteTable` to write a table into SQL Server, something goes wrong: 1. Writing fewer than 999 rows works without the error `Error in textConnection(message) : invalid 'text'...`
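One plausible cause, not confirmed in the issue: SQL Server rejects a single `INSERT ... VALUES` statement with more than 1,000 row value expressions, and with bound parameters it also caps a statement at 2,100 parameters. A writer that sends all rows in one statement would then work for small tables and fail once the row count crosses the limit, with the driver's error handling possibly surfacing it as the unrelated-looking `textConnection` error. A common workaround is to write in chunks. The Python DB-API sketch below only illustrates batching; it is not how DBI's `dbWriteTable` works, and `chunks` and `insert_in_chunks` are hypothetical helpers.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

MAX_ROWS_PER_INSERT = 1000       # SQL Server: at most 1,000 rows per INSERT ... VALUES
MAX_PARAMS_PER_STATEMENT = 2000  # stay below SQL Server's 2,100 bound-parameter limit

def chunks(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def insert_in_chunks(conn, table: str, columns: Sequence[str],
                     rows: Iterable[Sequence], size: int = 0) -> int:
    """Insert rows with one multi-row INSERT per chunk; return the number of statements run.

    `table` and `columns` are pasted into the SQL text, so they must be trusted identifiers.
    """
    if size <= 0:
        size = max(1, min(MAX_ROWS_PER_INSERT, MAX_PARAMS_PER_STATEMENT // len(columns)))
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"  # qmark style (sqlite3, pyodbc)
    col_list = ", ".join(columns)
    cur = conn.cursor()
    statements = 0
    for batch in chunks(rows, size):
        sql = f"INSERT INTO {table} ({col_list}) VALUES " + ", ".join([row_placeholder] * len(batch))
        cur.execute(sql, [value for row in batch for value in row])  # flatten row values
        statements += 1
    conn.commit()
    return statements
```

The default chunk size respects both limits at once: with 3 columns it is 666 rows (1,998 parameters), with 1 or 2 columns it is capped at 1,000 rows.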

[2017-07-10 11:37:44] failure_stack_traces: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader

[2017-07-10 11:37:44] failure_details: Unable to initialize the native Deep Learning backend: null
[2017-07-10 11:37:44] failure_stack_traces: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:267)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:214)
    at...

Is the loss different from Rasa's?

MarginRankingLoss or CrossEntropyLoss?
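To make the question concrete, the sketch below computes both losses for a single example in plain Python. `margin_ranking_loss` follows the per-pair formula documented for PyTorch's `MarginRankingLoss`, `max(0, -y * (x1 - x2) + margin)`; `cross_entropy_loss` is softmax cross-entropy over candidate scores. Both function names are hypothetical, and which loss either project actually uses is not settled here.

```python
import math
from typing import Sequence

def margin_ranking_loss(x1: float, x2: float, y: int, margin: float = 0.0) -> float:
    """Per-pair margin ranking loss; y = 1 means x1 should be ranked above x2."""
    return max(0.0, -y * (x1 - x2) + margin)

def cross_entropy_loss(scores: Sequence[float], target: int) -> float:
    """Softmax cross-entropy: -log(softmax(scores)[target])."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]

# Correct candidate scored 2.0, negatives scored 1.0 and 0.5.
pos, negs = 2.0, [1.0, 0.5]
margin_total = sum(margin_ranking_loss(pos, n, 1, margin=0.5) for n in negs)
ce = cross_entropy_loss([pos] + negs, 0)
```

The practical difference: the margin loss is exactly zero once the correct candidate beats every negative by the margin, so those examples stop contributing gradient, whereas cross-entropy stays positive and keeps pushing the correct score up relative to all candidates.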

How do I train on a GPU?

It runs on the CPU by default. Does the environment need special configuration?

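To be fair, the results are mediocre: tables were not recognized, and some content was lost.

Test case: [Detailed comparison of industrial RAG deployment frameworks (Qanything, RAGFlow, FastGPT, Zhipu RAG)](https://mp.weixin.qq.com/s/z8CcFi03kQMGoEEQbuHzxw)

[`Recognition result:`](https://r.jina.ai/https://mp.weixin.qq.com/s/z8CcFi03kQMGoEEQbuHzxw)

> Title: Detailed comparison of industrial RAG deployment frameworks (Qanything, RAGFlow, FastGPT, Zhipu RAG)
> URL Source: https://mp.weixin.qq.com/s/z8CcFi03kQMGoEEQbuHzxw
> Markdown Content: | Retrieval module | The vector store uses Milvus hybrid retrieval (BM25 + vector retrieval), with no threshold set, returning top-k (100) | The vector database is ElasticSearch. Hybrid retrieval, implemented as text retrieval + vector retrieval; no specific embedding model is specified, but huqie is used as the tokenizer for text retrieval | Semantic search Semantic search mode uses advanced vector-model techniques to convert the datasets in the knowledge base into points in a high-dimensional vector space. In this space, each document or data item is represented as a vector that captures the semantic information of the data. When a user submits a query, the system likewise converts the question into a vector and computes its similarity to the knowledge-base vectors to find the most relevant results. Advantages: it can understand and capture the deeper meaning of a query, providing more precise search results. Use cases: suited to situations that require deep semantic understanding and complex query handling, such as academic research and answering technical questions. Technical implementation: models such as text-embedding-ada-002 are used to embed the text data, enabling efficient semantic matching. Full-text search Full-text search mode focuses on indexing the full content of documents, allowing users to retrieve documents by entering keywords. This mode analyzes every term in the documents and builds an index database covering all documents, so users can quickly find relevant documents through any word or phrase. Advantages: fast retrieval, broad search across large numbers of documents, and easy for users to quickly locate documents containing specific terms....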