Writing to Elasticsearch with Python

Open heidsoft opened this issue 2 years ago • 0 comments

The following is a simple example of writing data to an Elasticsearch index from Python.

First, install Elasticsearch and the Python Elasticsearch client on your system. The Python client can be installed with:

pip install elasticsearch

You can then write data to an Elasticsearch index with the following Python code:

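from datetime import datetime
from elasticsearch import Elasticsearch

# Connect to the local Elasticsearch node on the default port.
# 7.x clients fall back to this address when no host is given; 8.x clients require it.
es = Elasticsearch("http://localhost:9200")

# Create an index named test-index (ignore=400 suppresses the error if it already exists)
es.indices.create(index='test-index', ignore=400)

# Define the document to index
doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
    'timestamp': datetime.now(),
}

# Index the document into test-index
res = es.index(index="test-index", id=1, body=doc)
print(res['result'])

# Fetch the document back
res = es.get(index="test-index", id=1)
print(res['_source'])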

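This code first creates an Elasticsearch client object and then an index named test-index. It then builds a document, a Python dict with author, text, and timestamp fields, indexes it into test-index, and prints the result of the index operation. Finally, it fetches the indexed document and prints it.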

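Before running this code, make sure your Elasticsearch service is running and reachable at the host and port your Python code connects to (9200 is the Elasticsearch default).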
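A quick way to check reachability from Python is the client's ping() method, which returns True or False. The following is a minimal sketch; the helper name wait_for_elasticsearch and its retry parameters are illustrative assumptions, not part of the library.

```python
import time


def wait_for_elasticsearch(es, attempts=5, delay_seconds=1.0):
    """Return True as soon as es.ping() succeeds, False once all attempts fail.

    `es` is assumed to be an elasticsearch.Elasticsearch client, for example
    Elasticsearch("http://localhost:9200") for a local node on the default port.
    """
    for attempt in range(attempts):
        if es.ping():
            return True
        # Pause before retrying, but not after the final attempt
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling wait_for_elasticsearch(es) before the first index request lets a script fail early with a clear message instead of a connection error in the middle of the run.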

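In Elasticsearch, every document has a unique _id field, which you can think of as a primary key. When you index a document you may supply an _id; if you do not, Elasticsearch generates one automatically.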
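The auto-generated case can be sketched as follows. The helper name index_without_id is hypothetical; it assumes `es` is an Elasticsearch client and uses the same body= argument as the rest of this post.

```python
def index_without_id(es, index, doc):
    """Index `doc` without an explicit id and return the id Elasticsearch assigned.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # No id= argument here, so the server generates the _id
    res = es.index(index=index, body=doc)
    return res["_id"]
```

The returned value is the generated _id from the response, which can be stored and later passed to es.get.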

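When you index a document with an _id that already exists, the new document replaces the old one entirely. This behaves like an "upsert" (update or insert), although the whole document is overwritten rather than merged field by field.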

Here is an example:

from elasticsearch import Elasticsearch

# Connect to the local node on the default port (8.x clients require an explicit address)
es = Elasticsearch("http://localhost:9200")

doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
}

# Index with a custom _id
es.index(index="test-index", id="my_unique_id", body=doc)

# Indexing again with the same _id overwrites the old document
doc = {
    'author': 'another_author',
    'text': 'Another text',
}

es.index(index="test-index", id="my_unique_id", body=doc)

res = es.get(index="test-index", id="my_unique_id")
print(res['_source'])  # Output: {'author': 'another_author', 'text': 'Another text'}

In this example, we first index a document with the custom _id "my_unique_id". We then index a new document with the same _id, and the old document is replaced by the new one.
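If the goal is to change only some fields while keeping the rest, the update API with doc_as_upsert is one alternative to re-indexing the whole document. The sketch below assumes `es` is an Elasticsearch client and passes the request through body=; the helper names are hypothetical. On 8.x clients the same options can, as far as I know, also be given as the keyword arguments doc= and doc_as_upsert=.

```python
def build_partial_upsert(fields):
    """Build an update request body that merges `fields` into the document,
    or creates the document from `fields` if it does not exist yet."""
    return {"doc": dict(fields), "doc_as_upsert": True}


def upsert_fields(es, index, doc_id, fields):
    """Apply a partial update (or insert) for one document.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    return es.update(index=index, id=doc_id, body=build_partial_upsert(fields))
```

For example, upsert_fields(es, "test-index", "my_unique_id", {"text": "Edited text"}) would change text and leave author as it was, whereas es.index replaces both.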

The Python Elasticsearch client provides a bulk helper, helpers.bulk, that can be used to batch index, update, and delete operations. Here is an example:

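from datetime import datetime
from elasticsearch import Elasticsearch, helpers

# Connect to the local node on the default port (8.x clients require an explicit address)
es = Elasticsearch("http://localhost:9200")

# Build a list of dicts, each one describing a document to index
actions = [
    {
        "_index": "tickets-index",
        "_type": "_doc",  # deprecated in 7.x and removed in 8.0; drop this key on 8.x
        "_id": j,
        "_source": {
            "any": "data" + str(j),
            "timestamp": datetime.now()}
    }
    for j in range(0, 10)
]

# Index all documents in one bulk call
helpers.bulk(es, actions)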

In this example, we first build a list named actions containing 10 dicts, each representing one document. We then call helpers.bulk to index them in bulk.

Each action is a dict with the following keys:

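_index: the index name.
_type: the document type. Types are deprecated in Elasticsearch 7.x, where this key is optional and can be set to _doc; Elasticsearch 8.0 removed them, so leave the key out there.
_id: the document ID. If omitted, Elasticsearch generates one.
_source: the document body, as a dict.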
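Beyond the keys above, helpers.bulk also reads an _op_type key (defaulting to "index") that selects the operation for each action. The following sketch builds one action of each kind mentioned earlier; the function name, IDs, and field values are made up for illustration.

```python
def build_mixed_actions(index_name):
    """Return one index, one update, and one delete action for helpers.bulk."""
    return [
        # Index (create or overwrite) document 1
        {"_op_type": "index", "_index": index_name, "_id": 1,
         "_source": {"any": "data1"}},
        # Partially update document 2: the changed fields go under "doc"
        {"_op_type": "update", "_index": index_name, "_id": 2,
         "doc": {"any": "changed"}},
        # Delete document 3: no body is needed
        {"_op_type": "delete", "_index": index_name, "_id": 3},
    ]
```

The list would be passed as helpers.bulk(es, build_mixed_actions("tickets-index")). The update and delete actions assume documents 2 and 3 already exist; otherwise the helper reports those items as errors.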

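Note: avoid indexing too many documents in a single bulk operation. As a rule of thumb, a few thousand documents at a time is fine. If you need to index more, split them into batches; otherwise you may run into out-of-memory problems.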
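One way to batch is sketched below: a generator produces actions lazily so the full list never sits in memory, and a small helper groups them into fixed-size batches. The names chunked and generate_actions and the batch size of 1000 are illustrative assumptions. Note that helpers.bulk also accepts a chunk_size argument (500 by default) and splits the actions it receives into requests of that size itself, so explicit batching is mainly useful for controlling progress or error handling per batch.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def generate_actions(count, index_name="tickets-index"):
    """Lazily yield `count` index actions in the format helpers.bulk expects."""
    for j in range(count):
        yield {"_index": index_name, "_id": j, "_source": {"any": "data" + str(j)}}


# Usage with a real client (left as comments because it needs a running cluster):
# for batch in chunked(generate_actions(10000), 1000):
#     helpers.bulk(es, batch)
```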

Dec 11 '23 03:12 heidsoft