Awesome-TTRSS

Add PostgreSQL Chinese full-text search: zhparser/jieba/pgroonga

Open davidlauhn opened this issue 5 years ago • 22 comments

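PostgreSQL's built-in search does not support Chinese, which makes searching for Chinese text in TTRSS essentially unusable. Are there any plans to add something like zhparser, jieba, or pgroonga?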
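
A minimal Python sketch of the likely cause, assuming the built-in parser behaves roughly like a splitter on whitespace and punctuation (a simplification, not PostgreSQL's actual parser). Chinese text has no spaces between words, so a whole phrase ends up as one token and a query for a word inside it finds nothing. The helper names `naive_tokens` and `matches` are made up for illustration.

```python
import re


def naive_tokens(text):
    """Split on anything that is not a letter or digit (assumed parser behaviour)."""
    return {token.lower() for token in re.findall(r"\w+", text)}


def matches(query, document):
    """A query matches only if it equals one whole token of the document."""
    return query.lower() in naive_tokens(document)


english = "PostgreSQL supports full text search"
chinese = "数据库支持全文搜索"

print(matches("search", english))  # the English word is its own token
print(matches("搜索", chinese))     # the Chinese word is buried inside one long token
```

A Chinese-aware parser such as zhparser or pg_jieba segments the phrase into words first, which is what makes the second query matchable.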

davidlauhn avatar Aug 01 '19 09:08 davidlauhn

There is no such plan. It looks like this would require changing TTRSS's search logic. Does MySQL have the same problem?

HenryQW avatar Aug 06 '19 09:08 HenryQW

I recommend doing full-text search through a reader client instead, such as Reeder.

HenryQW avatar Aug 06 '19 09:08 HenryQW

I haven't tried MySQL. I'll experiment on my own, thanks.

davidlauhn avatar Aug 07 '19 00:08 davidlauhn

https://discourse.tt-rss.org/t/solved-search-in-chinese/2241/2 If I understand correctly, Tiny Tiny RSS already supports this: once pgroonga is configured, you can set the default language for global search?

jostyee avatar Aug 16 '19 06:08 jostyee

I'm too inexperienced to figure out how to configure pgroonga, so I got it working with zhparser instead.

davidlauhn avatar Aug 16 '19 11:08 davidlauhn

@davidlauhn Could you share your solution? I'll see whether I can add it. A direct PR would be even better!

HenryQW avatar Aug 16 '19 20:08 HenryQW

@jostyee I couldn't work out how DEFAULT_SEARCH_LANGUAGE is meant to be used. I tried the approach from that thread and it still doesn't work.

HenryQW avatar Aug 16 '19 20:08 HenryQW

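@HenryQW I'm neither a programmer nor a sysadmin. Everything below is based on copy/paste: I know that it works but not why, and it may not be accurate. I can't take questions because I genuinely don't understand it, sorry :-)

I modified two Docker images.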

docker-compose.yml

services:
  database.postgres:
    image: davidlauhn/postgres-11-with-zhparser:latest
    container_name: postgres
    environment:
      - PG_PASSWORD=password # please change the password
      - DB_EXTENSION=pg_trgm
    volumes:
      - ~/postgres/data/:/var/lib/postgresql/ # persist postgres data to ~/postgres/data/ on the host
    restart: always

  service.rss:
    image: davidlauhn/awesome-ttrss:latest
    container_name: ttrss
    ports:
      - 80:80
    environment:
      - SELF_URL_PATH=http://domain.name/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=password # please change the password
      - ENABLE_PLUGINS=auth_internal, fever # auth_internal is required. Plugins enabled here will be enabled for all users as system plugins
    stdin_open: true
    tty: true
    restart: always
    command: sh -c 'sh /wait-for.sh database.postgres:5432 -- php /configure-db.php && exec s6-svscan /etc/s6/'

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    expose:
      - 3000
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      NODE_ENV: production
    expose:
      - 3000
    restart: always

Then configure zhparser:

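    # open a shell inside the postgres container
    docker exec -it postgres /bin/sh
    # install the extension and create a text search configuration named Chinese
    psql -U postgres -d ttrss -c 'CREATE EXTENSION zhparser'
    psql -U postgres -d ttrss -c 'CREATE TEXT SEARCH CONFIGURATION Chinese (PARSER = zhparser)'
    psql -U postgres -d ttrss -c 'ALTER TEXT SEARCH CONFIGURATION Chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple'
    # rebuild the search vectors of existing entries with the new configuration
    psql -U postgres -d ttrss -c "UPDATE ttrss_entries SET tsvector_combined = to_tsvector('Chinese', content)"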

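Restart PostgreSQL, then change the TTRSS search language to Chinese.

Chinese search is usable, but word segmentation seems slightly off: zhparser splits long words into shorter ones when matching. zhparser's default configuration probably still needs tuning, but my requirements aren't high, so I'm making do with it.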
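
One possible reason for long words being split is that the segmenter's dictionary lacks the long word. A rough Python sketch of that effect, using forward maximum matching with a tiny made-up dictionary. This is not zhparser's actual algorithm (zhparser is built on SCWS), and the `segment` helper and both dictionaries are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


small = {"全文", "搜索"}          # the long word is missing
large = small | {"全文搜索"}      # the long word is known

print(segment("全文搜索", small))
print(segment("全文搜索", large))
```

With the smaller dictionary the phrase comes out as two short words, while the larger one keeps it whole, so tuning the dictionary or the segmentation settings can change what a query matches.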

davidlauhn avatar Aug 17 '19 02:08 davidlauhn

@davidlauhn Enabling zhparser doesn't have to be that much trouble; sameersbn/postgresql supports enabling extensions through an environment variable:

https://github.com/sameersbn/docker-postgresql#enabling-extensions

jostyee avatar Aug 17 '19 08:08 jostyee

@jostyee I didn't want to go to this much trouble either, but I don't know any better, so I just followed the instructions step by step :-)

davidlauhn avatar Aug 17 '19 11:08 davidlauhn

@jostyee zhparser also needs its dependencies installed, so it can't simply be switched on.

HenryQW avatar Aug 17 '19 15:08 HenryQW

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

stale[bot] avatar Sep 12 '19 21:09 stale[bot]

I'll look into the feasibility when I have time. PRs are welcome!

HenryQW avatar Dec 20 '19 03:12 HenryQW

This issue has been automatically marked as stale because it has not had recent activity in 14 days. It will be closed if no further activity occurs in 7 days. Thank you for your contributions.

stale[bot] avatar Jan 17 '20 10:01 stale[bot]

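I made a quick modified version; anyone interested is welcome to try it.

postgres image: hoilc/postgres-chinese-textsearch:latest

ttrss image: hoilc/ttrss:latest, which requires adding the environment variable TEXTSEARCH_EXTENSION=pg_jieba,zhparser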

https://github.com/hoilc/Awesome-TTRSS/blob/master/docker-compose.yml
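
As a rough illustration of what a comma-separated value like this could translate to on the database side, here is a Python sketch that turns it into `CREATE EXTENSION` statements. It is an assumption about the general idea, not the image's actual startup code; the `extension_statements` helper and the `SUPPORTED` list are hypothetical.

```python
SUPPORTED = ("pg_jieba", "zhparser")  # the two names mentioned in this thread


def extension_statements(value):
    """Turn 'pg_jieba,zhparser' into one CREATE EXTENSION statement per name."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported extension(s): {', '.join(unknown)}")
    return [f"CREATE EXTENSION IF NOT EXISTS {name};" for name in names]


for statement in extension_statements("pg_jieba,zhparser"):
    print(statement)
```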

hoilc avatar Feb 09 '20 12:02 hoilc

@HenryQW If this works well it could be merged here; TTRSS's Chinese search really isn't good enough.

ptsa avatar May 30 '20 06:05 ptsa

Could you open a PR? I've been too busy lately.

HenryQW avatar May 30 '20 06:05 HenryQW

@hoilc Please submit a PR. @HenryQW He modified the PostgreSQL image as well, so you would probably need to fork his postgres repository: https://github.com/hoilc/postgres-chinese-textsearch

ptsa avatar May 30 '20 07:05 ptsa

@hoilc didn't submit a PR, so I copied his code and submitted one myself.

ptsa avatar Jun 30 '20 14:06 ptsa

Has this change been merged into latest? I tried searching in Chinese and it still doesn't work.

0rt avatar Apr 01 '21 16:04 0rt

@0rt My submission didn't go through. I may have done it the wrong way.

ptsa avatar Apr 02 '21 02:04 ptsa

I got an up-to-date build of postgres-chinese-textsearch working:

postgres-chinese-textsearch https://hub.docker.com/r/bloodstar/postgres-chinese-textsearch

version: "3"
services:
  service.rss:
    image: bloodstar/ttrss:latest
    container_name: ttrss
    ports:
      - 181:80
    environment:
      - SELF_URL_PATH=http://localhost:181/ # please change to your own domain
      - DB_HOST=database.postgres
      - DB_PORT=5432
      - DB_NAME=ttrss
      - DB_USER=postgres
      - DB_PASS=ttrss # please change the password
      - PUID=1000
      - PGID=1000
      - TEXTSEARCH_EXTENSION=pg_jieba # add support for Chinese full-text search (pg_jieba, zhparser, or both)
    volumes:
      - feed-icons:/var/www/feed-icons/
    networks:
      - public_access
      - service_only
      - database_only
    stdin_open: true
    tty: true
    restart: always

  service.mercury: # set Mercury Parser API endpoint to `service.mercury:3000` on TTRSS plugin setting page
    image: wangqiru/mercury-parser-api:latest
    container_name: mercury
    networks:
      - public_access
      - service_only
    restart: always

  service.opencc: # set OpenCC API endpoint to `service.opencc:3000` on TTRSS plugin setting page
    image: wangqiru/opencc-api-server:latest
    container_name: opencc
    environment:
      - NODE_ENV=production
    networks:
      - service_only
    restart: always

  # database.postgres:
  #   image: postgres:13-alpine
  #   container_name: postgres
  #   environment:
  #     - POSTGRES_PASSWORD=ttrss # feel free to change the password
  #   volumes:
  #     - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
  #   networks:
  #     - database_only
  #   restart: always

  database.postgres:
    image: bloodstar/postgres-chinese-textsearch:latest
    container_name: postgres
    environment:
      - POSTGRES_PASSWORD=ttrss # please change the password
    volumes:
      - ~/postgres/data/:/var/lib/postgresql/data # persist postgres data to ~/postgres/data/ on the host
    networks:
      - database_only # must share a network with service.rss, which is not attached to the default network
    restart: always

  # utility.watchtower:
  #   container_name: watchtower
  #   image: containrrr/watchtower:latest
  #   volumes:
  #     - /var/run/docker.sock:/var/run/docker.sock
  #   environment:
  #     - WATCHTOWER_CLEANUP=true
  #     - WATCHTOWER_POLL_INTERVAL=86400
  #   restart: always

volumes:
  feed-icons:

networks:
  public_access: # Provide the access for ttrss UI
  service_only: # Provide the communication network between services only
    internal: true
  database_only: # Provide the communication between ttrss and database only
    internal: true
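
The `networks` keys decide which containers can reach each other: two services can talk only if they share at least one network, and a service with no `networks` key is attached to the project's `default` network. A small Python sketch of that rule, using the network lists from the file above; the `example.isolated` service and both helper functions are hypothetical.

```python
def networks_of(service):
    """A service without a `networks` key is attached to the `default` network."""
    return set(service.get("networks") or ["default"])


def can_reach(services, a, b):
    """Two services can reach each other only if they share a network."""
    return bool(networks_of(services[a]) & networks_of(services[b]))


services = {
    "service.rss": {"networks": ["public_access", "service_only", "database_only"]},
    "service.mercury": {"networks": ["public_access", "service_only"]},
    "service.opencc": {"networks": ["service_only"]},
    "example.isolated": {},  # hypothetical service with no `networks` key
}

print(can_reach(services, "service.rss", "service.opencc"))
print(can_reach(services, "service.rss", "example.isolated"))
```

Because `service.rss` lists its networks explicitly, it is not on `default`, so any service it must reach (the database in particular) needs one of the same networks.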

appotry avatar Apr 12 '23 16:04 appotry