
`user_agents` in configuration file doesn't change HTTP User-Agent header

Open TonyRL opened this issue 1 year ago • 0 comments

Steps to reproduce

  1. Run the latest Meilisearch image with Docker.
  2. Place a reverse proxy, say Caddy, in front of the Meilisearch container with the following configuration:
:7701 {
    reverse_proxy localhost:7700
    log {
        output stdout
    }
}
  3. Update the scrapix configuration file misc/config_examples/docusaurus-docsearch.json in meilisearch/scrapix to include a custom user agent: "user_agents": ["foo"]
  4. Run yarn playground:docsearch from meilisearch/scrapix.
  5. Observe the log output of Caddy.
  6. Observe the log output of Meilisearch: docker logs -f meilisearch

Expected behavior

  1. The HTTP "User-Agent" header in Caddy's log should be similar to what the docs describe:

user_agents An array of user agents that are append at the end of the current user agents. In this case, if your user_agents value is ['My Thing (vx.x.x)'] the final user_agent becomes

Meilisearch JS (vx.x.x); Meilisearch Crawler (vx.x.x); My Thing (vx.x.x)
  2. The HTTP "User-Agent" header in Meilisearch's log should be similar to the value above.
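
To make the expectation concrete, here is a minimal sketch of how I read the documented behavior: custom entries are appended after the default ones and joined with "; ". The helper name buildUserAgent and its parameters are hypothetical and only for illustration; this is not scrapix's actual code.

```typescript
// Hypothetical helper illustrating the documented behavior; not taken from scrapix.
function buildUserAgent(defaults: string[], custom: string[] = []): string {
  // Custom entries come after the default ones, separated by "; ".
  return defaults.concat(custom).join("; ");
}

// Values copied from the docs excerpt quoted above.
const expectedUserAgent = buildUserAgent(
  ["Meilisearch JS (vx.x.x)", "Meilisearch Crawler (vx.x.x)"],
  ["My Thing (vx.x.x)"],
);

console.log(expectedUserAgent);
```

With "user_agents": ["foo"], I would therefore expect "foo" to appear as the last segment of the User-Agent header.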

Actual behavior

Caddy's log shows node as the HTTP User-Agent.

INFO    http.log.access.log0    handled request {"request": {"remote_ip": "10.0.5.2", "remote_port": "33130", "client_ip": "10.0.5.2", "proto": "HTTP/1.1", "method": "POST", "host": "10.0.5.2:7701", "uri": "/indexes", "headers": {"Accept-Language": ["*"], "Sec-Fetch-Mode": ["cors"], "User-Agent": ["node"], "Accept-Encoding": ["gzip, deflate"], "Authorization": [], "Content-Type": ["application/json"], "X-Meilisearch-Client": ["Meilisearch Crawler (v0.1.7) ; foo ; Meilisearch JavaScript (v0.31.1)"], "Accept": ["*/*"], "Content-Length": ["51"], ...}}

Meilisearch's log also shows node as the HTTP User-Agent.

INFO  actix_web::middleware::logger] 172.17.0.1 "PATCH /indexes/docusaurus-docsearch_crawler_tmp/settings HTTP/1.1" 202 140 "-" "node" 0.001615
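
Judging from the Caddy log above, the custom value is sent, but in the X-Meilisearch-Client header rather than in User-Agent. The sketch below only restates that observation; the two header values are copied from the log (all other headers omitted), and headersContaining is a hypothetical helper, not part of scrapix or meilisearch-js.

```typescript
// Header values copied from the Caddy access log above; other headers omitted.
type LoggedHeaders = Record<string, string[]>;

const observedHeaders: LoggedHeaders = {
  "User-Agent": ["node"],
  "X-Meilisearch-Client": [
    "Meilisearch Crawler (v0.1.7) ; foo ; Meilisearch JavaScript (v0.31.1)",
  ],
};

// Hypothetical helper: names of the headers in which the given value appears.
function headersContaining(headers: LoggedHeaders, needle: string): string[] {
  return Object.keys(headers).filter((name) =>
    headers[name].some((value) => value.indexOf(needle) !== -1),
  );
}

console.log(headersContaining(observedHeaders, "foo"));
```

For the logged request this lists only X-Meilisearch-Client, which matches what both logs show: User-Agent stays at node.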

TonyRL, Feb 15 '24 21:02