scrapix
`user_agents` in configuration file doesn't change HTTP User-Agent header
Steps to reproduce
- Run the latest meilisearch image with Docker.
- Place a reverse proxy, say Caddy, in front of the meilisearch container with the following configuration:
  :7701 {
      reverse_proxy localhost:7700
      log {
          output stdout
      }
  }
- Update the scrapix configuration file misc/config_examples/docusaurus-docsearch.json in meilisearch/scrapix to include a custom user agent: "user_agents": ["foo"]
- Run yarn playground:docsearch from meilisearch/scrapix.
- Observe the log output of Caddy.
- Observe the log output of meilisearch: docker logs -f meilisearch
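To make the Caddy check in the steps above less error-prone, the header can be pulled out of a log entry programmatically. This is only a sketch: `extractUserAgent` is a hypothetical helper, and it assumes the input is the JSON object that Caddy prints after the "handled request" text, with the same shape as the entry quoted in this report.

```typescript
// Hypothetical helper: pull the User-Agent out of one Caddy access-log entry.
// Assumes entryJson is the JSON object Caddy prints after "handled request".
function extractUserAgent(entryJson: string): string | undefined {
  const entry = JSON.parse(entryJson) as {
    request?: { headers?: Record<string, string[]> };
  };
  const headers = entry.request?.headers ?? {};
  // Header names are case-insensitive, so compare in lower case.
  for (const name of Object.keys(headers)) {
    if (name.toLowerCase() === "user-agent") {
      return headers[name][0];
    }
  }
  return undefined;
}

// Shortened sample entry, modelled on the log line in this report.
const sample =
  '{"request": {"method": "POST", "uri": "/indexes", "headers": {"User-Agent": ["node"]}}}';
console.log(extractUserAgent(sample));
```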
Expected behavior
- The HTTP "User-Agent" header in Caddy's log should be similar to what the docs describe:
  user_agents: An array of user agents that are appended at the end of the current user agents. In this case, if your user_agents value is ['My Thing (vx.x.x)'], the final user_agent becomes Meilisearch JS (vx.x.x); Meilisearch Crawler (vx.x.x); My Thing (vx.x.x)
- The HTTP "User-Agent" header in meilisearch's log should be similar to the value mentioned above.
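For reference, the value expected with "user_agents": ["foo"] can be spelled out by following the docs' example. The sketch below is not scrapix's actual code: `composeUserAgent` is a hypothetical helper, and the "; " separator and the built-in agent names are taken from the documentation excerpt above.

```typescript
// Hypothetical helper, not scrapix's implementation: append the custom agents
// to the built-in ones, joined with "; " as in the documentation example.
function composeUserAgent(builtIn: string[], custom: string[]): string {
  return builtIn.concat(custom).join("; ");
}

const expectedAgent = composeUserAgent(
  ["Meilisearch JS (vx.x.x)", "Meilisearch Crawler (vx.x.x)"],
  ["foo"]
);
console.log(expectedAgent);
```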
Actual behavior
Caddy's log shows node as the HTTP User-Agent:
INFO http.log.access.log0 handled request {"request": {"remote_ip": "10.0.5.2", "remote_port": "33130", "client_ip": "10.0.5.2", "proto": "HTTP/1.1", "method": "POST", "host": "10.0.5.2:7701", "uri": "/indexes", "headers": {"Accept-Language": ["*"], "Sec-Fetch-Mode": ["cors"], "User-Agent": ["node"], "Accept-Encoding": ["gzip, deflate"], "Authorization": [], "Content-Type": ["application/json"], "X-Meilisearch-Client": ["Meilisearch Crawler (v0.1.7) ; foo ; Meilisearch JavaScript (v0.31.1)"], "Accept": ["*/*"], "Content-Length": ["51"], ...}}
meilisearch's log also shows node as the HTTP User-Agent:
INFO actix_web::middleware::logger] 172.17.0.1 "PATCH /indexes/docusaurus-docsearch_crawler_tmp/settings HTTP/1.1" 202 140 "-" "node" 0.001615
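Note that in the Caddy entry the custom value does arrive, but in the X-Meilisearch-Client header, while User-Agent stays node (presumably the default of the Node HTTP client when no User-Agent is set). A minimal sketch of a header set that would carry the same string in both headers follows. `buildRequestHeaders` is a hypothetical helper and the " ; " separator is copied from the log above; whether the client should actually do this is for the maintainers to decide.

```typescript
// Hypothetical helper: build request headers that send the agent string both
// as X-Meilisearch-Client (as seen in the log) and as User-Agent.
function buildRequestHeaders(agents: string[]): Record<string, string> {
  // Separator copied from the X-Meilisearch-Client value in the Caddy log.
  const agentString = agents.join(" ; ");
  return {
    "Content-Type": "application/json",
    "X-Meilisearch-Client": agentString,
    "User-Agent": agentString,
  };
}

const requestHeaders = buildRequestHeaders([
  "Meilisearch Crawler (v0.1.7)",
  "foo",
  "Meilisearch JavaScript (v0.31.1)",
]);
console.log(requestHeaders["User-Agent"]);
```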