conifer
Can't capture anything
Fresh docker install. Registered a test user, got email, confirmed my account. Now, no matter what URL I specify, I get this error:
There's been an error No such page or content is not accessible.
I tried "docker logs -f xyz" for each of the containers, nothing useful. I have no info to even start digging...
Sorry about that, we were in the middle of updating the base image for the new pywb release. Just merged a PR that should fix things. Can you try again with the latest and rebuild?
git pull
Updating bcf772f..7e11a8b
Fast-forward
 webrecorder/Dockerfile | 4 ++--
 webrecorder/webrecorder/init.py | 2 +-
 webrecorder/webrecorder/admincontroller.py | 6 +++++-
 webrecorder/webrecorder/rec/storage/base.py | 3 +++
 webrecorder/webrecorder/rec/storage/local.py | 17 +++++++++++------
 webrecorder/webrecorder/rec/storagecommitter.py | 5 +++++
 webrecorder/webrecorder/rec/tempchecker.py | 14 ++++++++++++++
 webrecorder/webrecorder/utils.py | 6 +++---
 8 files changed, 44 insertions(+), 13 deletions(-)
Ran recreate.sh
Still same error
Do I need to wipe and start from 0?
That should have pulled a new image and updated everything...
There is nothing in any of these logs?
docker-compose logs recorder
docker-compose logs warcserver
docker-compose logs app
Can you paste the contents of these, or send? What is the host OS you're running on?
I have Ubuntu VM under ProxMox. Here are the logs:
uname -a Linux home 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
docker-compose logs recorder Attaching to webrecorder_recorder_1 recorder_1 | [uWSGI] getting INI configuration from /code/apps/rec.ini recorder_1 | *** Starting uWSGI 2.0.18 (64bit) on [Fri Mar 1 02:08:39 2019] *** recorder_1 | compiled with version: 6.3.0 20170516 on 28 February 2019 17:09:58 recorder_1 | os: Linux-4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 recorder_1 | nodename: 43ec6575510d recorder_1 | machine: x86_64 recorder_1 | clock source: unix recorder_1 | pcre jit disabled recorder_1 | detected number of CPU cores: 16 recorder_1 | current working directory: /code recorder_1 | detected binary path: /usr/local/bin/uwsgi recorder_1 | your memory page size is 4096 bytes recorder_1 | detected max file descriptor number: 1024 recorder_1 | - async cores set to 1000 - fd table size: 1024 recorder_1 | lock engine: pthread robust mutexes recorder_1 | thunder lock: disabled (you can enable it with --thunder-lock) recorder_1 | uwsgi socket 0 bound to TCP address :8010 fd 3 recorder_1 | Python version: 3.7.2 (default, Feb 6 2019, 12:04:03) [GCC 6.3.0 20170516] recorder_1 | Python main interpreter initialized at 0x5611ab722b40 recorder_1 | python threads support enabled recorder_1 | your server socket listen backlog is limited to 100 connections recorder_1 | your mercy for graceful operations on workers is 60 seconds recorder_1 | mapped 359787320 bytes (351354 KB) for 4000 cores recorder_1 | *** Operational MODE: preforking+async *** recorder_1 | No Archives Loaded recorder_1 | WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x5611ab722b40 pid: 21 (default app) recorder_1 | *** uWSGI is running in multiple interpreter mode *** recorder_1 | spawned uWSGI master process (pid: 21) recorder_1 | spawned uWSGI worker 1 (pid: 25, cores: 1000) recorder_1 | spawned uWSGI worker 2 (pid: 26, cores: 1000) recorder_1 | spawned uWSGI worker 3 (pid: 27, cores: 1000) recorder_1 | spawned uWSGI worker 4 (pid: 28, cores: 1000) recorder_1 | spawned 
uWSGI mule 1 (pid: 29) recorder_1 | spawned uWSGI mule 2 (pid: 30) recorder_1 | *** running gevent loop engine [addr:0x5611aacfafd0] *** recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Worker: Running StorageCommitter every 30 recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Worker: Running TempChecker every 30 recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Recorder pubsub: Waiting for messages recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Recorder pubsub: Waiting for messages recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Recorder pubsub: Waiting for messages recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Recorder pubsub: Waiting for messages recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Storage Committer Started recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Storage Root: /data/storage/ recorder_1 | wr.io: 2019-03-01 02:08:40: [INFO]: Temp Check Root: /data/warcs/ recorder_1 | wr.io: 2019-03-01 02:08:40: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:09:10: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:09:40: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:10:11: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:10:41: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:11:11: [DEBUG]: TempChecker: Temp Users to Remove: 0 recorder_1 | wr.io: 2019-03-01 02:11:41: [DEBUG]: TempChecker: Temp Users to Remove: 0
docker-compose logs warcserver Attaching to webrecorder_warcserver_1 warcserver_1 | [uWSGI] getting INI configuration from /code/apps/load.ini warcserver_1 | *** Starting uWSGI 2.0.18 (64bit) on [Fri Mar 1 02:08:39 2019] *** warcserver_1 | compiled with version: 6.3.0 20170516 on 28 February 2019 17:09:58 warcserver_1 | os: Linux-4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 warcserver_1 | nodename: 67dedae570ef warcserver_1 | machine: x86_64 warcserver_1 | clock source: unix warcserver_1 | pcre jit disabled warcserver_1 | detected number of CPU cores: 16 warcserver_1 | current working directory: /code warcserver_1 | detected binary path: /usr/local/bin/uwsgi warcserver_1 | your memory page size is 4096 bytes warcserver_1 | detected max file descriptor number: 1024 warcserver_1 | - async cores set to 400 - fd table size: 1024 warcserver_1 | lock engine: pthread robust mutexes warcserver_1 | thunder lock: disabled (you can enable it with --thunder-lock) warcserver_1 | uwsgi socket 0 bound to TCP address :8080 fd 3 warcserver_1 | Python version: 3.7.2 (default, Feb 6 2019, 12:04:03) [GCC 6.3.0 20170516] warcserver_1 | Python main interpreter initialized at 0x55c346ac3a10 warcserver_1 | python threads support enabled warcserver_1 | your server socket listen backlog is limited to 100 connections warcserver_1 | your mercy for graceful operations on workers is 60 seconds warcserver_1 | mapped 317025104 bytes (309594 KB) for 4000 cores warcserver_1 | *** Operational MODE: preforking+async *** warcserver_1 | No Archives Loaded warcserver_1 | WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x55c346ac3a10 pid: 21 (default app) warcserver_1 | *** uWSGI is running in multiple interpreter mode *** warcserver_1 | spawned uWSGI master process (pid: 21) warcserver_1 | spawned uWSGI worker 1 (pid: 25, cores: 400) warcserver_1 | spawned uWSGI worker 2 (pid: 26, cores: 400) warcserver_1 | spawned uWSGI worker 3 (pid: 27, cores: 400) warcserver_1 | *** 
running gevent loop engine [addr:0x55c345cc3fd0] *** warcserver_1 | spawned uWSGI worker 4 (pid: 28, cores: 400) warcserver_1 | spawned uWSGI worker 5 (pid: 29, cores: 400) warcserver_1 | spawned uWSGI worker 6 (pid: 30, cores: 400) warcserver_1 | spawned uWSGI worker 7 (pid: 31, cores: 400) warcserver_1 | spawned uWSGI worker 8 (pid: 32, cores: 400) warcserver_1 | spawned uWSGI worker 9 (pid: 33, cores: 400) warcserver_1 | spawned uWSGI worker 10 (pid: 34, cores: 400)
docker-compose logs app Attaching to webrecorder_app_1 app_1 | [uWSGI] getting INI configuration from /code/apps/apiapp.ini app_1 | [uwsgi-static] added mapping for http://webrecorder.proxy/static => /code/webrecorder/static app_1 | [uwsgi-static] added mapping for /static => /code/webrecorder/static app_1 | *** Starting uWSGI 2.0.18 (64bit) on [Fri Mar 1 02:08:40 2019] *** app_1 | compiled with version: 6.3.0 20170516 on 28 February 2019 17:09:58 app_1 | os: Linux-4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 app_1 | nodename: 9537097ef449 app_1 | machine: x86_64 app_1 | clock source: unix app_1 | pcre jit disabled app_1 | detected number of CPU cores: 16 app_1 | current working directory: /code app_1 | detected binary path: /usr/local/bin/uwsgi app_1 | your memory page size is 4096 bytes app_1 | detected max file descriptor number: 1024 app_1 | building mime-types dictionary from file /etc/mime.types...554 entry found app_1 | - async cores set to 400 - fd table size: 1024 app_1 | lock engine: pthread robust mutexes app_1 | thunder lock: disabled (you can enable it with --thunder-lock) app_1 | uwsgi socket 0 bound to TCP address :8081 fd 3 app_1 | uwsgi socket 1 bound to TCP address :8088 fd 4 app_1 | Python version: 3.7.2 (default, Feb 6 2019, 12:04:03) [GCC 6.3.0 20170516] app_1 | Python main interpreter initialized at 0x55cb2410ad50 app_1 | python threads support enabled app_1 | your server socket listen backlog is limited to 100 connections app_1 | your mercy for graceful operations on workers is 60 seconds app_1 | mapped 317025104 bytes (309594 KB) for 4000 cores app_1 | *** Operational MODE: preforking+async *** app_1 | /usr/local/lib/python3.7/site-packages/yaml/constructor.py:126: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working app_1 | if not isinstance(key, collections.Hashable): app_1 | No Archives Loaded app_1 | WSGI app 0 (mountpoint='') 
ready in 2 seconds on interpreter 0x55cb2410ad50 pid: 21 (default app) app_1 | *** uWSGI is running in multiple interpreter mode *** app_1 | spawned uWSGI master process (pid: 21) app_1 | spawned uWSGI worker 1 (pid: 26, cores: 400) app_1 | spawned uWSGI worker 2 (pid: 27, cores: 400) app_1 | spawned uWSGI worker 3 (pid: 28, cores: 400) app_1 | *** running gevent loop engine [addr:0x55cb22049fd0] *** app_1 | spawned uWSGI worker 4 (pid: 29, cores: 400) app_1 | spawned uWSGI worker 5 (pid: 30, cores: 400) app_1 | spawned uWSGI worker 6 (pid: 31, cores: 400) app_1 | spawned uWSGI worker 7 (pid: 33, cores: 400) app_1 | spawned uWSGI worker 8 (pid: 35, cores: 400) app_1 | spawned uWSGI worker 9 (pid: 39, cores: 400) app_1 | spawned uWSGI worker 10 (pid: 41, cores: 400) app_1 | /usr/local/lib/python3.7/site-packages/pkg_resources/init.py:1710: DeprecationWarning: Use of .. or absolute path in a resource path is not allowed and will raise exceptions in a future release. app_1 | zip_path = self._resource_to_zip(resource_name) app_1 | [pid: 26|app: 0|req: 1/1] 192.168.1.42 () {62 vars in 1014 bytes} [Fri Mar 1 02:09:02 2019] GET /api/v1/auth/curr_user => generated 348 bytes in 19 msecs (HTTP/1.1 200) 3 headers in 208 bytes (3 switches on core 399) app_1 | [pid: 39|app: 0|req: 1/2] 172.21.0.9 () {40 vars in 632 bytes} [Fri Mar 1 02:10:36 2019] GET /api/v1/auth/curr_user?include_colls=true => generated 892 bytes in 27 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 41|app: 0|req: 1/3] 192.168.1.42 () {62 vars in 994 bytes} [Fri Mar 1 02:10:36 2019] GET /api/v1/client_archives => generated 2 bytes in 9 msecs (HTTP/1.1 200) 2 headers in 70 bytes (3 switches on core 399) app_1 | [pid: 26|app: 0|req: 2/4] 192.168.1.42 () {66 vars in 1061 bytes} [Fri Mar 1 02:10:48 2019] POST /api/v1/new => generated 198 bytes in 8 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 39|app: 0|req: 2/5] 192.168.1.42 () {60 vars in 963 
bytes} [Fri Mar 1 02:10:52 2019] GET /api/v1/client_archives => generated 2 bytes in 3 msecs (HTTP/1.1 200) 2 headers in 70 bytes (3 switches on core 399) app_1 | [pid: 41|app: 0|req: 2/6] 192.168.1.42 () {62 vars in 1014 bytes} [Fri Mar 1 02:11:45 2019] GET /api/v1/auth/curr_user => generated 348 bytes in 8 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 41|app: 0|req: 3/7] 172.21.0.9 () {40 vars in 632 bytes} [Fri Mar 1 02:11:46 2019] GET /api/v1/auth/curr_user?include_colls=true => generated 892 bytes in 6 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 26|app: 0|req: 3/8] 192.168.1.42 () {62 vars in 994 bytes} [Fri Mar 1 02:11:47 2019] GET /api/v1/client_archives => generated 2 bytes in 3 msecs (HTTP/1.1 200) 2 headers in 70 bytes (3 switches on core 399) app_1 | [pid: 35|app: 0|req: 1/9] 192.168.1.42 () {66 vars in 1061 bytes} [Fri Mar 1 02:12:03 2019] POST /api/v1/new => generated 222 bytes in 29 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 26|app: 0|req: 4/10] 192.168.1.42 () {62 vars in 1155 bytes} [Fri Mar 1 02:15:46 2019] GET /api/v1/auth/curr_user => generated 348 bytes in 5 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 41|app: 0|req: 4/11] 192.168.1.42 () {60 vars in 1064 bytes} [Fri Mar 1 02:15:48 2019] GET /api/v1/collections?user=Alex&include_recordings=false&include_lists=false => generated 544 bytes in 6 msecs (HTTP/1.1 200) 2 headers in 72 bytes (3 switches on core 399) app_1 | [pid: 41|app: 0|req: 5/12] 192.168.1.42 () {60 vars in 963 bytes} [Fri Mar 1 02:15:48 2019] GET /api/v1/client_archives => generated 2 bytes in 3 msecs (HTTP/1.1 200) 2 headers in 70 bytes (3 switches on core 399)
That all looks good so far, but there is no capture traffic in the logs yet. What's the URL you get redirected to after entering a URL, say http://example.com/? Are you running locally off localhost or off a different host?
Those logs were after several failed capture attempts. I run it behind Traefik proxy. I just tried again, connecting to the host directly (http://cloud:8089) - same exact error. And no new useful log data.
Ah, that would be the source of the issue.
I realize we should update the docs to make this clearer and also have better error messaging.
The default settings are only for running over http://localhost:8089, with the content loaded from http://localhost:8092 (for security reasons, to isolate the app from the captured content).
To use a different host/port, you need to set APP_HOST and CONTENT_HOST in wr.env, and then run recreate.sh. E.g.:
APP_HOST=cloud:8089
CONTENT_HOST=cloud:8092
Since you're using Traefik, you can use different ports/domains, as long as they route to localhost:8089 and localhost:8092.
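A minimal sketch of that edit, assuming wr.env holds plain KEY=value lines. set_env_var is a hypothetical helper (not part of Webrecorder), and the hostnames are just the example values above:

```shell
# Hypothetical helper (not part of Webrecorder): set KEY=value in an env file,
# replacing any existing KEY= line or appending a new one.
# The body runs in a subshell, so its variables do not leak into the caller.
set_env_var() (
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)"
  # Keep every line that does not assign KEY (grep exits non-zero if none remain).
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  # Append the new assignment.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the original file's permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
)

# Usage:
#   set_env_var wr.env APP_HOST cloud:8089
#   set_env_var wr.env CONTENT_HOST cloud:8092
#   ./recreate.sh
```

The containers only pick up the new values once recreate.sh has been run.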
Wow, it works now! Captured some pages, but with the current browser only. If I pick any other browser from the list, it goes into capturing mode, 0 bytes and nothing stored...
EDIT: It worked only as cloud:8089. Through Traefik it doesn't work. What hosts should I use in the env file?
EDIT2: Tried this below, and set up two Traefik front/back ends to ports 8089 and 8092. WR showed my collection and the page list, but not pages themselves. Browse broken.
APP_HOST=wr.mydomain.com
CONTENT_HOST=wrcontent.mydomain.com
I then added :443 to both URLs above, and WR was completely broken. What hosts should I use?
Is running WR behind Traefik a supported config?
We haven't had a chance to test WR behind Traefik yet, but it's something we can probably support. We've generally been testing with nginx as the reverse proxy, but there's no reason Traefik wouldn't work.
To use HTTPS, you should set the SCHEME=https env variable instead, and not add the port.
APP_HOST and CONTENT_HOST are designed to match the Host header sent by the browser, which would not include :443.
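To illustrate why adding :443 breaks the match, here is a rough sketch of how the Host header is derived from a URL. host_header is a hypothetical helper written for this example, not anything in Webrecorder; it only mimics the browser behavior of leaving default ports (80 for http, 443 for https) out of the header:

```shell
# Hypothetical helper: print the Host header a browser would send for a URL.
# The body runs in a subshell, so its variables do not leak into the caller.
host_header() (
  url="$1"
  scheme="${url%%://*}"      # text before "://"
  rest="${url#*://}"         # text after "://"
  authority="${rest%%/*}"    # host[:port], with any path removed
  # Default ports are omitted from the Host header.
  case "$scheme:$authority" in
    https:*:443) authority="${authority%:443}" ;;
    http:*:80)   authority="${authority%:80}" ;;
  esac
  printf '%s\n' "$authority"
)

# Usage:
#   host_header https://wr.mydomain.com:443/   # prints wr.mydomain.com
#   host_header http://cloud:8089/             # prints cloud:8089
```

So with SCHEME=https, APP_HOST=wr.mydomain.com lines up with what the browser sends, while APP_HOST=wr.mydomain.com:443 does not.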
We do have an Ansible playbook which sets up nginx with a reverse proxy. There are a few additional settings for nginx to get everything going (https://github.com/webrecorder/webrecorder-deploy/blob/master/templates/wrhost.conf) but I think Traefik should just work. I'll try to test when I get a chance. For now, the nginx config should generally work well.
For the remote browsers, there are additional containers that need to be pulled.
You'll probably need to pull the following container:
docker pull oldwebtoday/vnc-webrtc-audio
and then for each browser you'd like, you can also pull any or all of the following:
docker pull oldwebtoday/chrome:67
docker pull oldwebtoday/chrome:60
docker pull oldwebtoday/chrome:53
docker pull oldwebtoday/firefox:57
docker pull oldwebtoday/firefox:56
docker pull oldwebtoday/firefox:49
(Yes, we need to add a script to do all of that automatically!)
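The pulls above can be sketched as a loop. pull_browsers and the DOCKER override are hypothetical names used only for illustration, and the image list is just the one given above, so the tags may be out of date:

```shell
# Image names copied from the list above, one per line.
BROWSER_IMAGES="oldwebtoday/vnc-webrtc-audio
oldwebtoday/chrome:67
oldwebtoday/chrome:60
oldwebtoday/chrome:53
oldwebtoday/firefox:57
oldwebtoday/firefox:56
oldwebtoday/firefox:49"

# Hypothetical helper: pull each image in turn, reporting failures on stderr
# and continuing with the rest. Set DOCKER=echo to print the commands instead.
pull_browsers() {
  for image in $BROWSER_IMAGES; do
    ${DOCKER:-docker} pull "$image" || echo "failed to pull $image" >&2
  done
}

# Usage:
#   pull_browsers                # pulls all seven images
#   DOCKER=echo pull_browsers    # dry run
```

Trim the list to only the browsers you want before running it.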
Somehow I forgot that we do in fact have the remote browser script already, and have just updated it with the latest available images and additional remote desktop system containers: https://github.com/webrecorder/webrecorder/blob/master/install-browsers.sh
To touch on the APP_HOST and CONTENT_HOST requirements, can I set an IP in these? I've attempted to do so, and clicking "Capture" does nothing in the web interface or logs.
Maybe try 0.0.0.0?