Small contributing guide

Status: open. hanoii opened this issue 6 years ago • 6 comments

I wanted to try to figure out why #432 was happening.

I am still new to the codebase, but I wonder if a short introduction to how it is structured and where to look would help.

Also, I at least tried debugging the app, with both PyCharm and Visual Studio Code, and neither seems to stop at the breakpoint. I put it in handle_call() in recorderapp.py and started the debugger with cli.py.

Threading seems to be affecting something.
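For background on how concurrency can hide breakpoints: Python debuggers hook execution through trace functions, and sys.settrace() installs one only for the calling thread. The stdlib-only sketch below illustrates the general mechanism, not a diagnosis of pywb. Its handle_call is a hypothetical stand-in, not pywb's method. A breakpoint-like hook misses a call made on a worker thread unless threading.settrace() also installs it for threads started later. As far as I understand, gevent greenlets add a further twist by switching stacks inside a single thread, which is why debuggers have a separate gevent compatibility mode.

```python
import sys
import threading


def run_with_tracer(install_for_threads: bool) -> set:
    """Run one worker thread and return the function names a tracer saw being called."""
    seen = set()
    lock = threading.Lock()

    def tracer(frame, event, arg):
        # A debugger's breakpoint check hangs off a hook like this one.
        if event == "call":
            with lock:
                seen.add(frame.f_code.co_name)
        return None  # no per-line tracing needed for this demo

    def handle_call():
        # Hypothetical stand-in for the function holding the breakpoint.
        return 42

    def worker():
        handle_call()

    sys.settrace(tracer)  # affects only the current (main) thread
    if install_for_threads:
        threading.settrace(tracer)  # installed in every thread started afterwards
    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    finally:
        sys.settrace(None)
        threading.settrace(None)
    return seen


if __name__ == "__main__":
    print("main-thread hook only:", "handle_call" in run_with_tracer(False))
    print("hook for new threads: ", "handle_call" in run_with_tracer(True))
```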

I'm just hoping for some quick-start pointers to get me going. Thanks.

hanoii, Jan 08 '19 21:01

@hanoii to make PyCharm work when debugging pywb, you will need to enable gevent compatibility (see image below).

[Screenshot: PyCharm debugger settings with gevent compatibility enabled]

I would also set up a PyCharm run configuration for this, with the script path set to <some path>/pywb/pywb/apps/cli.py and the working directory set to <some path>/pywb (this is my setup).
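For anyone on Visual Studio Code or another DAP client instead of PyCharm, here is a sketch of an equivalent launch. It assumes debugpy is installed and that its bundled pydevd enables gevent support when the GEVENT_SUPPORT=True environment variable is set. The helper name build_debug_launch is hypothetical. It only builds the command, working directory and environment matching the run configuration above. It does not start anything.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def build_debug_launch(pywb_checkout: str, cli_args: list[str], port: int = 5678):
    """Return (argv, cwd, env) for starting pywb's CLI under debugpy (hypothetical helper).

    Mirrors the run configuration: script = <checkout>/pywb/apps/cli.py,
    working directory = <checkout>.
    """
    root = Path(pywb_checkout)
    script = root / "pywb" / "apps" / "cli.py"
    argv = [
        sys.executable, "-m", "debugpy",
        "--listen", str(port),   # the debugger client attaches on this port
        "--wait-for-client",     # pause until the client has attached
        str(script), *cli_args,
    ]
    env = dict(os.environ)
    # Assumption: pydevd (bundled in debugpy) reads this to enable gevent support;
    # without it, breakpoints in greenlet-run code may never be hit.
    env["GEVENT_SUPPORT"] = "True"
    return argv, str(root), env


if __name__ == "__main__":
    argv, cwd, env = build_debug_launch(
        "<some path>/pywb", ["--record", "--live", "-a", "--auto-interval", "10"]
    )
    print("cwd:", cwd)
    print("cmd:", " ".join(argv))
    # To actually launch: subprocess.run(argv, cwd=cwd, env=env)
```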

RE structure: when using the CLI with the --record flag, the entry point of the application is pywb.apps.frontendapp.FrontEndApp.
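FrontEndApp is itself a WSGI application, so a reasonable way in is to follow how it dispatches an incoming path. The toy app below is hypothetical and is not FrontEndApp. It skips everything pywb actually does. It only shows the first step: splitting a recording-mode path of the form /<coll>/record/<url>, the pattern the recording-mode docs use, into collection, mode and target. How the request then reaches code such as handle_call() in recorderapp.py is not modeled here.

```python
from wsgiref.util import setup_testing_defaults


def split_archive_path(path_info: str):
    """Split '/<coll>/record/<url>' into (coll, mode, rest); mode is None for non-record paths.

    For replay paths, rest is returned as-is (real replay URLs may also carry a timestamp).
    """
    coll, _, rest = path_info.lstrip("/").partition("/")
    if rest.startswith("record/"):
        return coll, "record", rest[len("record/"):]
    return coll, None, rest


def front_end_app(environ, start_response):
    """Toy WSGI app (hypothetical, not pywb's FrontEndApp) echoing the dispatch decision."""
    coll, mode, target = split_archive_path(environ.get("PATH_INFO", ""))
    body = f"coll={coll} mode={mode or 'replay'} url={target}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    environ = {"PATH_INFO": "/my-web-archive/record/http://example.com/"}
    setup_testing_defaults(environ)  # fill in the remaining required WSGI keys
    print(b"".join(front_end_app(environ, lambda status, headers: None)).decode())
```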

Have you configured your collection for recording? https://pywb.readthedocs.io/en/latest/manual/configuring.html#recording-mode
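As I read the recording-mode page linked above, recording is enabled in config.yaml either with the shorthand recorder: live or with a recorder mapping whose source_coll names the collection to record from. The helper below is a hypothetical sanity check over an already-parsed config, such as the dict returned by PyYAML's yaml.safe_load. The key names are my reading of the docs and are worth double-checking there.

```python
from typing import Optional


def recorder_source(config: dict) -> Optional[str]:
    """Return the collection recording pulls from, or None if recording looks disabled.

    Assumed config shapes (per my reading of the docs):
      recorder: live
      recorder: {source_coll: live, ...}
    """
    recorder = config.get("recorder")
    if isinstance(recorder, str):
        return recorder                   # shorthand form
    if isinstance(recorder, dict):
        return recorder.get("source_coll")  # mapping form
    return None                           # no recorder section at all


if __name__ == "__main__":
    print(recorder_source({"recorder": "live"}))
    print(recorder_source({"collections": {"my-web-archive": {}}}))
```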

N0taN3rd, Jan 08 '19 22:01

@N0taN3rd Excellent, thanks! That was the only thing missing.

For the structure part, I meant more of a short intro to the main business-logic parts, like key functions or places where stepping through with a debugger would showcase the main functionality.

From the overall approach, I take it that the tool is meant to be site-agnostic, correct? However, when I tried recording Facebook (probably one of the most complex sites), it didn't work properly, neither in Webrecorder nor in pywb. Webrecorder also has a support ticket saying that some sites are trickier to record, so does that mean there is some site-specific handling?

Any pointers on where to look in the code for what might be causing #432?

I also wonder if you could share some thoughts on webrecorder/webrecorder#665, so I can try navigating the Webrecorder codebase as well. Is the recording part just pywb, or is there more to it in Webrecorder?

Thanks a lot!

hanoii, Jan 09 '19 13:01

> Have you configured your collection for recording? https://pywb.readthedocs.io/en/latest/manual/configuring.html#recording-mode

I set up the debug configuration with --record --live -a --auto-interval 10 for recording, as described in the getting-started docs.

hanoii, Jan 09 '19 13:01

RE Facebook: have you opened (or could you open) an issue here or on Webrecorder's ticket tracker with the details of what went wrong?
One of the things we are constantly fighting with is how fast Facebook changes things internally (cookies, JS, etc.), and cookies are one of the biggest issues. Webrecorder handles the more intricate details of recording, e.g. cookies, whereas pywb provides the serialization to WARC and related parts of recording.
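To make "serialization to WARC" concrete, here is a stdlib-only sketch of a single uncompressed WARC/1.0 response record. A record is a version line, named header fields, a blank line, the captured HTTP response as the content block, and two trailing CRLFs. It is illustrative only. pywb itself builds on the warcio library for this, and real records usually also carry payload/block digests and are gzipped per record. The function name warc_response_record is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Serialize one uncompressed WARC/1.0 'response' record (hypothetical helper)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",  # length of the block, in bytes
    ]
    # Header lines end with CRLF, a blank line separates header from block,
    # and every record ends with two CRLFs.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    return head + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello"
    print(warc_response_record("http://example.com/", raw).decode())
```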

I believe @ikreymer would be able to provide you with a more detailed explanation for both #432 and webrecorder/webrecorder#665.

N0taN3rd, Jan 09 '19 22:01

> One of the things we are constantly fighting with is how fast Facebook changes things internally (cookies, JS, etc.), with cookies being one of the biggest issues.

Oh, so pywb won't handle cookies properly?

I'll wait for @ikreymer to chip in on webrecorder/webrecorder#665, then.

hanoii, Jan 10 '19 01:01

It's more that if Facebook relies on a non-standard cookie containing a URL that goes unrewritten, that would cause an issue.
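A minimal stdlib illustration of that failure mode, with made-up cookie names and values: a rewriter that only adjusts cookie attributes such as Domain and Path never looks inside the value. Any absolute URL embedded there still points at the live site when page JS later reads it. The hypothetical helper below just finds such URLs as a diagnostic. It is not how pywb handles them. Percent-encoded URLs (https%3A%2F%2F...) would slip past this simple regex.

```python
import re
from http.cookies import SimpleCookie

# Plain absolute URLs only; encoded forms are not detected by this sketch.
URL_RE = re.compile(r"https?://[^\s;,\"]+")


def urls_in_cookie_values(set_cookie: str) -> list:
    """Return absolute URLs embedded in the *values* of a Set-Cookie header."""
    jar = SimpleCookie()
    jar.load(set_cookie)  # Domain/Path/etc. become morsel attributes, not values
    found = []
    for morsel in jar.values():
        found.extend(URL_RE.findall(morsel.value))
    return found


if __name__ == "__main__":
    header = "next=https://www.example.com/home; Domain=.example.com; Path=/"
    print(urls_in_cookie_values(header))
```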

pywb does handle cookies correctly; see pywb/rewrite/cookie_rewriter.py.
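For readers new to the idea, here is a rough sketch of what "rewriting" a standard cookie means for an archive served from its own host. The approach is a hypothetical one and may well differ from what cookie_rewriter.py actually does. It drops the Domain attribute, because the browser is talking to the archive host rather than the original site. It then re-roots Path under the archive URL prefix, so the cookie is only sent back for archived pages of the same site. The function name and the example prefix are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def rewrite_set_cookie(set_cookie: str, archive_prefix: str) -> list:
    """Hypothetical Set-Cookie rewrite for archived replay/recording.

    archive_prefix is the URL path under which the original site is served,
    e.g. '/my-web-archive/record/http://example.com'.
    """
    jar = SimpleCookie()
    jar.load(set_cookie)
    rewritten = []
    for morsel in jar.values():
        # Empty attributes are omitted by OutputString(), so this drops Domain.
        morsel["domain"] = ""
        # Prefix the original Path (default '/') with the archive prefix.
        original_path = morsel["path"] or "/"
        morsel["path"] = archive_prefix.rstrip("/") + original_path
        rewritten.append(morsel.OutputString())
    return rewritten


if __name__ == "__main__":
    header = "sid=abc123; Domain=.example.com; Path=/; HttpOnly"
    for line in rewrite_set_cookie(header, "/my-web-archive/record/http://example.com"):
        print("Set-Cookie:", line)
```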

N0taN3rd, Jan 10 '19 02:01