
Redevelopment in C

Status: Open. cheako opened this issue 6 years ago • 22 comments

The more I read the source, the more I can't believe this wasn't written in C. Are there any design documents or block diagrams lying around that would help with a rewrite?



cheako, Aug 10 '17 23:08

Here are some ideas that may make the source more readable.

  1. Split the port 80 listener off into its own source file.
  2. Perhaps this could all be a lighttpd module, assuming we don't need to accept/fork as we do currently.

cheako, Aug 11 '17 16:08

Within Tor's environment the Python Twisted framework is widely used; that's one of the reasons Tor2web is based on Twisted Python. Twisted is also high-performance and self-contained, requiring no third-party daemons, which keeps things simple.

fpietrosanti, Aug 11 '17 16:08

If I were to rework Tor2web in C, I would suggest making a proper modular patch directly to Tor's core software, which is written in C. That way Tor2web would be something that can be enabled or disabled in the Tor core code, without installing another application. That would be a huge improvement.

fpietrosanti, Aug 11 '17 16:08

I'm not seeing what Twisted has to offer this project; all it's doing is string manipulation. What Twisted does not offer is enhancements to the HTTP protocol. I've been looking at this for days and it's not clear to me how to add support for connection upgrade.
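
A minimal sketch of the detection half of connection upgrade (as used by WebSockets), assuming the header values have already been parsed into NUL-terminated strings. In HTTP/1.1 the client lists the `upgrade` token in its `Connection` header and names the new protocol in an `Upgrade` header. The function names below are hypothetical, not taken from Tor2web.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison of a token of length len against a
 * lowercase NUL-terminated literal. */
static bool token_is(const char *tok, size_t len, const char *lit)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (lit[i] == '\0' || tolower((unsigned char)tok[i]) != lit[i])
            return false;
    }
    return lit[len] == '\0';
}

/* Hypothetical helper: true if a Connection header value such as
 * "keep-alive, Upgrade" lists the "upgrade" token. */
bool connection_has_upgrade(const char *value)
{
    const char *p = value;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;                          /* skip separators */
        const char *start = p;
        while (*p != '\0' && *p != ',' && *p != ' ' && *p != '\t')
            p++;                          /* scan one token */
        if (token_is(start, (size_t)(p - start), "upgrade"))
            return true;
    }
    return false;
}

/* A request asks to switch protocols only if Connection lists "upgrade"
 * and a non-empty Upgrade header names the target (e.g. "websocket"). */
bool is_upgrade_request(const char *connection, const char *upgrade)
{
    return connection != NULL && upgrade != NULL && upgrade[0] != '\0'
        && connection_has_upgrade(connection);
}
```

If the backend then answers `101 Switching Protocols`, the proxy would stop parsing HTTP and simply relay bytes in both directions.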

Smaller is better. The SOCKS interface Tor exposes is sufficient for the task.
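
For reference, talking to Tor's SOCKS port per RFC 1928 takes two client messages: a greeting offering "no authentication", then a CONNECT request that passes the .onion name as a domain name (ATYP 0x03) so Tor, not the proxy, resolves it. A minimal C sketch of building both; the function names are hypothetical, and reading the server's replies (`05 00` for the greeting, a reply whose second byte is `00` for a successful CONNECT) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Greeting: version 5, one auth method offered, 0x00 = no authentication.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t socks5_greeting(uint8_t *out, size_t cap)
{
    if (cap < 3)
        return 0;
    out[0] = 0x05;
    out[1] = 0x01;
    out[2] = 0x00;
    return 3;
}

/* CONNECT request addressed by hostname. Layout:
 * VER CMD RSV ATYP LEN HOST... PORT_HI PORT_LO
 * Returns the number of bytes written, or 0 on error. */
size_t socks5_connect_request(uint8_t *out, size_t cap,
                              const char *host, uint16_t port)
{
    size_t hlen = strlen(host);
    size_t n = 0;

    if (hlen == 0 || hlen > 255 || cap < 7 + hlen)
        return 0;
    out[n++] = 0x05;                   /* VER */
    out[n++] = 0x01;                   /* CMD = CONNECT */
    out[n++] = 0x00;                   /* RSV */
    out[n++] = 0x03;                   /* ATYP = DOMAINNAME */
    out[n++] = (uint8_t)hlen;          /* length prefix, no NUL sent */
    memcpy(out + n, host, hlen);
    n += hlen;
    out[n++] = (uint8_t)(port >> 8);   /* DST.PORT in network byte order */
    out[n++] = (uint8_t)(port & 0xff);
    return n;
}
```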

cheako, Aug 11 '17 18:08

I wrote some Design Documentation.

cheako, Aug 12 '17 15:08

Do we need to specify chunked encoding explicitly? According to Wikipedia, it is always available when using HTTP/1.1, so it isn't necessary to specify it. Nor does it seem appropriate to specify it this way.

I'll ask around whether anyone knows of an offending httpd.

cheako, Aug 13 '17 02:08

After some testing I've decided that chunked encoding is not important. It may not be supported by HTTP/1.0 clients, but we don't need to go out of our way to enable it.
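
For context, RFC 7230 says a server must not send Transfer-Encoding (and so chunked) unless the request indicated HTTP/1.1 or later. A proxy that doesn't buffer a body of unknown length therefore falls back to closing the connection for HTTP/1.0 clients. A minimal sketch of that decision; the enum and function names are hypothetical.

```c
#include <stdbool.h>

enum body_framing {
    FRAMING_CONTENT_LENGTH, /* length known up front */
    FRAMING_CHUNKED,        /* only for HTTP/1.1 or later requests */
    FRAMING_CLOSE           /* body ends when the connection closes */
};

/* Choose how to delimit a response body for the client, given the
 * version from its request line and whether the length is known. */
enum body_framing choose_framing(int http_major, int http_minor,
                                 bool length_known)
{
    if (length_known)
        return FRAMING_CONTENT_LENGTH;
    if (http_major > 1 || (http_major == 1 && http_minor >= 1))
        return FRAMING_CHUNKED;
    return FRAMING_CLOSE;
}
```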

cheako, Aug 13 '17 03:08

Consider that there's no single clear way to reverse-proxy the many different conditions that are outside the proxy's control: neither the HTML/CSS/JS/header rewriting nor the behaviour of the backend HTTP server.

Over 5-6 years we encountered many situations that were fixed iteratively, so please consider an exact 1-to-1 clone, including the minor rewriting rules.

fpietrosanti, Aug 13 '17 07:08

I've had some success writing a test suite. Currently the tests are passing.

The suite consists of a home-made SOCKS server that reports the test results it has collected to any client that connects to the telnet port of the virtual host "exit". Other virtual hosts are still to be implemented; their hostnames will typically look like valid Tor names.

This should be sufficient to test all the behaviour of both servers and clients; even fictitious cases can be tested.
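
Routing to virtual hosts like "exit" means the test server has to pull the requested hostname and port out of the client's SOCKS5 CONNECT request (RFC 1928). A C sketch of that parsing step, assuming the request bytes are already buffered and only domain-name requests (ATYP 0x03) matter; the struct and function names are hypothetical, and the actual suite need not work this way.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct socks5_target {
    char host[256];   /* NUL-terminated hostname, e.g. "exit" */
    uint16_t port;    /* host byte order, e.g. 23 for telnet */
};

/* Parse a SOCKS5 CONNECT request addressed by hostname.
 * Returns bytes consumed, 0 if more data is needed,
 * or -1 if the request is malformed or unsupported. */
long socks5_parse_connect(const uint8_t *buf, size_t len,
                          struct socks5_target *t)
{
    if (len < 5)
        return 0;
    if (buf[0] != 0x05 || buf[1] != 0x01 || buf[2] != 0x00 || buf[3] != 0x03)
        return -1;                     /* only CONNECT by hostname */
    size_t hlen = buf[4];
    if (hlen == 0)
        return -1;
    if (len < 7 + hlen)
        return 0;                      /* hostname or port not yet received */
    memcpy(t->host, buf + 5, hlen);
    t->host[hlen] = '\0';
    t->port = (uint16_t)((buf[5 + hlen] << 8) | buf[6 + hlen]);
    return (long)(7 + hlen);
}
```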

Where I'll need some help is getting Tor2web started; I've made some example configs and even an SSL certificate.

Here is an example script that starts a daemon and then talks to it; I'll use it as a template for Tor2web.

cheako, Aug 15 '17 07:08

Thank you for working on this, @cheako; having tests would really help the project.

When your implementation is ready, we could integrate it with Travis and have everything retested on each commit; currently on Travis we only check some code-quality metrics and test the packaging.

It would be nice to also install the package and retest it on each commit.

This will probably require applying to Tor2web the same changes we just made to GlobaLeaks so that it can be instantiated and controlled via txtorcon.

evilaliv3, Aug 15 '17 13:08

Take a look at this patch. There are a handful of TODO items; once these are done we can begin hacking on tx/Dbasic.t and tx/Asocks5.t to actually have a conversation through Tor2web. "git am" can be used to apply this patch to a new branch for the test suite, and I can then pull that branch and continue working.

cheako, Aug 15 '17 18:08

I'm confused by this error message: https://travis-ci.org/cheako/Tor2web-1/builds/265050936#L9414

Unhandled error in Deferred: Traceback (most recent call last): Failure: twisted.spread.pb.PBConnectionLost: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionDone'>: Connection was closed cleanly.]

The strace log doesn't show a connection. It seems to be a response to an SSL connection, but there is no call to accept(); it's as though it doesn't like the certificate (it loads it 20 or 30 times).

cheako, Aug 16 '17 08:08

I'm camping for the next week, so the pace of development on this will all but stop. This is a good chance to catch up on the changes.

cheako, Aug 16 '17 19:08

See: https://github.com/cheako/tor2web/wiki/Code-Explination

cheako, Sep 11 '17 22:09

This is coming along; there is currently more code than tests. I'm really excited about the progress on this.

cheako, Sep 13 '17 19:09

Worth sharing: 112 KB of code, and CI proving it does something. Take a look and open issues or leave comments.

  1. Code https://github.com/cheako/tor2web/tree/6b5c527ffb977995bd9d1e727ea9b86da30bf6f3/src
  2. Make Comments https://github.com/cheako/tor2web/commit/6b5c527ffb977995bd9d1e727ea9b86da30bf6f3
  3. See test output https://travis-ci.org/cheako/tor2web/builds/275289063

The big to-do items:

  • Tie the HTTP client portion to the SOCKS client library; currently it's just a complex echo server.
  • Mangle hostnames; I'm not sure how to do this for everything, such as cookies and URLs.
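
One deliberately naive way to handle the outbound direction (onion hostnames inside URLs and `Set-Cookie` Domain attributes of a response) is plain substring replacement. A C sketch; the `.tor2web.example` suffix is a placeholder rather than real configuration, and this approach would also rewrite ".onion" where it merely appears in prose and does nothing about compressed bodies.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of src with every occurrence of from replaced by
 * to, or NULL if from is empty or allocation fails. Caller frees. */
char *replace_all(const char *src, const char *from, const char *to)
{
    size_t flen = strlen(from), tlen = strlen(to), count = 0;
    const char *p, *hit;

    if (flen == 0)
        return NULL;
    /* First pass: count non-overlapping matches to size the output. */
    for (p = src; (hit = strstr(p, from)) != NULL; p = hit + flen)
        count++;

    char *out = malloc(strlen(src) - count * flen + count * tlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy text between matches, substituting each match. */
    char *w = out;
    for (p = src; (hit = strstr(p, from)) != NULL; p = hit + flen) {
        memcpy(w, p, (size_t)(hit - p));   /* text before the match */
        w += hit - p;
        memcpy(w, to, tlen);               /* the replacement */
        w += tlen;
    }
    strcpy(w, p);                          /* remainder after last match */
    return out;
}
```

For example, `replace_all(body, ".onion", ".tor2web.example")` would turn `Domain=abc.onion` into `Domain=abc.tor2web.example`.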

cheako, Sep 14 '17 02:09

I've reached the point where I need to mangle hostnames, but when I looked at the existing configuration I couldn't understand it.

https://github.com/globaleaks/Tor2web/blob/9abd3e256f1f5948474383de4fa2b29665af4785/tor2web/t2w.py#L121

I like the idea of using a host map, but I can't see how this implementation works. Should I just implement this as an ordered list of regexes?
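
The ordered-list-of-regexes idea could look roughly like this C sketch using POSIX `regcomp`/`regexec`: rules are tried in order, and the first one that matches the incoming Host header supplies the onion label in capture group 1. The rule format and the `tor2web.example` base domain are placeholders, not the existing Tor2web configuration; the patterns below accept 16-character (v2) or 56-character (v3) onion labels.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: capture group 1 of pattern is the onion label. */
struct host_rule {
    const char *pattern;
};

/* Try rules in order; on the first match write "<label>.onion" to out.
 * Returns 1 on success, 0 if no rule matched, -1 on error.
 * (Compiling per call keeps the sketch short; real code would cache.) */
int map_host(const struct host_rule *rules, size_t nrules,
             const char *host, char *out, size_t outsz)
{
    for (size_t i = 0; i < nrules; i++) {
        regex_t re;
        regmatch_t m[2];

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_ICASE) != 0)
            return -1;
        int rc = regexec(&re, host, 2, m, 0);
        regfree(&re);
        if (rc != 0 || m[1].rm_so < 0)
            continue;                      /* no match: try the next rule */
        int len = (int)(m[1].rm_eo - m[1].rm_so);
        int n = snprintf(out, outsz, "%.*s.onion", len, host + m[1].rm_so);
        return (n > 0 && (size_t)n < outsz) ? 1 : -1;
    }
    return 0;
}
```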

cheako, Sep 22 '17 19:09

I've reached a milestone, first successful processing of a request. https://travis-ci.org/cheako/tor2web/builds/279454523#L1346

cheako, Sep 25 '17 17:09

https://travis-ci.org/cheako/tor2web/builds/279786648#L1339

It now supports three different types of request. It's becoming critical that #339 isn't working.

cheako, Sep 26 '17 03:09

It can now convert chunked encoding to Content-Length. https://travis-ci.org/cheako/tor2web/builds/280591063#L1344
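
A C sketch of that conversion, assuming the whole chunked body has already been buffered (a streaming proxy would decode incrementally instead). It follows the HTTP/1.1 chunked format: hex size, optional `;extension`, CRLF, data, CRLF, ending with a zero-size chunk and optional trailers. `dechunk` is a hypothetical name, and trailers are skipped rather than forwarded.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode a complete chunked body into a new malloc'd buffer and store its
 * length (the Content-Length value) in *outlen. Returns NULL if malformed. */
unsigned char *dechunk(const unsigned char *in, size_t len, size_t *outlen)
{
    unsigned char *out = malloc(len > 0 ? len : 1); /* decoded <= encoded */
    size_t r = 0, w = 0;

    if (out == NULL)
        return NULL;
    for (;;) {
        size_t size = 0;
        int digits = 0;

        while (r < len && isxdigit(in[r])) {           /* chunk-size (hex) */
            int c = tolower(in[r++]);
            if (size > (SIZE_MAX - 15) / 16)
                goto fail;                             /* would overflow */
            size = size * 16 + (size_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
            digits++;
        }
        while (r < len && in[r] != '\r')
            r++;                                       /* skip extensions */
        if (digits == 0 || r + 1 >= len || in[r + 1] != '\n')
            goto fail;
        r += 2;
        if (size == 0)
            break;                                     /* last-chunk */
        if (len - r < size || len - r - size < 2)
            goto fail;                                 /* truncated data */
        memcpy(out + w, in + r, size);
        w += size;
        r += size;
        if (in[r] != '\r' || in[r + 1] != '\n')
            goto fail;
        r += 2;
    }
    /* Skip optional trailer fields; the message ends at an empty line. */
    while (r + 1 < len && !(in[r] == '\r' && in[r + 1] == '\n')) {
        const unsigned char *eol = memchr(in + r, '\n', len - r);
        if (eol == NULL)
            goto fail;
        r = (size_t)(eol - in) + 1;
    }
    if (r + 1 >= len)
        goto fail;
    *outlen = w;
    return out;

fail:
    free(out);
    return NULL;
}
```

The proxy would then send `Content-Length: <*outlen>` in place of `Transfer-Encoding: chunked`.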

cheako, Sep 27 '17 20:09

I'm working on the static pages, and I've opted not to implement the antanistaticmap folder; I feel it's better to serve the content in a single request. I've implemented m4 preprocessing to include the JavaScript/CSS/PNG assets in all of the different documents.
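
Including a PNG inside a single HTML document usually means a base64 `data:` URI. The comment doesn't say whether the m4 setup works that way, so this is only a C sketch of the encoding step (standard RFC 4648 alphabet with `=` padding); `data_uri` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd "data:<mime>;base64,<payload>" string for len bytes
 * of data, or NULL on allocation failure. Caller frees. */
char *data_uri(const char *mime, const unsigned char *data, size_t len)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t prefix = strlen("data:") + strlen(mime) + strlen(";base64,");
    size_t enc = 4 * ((len + 2) / 3);  /* 4 output chars per 3 input bytes */
    char *out = malloc(prefix + enc + 1);

    if (out == NULL)
        return NULL;
    strcpy(out, "data:");
    strcat(out, mime);
    strcat(out, ";base64,");

    char *w = out + prefix;
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to 3 bytes into 24 bits, emit four 6-bit digits. */
        unsigned long v = (unsigned long)data[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)data[i + 1] << 8;
        if (i + 2 < len)
            v |= data[i + 2];
        *w++ = b64[(v >> 18) & 63];
        *w++ = b64[(v >> 12) & 63];
        *w++ = i + 1 < len ? b64[(v >> 6) & 63] : '=';
        *w++ = i + 2 < len ? b64[v & 63] : '=';
    }
    *w = '\0';
    return out;
}
```

The result can go straight into an attribute such as `<img src="data:image/png;base64,...">`, so the page needs no second request.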

Here is what the documents will look like; they could stand to be cleaned up. https://travis-ci.org/cheako/tor2web/builds/300678724#L1741

I ran into a few problems, for example the /antanistaticmap/notification URL used in the JavaScript: https://github.com/cheako/tor2web/blob/7003152c327d08ba45fb2959d1cddbfc77df20e3/data/templates/tor2web.js#L36

I also need someone to explain to me what decoy.html is for; it uses "/antanistaticmap/dev/null" for some reason.

cheako, Nov 11 '17 18:11

I refactored the templates, moving common markup into an m4 script. This reduces the size and complexity of almost every other template file. Here is error_invalid_hostname.html.in:

```m4
generate_page(`Tor2web Error: Invalid Hostname',  `tor2web-content', `dnl
        <h2 id="tor2web-title">Tor2web Error: Invalid Hostname</h2>
        <p>Sorry, we couldn''`t serve the page you requested.</p>
        <ol>
          <li><p><strong>The entered URL is invalid.</strong> This most likely happens if you get this page immediately after trying to visit a URL. This service only works with valid <code>.onion</code> URLs. Please check your URL and try again.</p></li>
        </ol>
')dnl
```

cheako, Nov 14 '17 20:11