
Kill it with fire.

Open kebby opened this issue 12 years ago • 20 comments

The pouet codebase sucks. From every single PHP file to the SQL schema it's all a mess, clearly started by somebody who didn't have too big a clue back then (which is okay per se), but every chance to refactor even simple things like duplicate functions has been missed.

This stuff here has no structure at all because from its first inception on everything has just been copy+pasted, bolted and gaffertaped onto the rest, and if we continue to develop it this way we'll never get to a point where adding a major feature is actually possible.

Every minute that goes into this codebase (except the fixes necessary now that all the vulnerabilities are out in the open) prevents the necessary rewrite from happening. There are at least two people who have proven that it's no rocket science to build a pouet-like site that actually works and is maintainable: Gargaj with the 2.0 beta and Chaos with http://www.farbrausch.com (though that one's a monolithic python file that has its own slew of problems).

So I request STOPPING all development on here, then designing a SANE database schema and accompanying API, and THEN building a new pouet on top. It's not that hard, really.

kebby avatar May 08 '13 12:05 kebby

i completely agree. the REST API concept might be the best way forward in that direction me thinks.

psenough avatar May 08 '13 12:05 psenough

I’m pretty sure that @kebby is not talking about things like REST but more about an API layer between the database and the application. Bolting a REST interface on top of that is trivial and should not be a priority.

Bombe avatar May 08 '13 13:05 Bombe

i'm just saying, if we build it as service on top of the current database, folks can do their own interfaces out of it and then we can remodel the db at will as long as the rest api is still serving it. i'm no expert in site db remake though, wouldn't hurt listening from folks with more experience in that department.

psenough avatar May 08 '13 13:05 psenough

Exactly, you want something to get (more or less) independent of the database. REST is not it.

Bombe avatar May 08 '13 14:05 Bombe

So I request STOPPING all development on here, then designing a SANE database schema and accompanying API, and THEN building a new pouet on top. It's not that hard, really.

I wholeheartedly agree to this.

javiercampos avatar May 08 '13 14:05 javiercampos

No REST, please. Let's take a simple hypothetical example: Retrieve and edit a prod.

Ok, let's get the prod... GET http://pouet.net/api/prods/1337 and you get a prod record as JSON - sounds good, doesn't it?

But... what do you get exactly? How are 1:many relationships like groups, links, comments and especially related prods expressed in the JSON? Just as an id/URI? Yeah great, now you'll have to GET each and every group, comment, link, platform, party, prodtype and related prod with one additional request each. Say goodbye to any resemblance of performance. Or shall we get all comments at once, like GET /api/comments?prodid=1337? Yeah, point is, this already violates REST principles, because a URI must be a 1:1 mapping to a resource. Sorry.

Or let's inline everything. Great. Now, pouet's data structures are actually pretty complex (what comprises a party record? shouldn't there be a release list in there?) and might also be cyclic, so we need to stop inlining somewhere. Where? Well, that'll end up being one of the hidden properties of the API that users will need to figure out as they go along. Oh, and it actually violates REST again because there's no way to track changes on dependent properties.
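To make the dilemma concrete, here is a minimal Python sketch of the two JSON shapes described above. Everything in it (the field names, the URIs, the prod data) is invented for illustration and is not actual pouet data or API:

```python
# Variant 1: relations as URIs only. Cheap to produce, but the client
# now needs one extra GET per related resource to render a single page.
prod_linked = {
    "id": 1337,
    "name": "some prod",
    "groups": ["/api/groups/1"],
    "comments": ["/api/comments/10", "/api/comments/11"],
    "platforms": ["/api/platforms/67"],
    "related": ["/api/prods/9000"],
}

# Count the follow-up requests needed to fully resolve this one prod.
extra_requests = sum(
    len(v) for v in prod_linked.values() if isinstance(v, list)
)
print(extra_requests)  # 5 extra GETs for one prod page

# Variant 2: everything inlined. One request, but the server must
# decide where to stop expanding, because related prods have their own
# groups, comments and related prods, and the graph may be cyclic.
prod_inlined = {
    "id": 1337,
    "name": "some prod",
    "groups": [{"id": 1, "name": "some group"}],
    "related": [{"id": 9000, "name": "other prod", "related": "stop here?"}],
}
```

The cut-off depth in variant 2 is exactly the "hidden property of the API" the comment above complains about.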

Ok, now let's pretend we found a solution for that. Let's edit a prod and PUT /api/prods/1337 it to the API. What exactly do we send? Only the fields that have changed? Nope, violates REST: resources must always be sent whole for easier caching (proxies are expected to cache a page on PUT so that they can return it on GET). Damn. Can't we get around this by e.g. PUT /api/prods/1337/name when the name has changed? No, violation again: you can't mix the collection and resource paradigms that way, and it breaks caches again. So let's send the whole thing. With, uh, what inlining levels again? Should be the original ones from the GET, for the reasons above, shouldn't they? But what if we have actually changed them? What if someone else has changed them via a different route? What if somebody changes the prod type while we're changing the prod name?
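The concurrent-edit problem in that last question is the classic lost update. A minimal sketch in plain Python (dicts standing in for the HTTP resource; no real pouet code involved) shows how full-resource PUT clobbers a concurrent change:

```python
# The server's copy of the resource.
server = {"id": 1337, "name": "old name", "type": "demo"}

# Both clients GET the full record, as full-resource PUT requires.
client_a = dict(server)
client_b = dict(server)

client_a["name"] = "new name"   # A edits the name...
client_b["type"] = "64k"        # ...while B edits the prod type.

server = dict(client_a)  # A PUTs the whole record back
server = dict(client_b)  # B PUTs the whole record, clobbering A's edit

print(server["name"])  # "old name" -- A's change is silently lost
```

B never saw A's change, so B's full-record PUT silently reverts it; nothing in plain REST forces either client to notice.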

REST has no answer for ANY of these questions. If your data structures are actually "yeah, one resource per URI and only some links to others in it, but not too many" it works, but as soon as stuff gets nontrivial it breaks instantly. So I strongly advise against it, even if that means creating an ocean of hipster tears.

kebby avatar May 08 '13 21:05 kebby

... and btw, the above rant is the result of my last year when I had to use and create a handful of web services in close collaboration with other teams. Ask anyone who jumped on the REST bandwagon and tried to pull it through.

kebby avatar May 08 '13 21:05 kebby

so what do you suggest? a custom made api? in what format?

psenough avatar May 09 '13 02:05 psenough

Very good question, and not exactly easy to answer. This is the point where I'd spend more than a whole work day just plotting stuff; for obvious reasons I'll just type down what I thought of in the 15 minutes of shower I just had :) Also, disclaimer: my view on things comes from a few years' experience in writing servers for online games that need to sustain thousands of concurrent users at once; but I reckon it should also work for the way lower load that Pouet has to suffer. :)

I'd first read the current source and extract what features (or use cases, or User Stories, or whatever's the buzzword du jour) the frontend needs, and then model an RPC API close, but not too close, to these requirements, e.g. a GetProd(id, flags) function that collects all (or some, depending on the flags) data relevant to one prod.
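A minimal Python sketch of such a flag-driven call. The function name follows the GetProd(id, flags) example above, but the flag values and the fake in-memory data are pure invention for illustration:

```python
# Stand-ins for real database tables.
FAKE_PRODS = {
    1337: {"name": "some prod", "group_ids": [1], "comment_ids": [10, 11]},
}
FAKE_GROUPS = {1: {"name": "some group"}}
FAKE_COMMENTS = {10: {"text": "rulez"}, 11: {"text": "sucks"}}

# Bit flags controlling how much related data the call collects.
WITH_GROUPS = 1
WITH_COMMENTS = 2

def get_prod(prod_id, flags=0):
    """Collect all (or some, depending on flags) data for one prod."""
    prod = dict(FAKE_PRODS[prod_id])  # shallow copy; leave the "db" alone
    if flags & WITH_GROUPS:
        prod["groups"] = [FAKE_GROUPS[g] for g in prod.pop("group_ids")]
    if flags & WITH_COMMENTS:
        prod["comments"] = [FAKE_COMMENTS[c] for c in prod.pop("comment_ids")]
    return prod

print(get_prod(1337, WITH_GROUPS | WITH_COMMENTS))
```

The caller, not a fixed inlining policy, decides how much gets expanded per request, which sidesteps the inlining-depth problem from the REST discussion above.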

This RPC mechanism should be standard XML-RPC or JSON-RPC (or preferably both), and the two most important things are, first, that the API is correctly versioned and stuff doesn't break (e.g. POST to pouet.net/api/v1 etc., and as soon as there's a major/breaking change, use v2 and still let the v1 stuff work), and second, that all associated data structures are defined by a proper schema. Whether there's a master .xsd file somewhere in the source or it's generated from the source code of the API itself is irrelevant; the point is that all clients need an authoritative reference for all data structures. Because if you just make up stuff while writing the RPC calls and document it somewhere, client code will inevitably look as messy, handwritten and copypasted as 0.9 does. Also, there are code generators for about every language that can convert XML schemas into data structures, even dynamically for most scripting languages. Using JSON with XML schemas works, too, only you need extra kludges to keep polymorphism etc. running, because JSON without extending it a bit is just not expressive enough to handle types you don't know beforehand.

Now, which transport mechanism to use? HTTP would be the obvious choice and can easily be done, but I'd like to pitch the advantages of e.g. an actual TCP server for the API here. We can say goodbye to "Page created in 0.01 seconds" anyway as soon as the code is more structured; but a stateful API (instead of a stateless one) has big advantages, first of all that caching is incredibly easy to do. Think the front page: a stateful API can simply raise a flag as soon as a new prod is added, and the handler for GetLatestReleasedProds instantly knows it needs to regenerate its result structure once and can then just cache it in memory for all subsequent requests.
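That flag-and-regenerate scheme fits in a few lines. A Python sketch (class and method names invented to mirror the GetLatestReleasedProds example; a real server would also need locking around the flag):

```python
class ProdApi:
    """In-process stand-in for a stateful API server."""

    def __init__(self):
        self._prods = []
        self._cache = None
        self._dirty = True  # force one build on the first request

    def add_prod(self, name):
        self._prods.append(name)
        self._dirty = True  # raise the flag: the cached list is stale

    def get_latest_released_prods(self, limit=10):
        if self._dirty:
            # regenerate once, then serve from memory until the next add
            self._cache = list(reversed(self._prods))[:limit]
            self._dirty = False
        return self._cache

api = ProdApi()
api.add_prod("first")
api.add_prod("second")
print(api.get_latest_released_prods())  # ['second', 'first']
api.add_prod("third")
print(api.get_latest_released_prods()[0])  # 'third'
```

Because the process holds state between requests, no memcached-style external cache is needed; the dirty flag is the whole invalidation protocol.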

This is why node.js is all the rage now: even if its programming model again results in a horrible mess of code as soon as you try to scale it up, it easily offsets all the idiocy of having a stateless model in PHP. Speaking of node.js, the next design choice would be how the server's scheduling should work; there's single-threaded select vs. thread-per-connection vs. asynchronous. I'd strongly suggest one of the latter two, and weigh in that asynchronous performs best, but the only language that currently handles asynchronous code without getting extremely ugly is C# with its async/await keywords. All others result in horrible chains of callbacks and lambdas; see node.js. But multithreaded is okay too, and it's way easier to do.

As said, the above concepts have successfully been tested by hitting one server machine with 5000 bot clients at once (lots of db/other_webservice retrieval, some updates) and the server didn't really break a sweat.

kebby avatar May 09 '13 19:05 kebby

didn't know about json rpc as a standard, i have been using non-standard json in similar fashion on some of my web projects though. reference link for json rpc: http://www.jsonrpc.org/specification
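For readers who also don't know it: the wire format from that spec is just a JSON envelope with `jsonrpc`, `method`, `params` and `id` members. A Python sketch applying it to a hypothetical pouet call (the method name and params are made up; only the envelope shape comes from the spec):

```python
import json

request = {
    "jsonrpc": "2.0",       # protocol version marker, required by the spec
    "method": "GetProd",    # hypothetical method name
    "params": {"id": 1337},
    "id": 1,                # client-chosen id for matching the reply
}
response = {
    "jsonrpc": "2.0",
    "result": {"id": 1337, "name": "some prod"},
    "id": 1,                # echoes the request id
}

wire = json.dumps(request)  # what actually travels over HTTP/TCP
print(wire)
```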

i guess the best gradual migration scenario right now is:

  1. go through all .php's and document use cases as function calls.
  2. create a json_api.php with those php functions that take arguments and return the json.
  3. refactor the current user.php and prod.php etc to use them, parsing the json result into the currently used php arrays.
  4. test if all documented use cases are working ok locally, only then push it live.
  5. write a json_rpc_api.php with functions that parse a json rpc input call, detect the right call and wrap it to the corresponding function in json_api.php (or port the calls to nodejs/whatever)
  6. replace all user.php and prod.php again and have them call the json_rpc_api directly?
  7. refactor v2 to use json_rpc_api calls

i'm thinking that maybe we don't really need steps 5 and 6, unless we plan to change server structure in some way? the refactoring of any new version could just as easily be made to access the functions from json_api.php directly, no?
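The wrapper from step 5 above is essentially a method-name lookup table. A Python sketch of the idea (the two handler functions are invented stand-ins for what json_api.php would contain; the error code is the spec's "Method not found"):

```python
import json

def get_prod(id):
    return {"id": id, "name": "some prod"}

def get_user(id):
    return {"id": id, "nick": "some user"}

# Map JSON-RPC method names to plain functions.
METHODS = {"GetProd": get_prod, "GetUser": get_user}

def handle_rpc(raw):
    """Parse a JSON-RPC request string and dispatch to the right function."""
    req = json.loads(raw)
    func = METHODS.get(req["method"])
    if func is None:
        return {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    result = func(**req.get("params", {}))
    return {"jsonrpc": "2.0", "id": req.get("id"), "result": result}

reply = handle_rpc(
    '{"jsonrpc": "2.0", "method": "GetProd", "params": {"id": 1337}, "id": 1}'
)
print(reply["result"]["name"])  # "some prod"
```

Since the dispatcher only knows function names, the same table can front json_api.php-style functions now and a different backend later, which is what makes the gradual migration possible.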

psenough avatar May 09 '13 22:05 psenough

The thing about the API is twofold:

  1. It's public. Everyone can use the API from outside (given proper credentials), and the pouet website is just one of many clients (albeit the reference one :). Also, if you only call one php file from another there's no real need for JSON. :) But as an intermediate step it's ok.
  2. To again reiterate my "the API should be a stateful server" point - the API can take over all the necessary caching to keep load from the server and the db without clients having to implement their own scheme. You can't do that with php, really. People needed to invent stuff like memcached and 1000 other schemes just because they were too lazy not to use php.

In the end the API and the web page should be two separate "products", and from that point on also all data will be truly open (no sql dump needed anymore).

kebby avatar May 09 '13 22:05 kebby

ok, i'll get started on step 1) and 2) on a separate branch when i find some free time.

psenough avatar May 09 '13 22:05 psenough

Speaking of; is there a development machine and database somewhere? Developing a new API on the live db sounds dangerous.

kebby avatar May 09 '13 22:05 kebby

@kebby Not really, you have to inject the pouet.sql file into your MySQL and create some data. Making it easier is a priority for me, like having a script to generate fake data. https://github.com/lra/pouet.net/issues/42 I'm open to suggestions.

lra avatar May 09 '13 22:05 lra

until a dev machine is up, i'm planning to develop locally and only submit when it's tested. or just comment on the pull request that it isn't tested properly when it's just small fixes.

a sanitized snapshot of the data would be nice to have to populate locally but i think analogue or gargaj will do that soon.

psenough avatar May 09 '13 23:05 psenough

Kebby's plan is great

uucidl avatar May 11 '13 07:05 uucidl

finally found some time to start working on this https://github.com/psenough/pouet.net/tree/kebby_with_fire in case anyone wants to help out

psenough avatar May 23 '13 16:05 psenough

Not sure how to handle the /api/ .php's. They don't appear to be in use for anything on 1.0; maybe they are in use on v2.0? Or maybe they were just created to serve syndication to other sites. Either way, it would be better to remake them following the jsonp concept. Will leave them for later. Anyone know anything about them?

psenough avatar May 26 '13 18:05 psenough

swapped emails with gargaj about this, he says it's a waste of time, so i'm stopping dev on my branch.

issue should probably be altered to 'won't fix'.

psenough avatar Jun 05 '13 14:06 psenough

I concur. We shouldn't forget the scope of what pouet is, and it just doesn't make sense to handle it as if it were the next Amazon.

kebby avatar Jun 05 '13 14:06 kebby