drogon
drogon copied to clipboard
Host Redirector plugin
#1783
Needs path redirection
I'm aware the pull request includes a small edit to SlashRemover, I believe it doesn't need its own PR for a small change.
We can add a new group to the redirector plugin, and rename the groups to something easy to remember:
- PreRedirectHandler
- PathRedirectHandler
- PostRedirectHandler
- PathForwardHandler
This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.
I noticed the formatter doesn't move & ampersands next to the identifiers.
const string& host = ...;
Will not get formatted to:
const string &host = ...;
Is it perhaps a bug, or did formatting rules get changed to allow both styles?
We can add a new group to the redirector plugin, and rename the groups to something easy to remember:
- PreRedirectHandler
- PathRedirectHandler
- PostRedirectHandler
- PathForwardHandler
This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.
This seems too complex for users.
I noticed the formatter doesn't move
&ersands next to the identifiers.const string& host = ...;Will not get formatted to:
const string &host = ...;Is it perhaps a bug, or did formatting rules get changed to allow both styles?
Both styles seem to be present in the current code, you can try the configuration of clang-format and choose the second style.
We can add a new group to the redirector plugin, and rename the groups to something easy to remember:
- PreRedirectHandler
- PathRedirectHandler
- PostRedirectHandler
- PathForwardHandler
This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.
This seems too complex for users.
Yeah, and we don't have another choice. The only way to not introduce another group is to integrate this plugin into Redirector itself.
I have added the 3rd group and renamed the handlers for better clarity, should I revert it?
We can think of it the same as AOP callbacks, we can say they also are complex in a way without reading the documentation and seeing the picture visualization.
We can add a new group to the redirector plugin, and rename the groups to something easy to remember:
- PreRedirectHandler
- PathRedirectHandler
- PostRedirectHandler
- PathForwardHandler
This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.
This seems too complex for users.
I also think that it is kind of over-engineered for most use cases and makes the API harder to understand for new users. I’m in favor of keeping things simple.
This is a rough idea on implementing it using multiple nested unordered maps, it is complex ~~and I don't plan on implementing it that way~~, it generates a map with common characters split into nested nodes. ~~In case anybody is interested in changing the implementation in the future~~. The data structure for this is called a Trie.
It will be easy to use:
// Some nodes need to store information like how many folders are common within that node,
// for example "/common/places" has a common depth of 2, once the path substring lookup
// gets to "/common/places/open", it will substring with a depth of 2 and looks up
// ["/common/places"] within the unordered map of that node, then substring further up until "/open".
const auto &ruleTo = rulesFrom["www.example.com"]["/maps"]["/common/places"]["/open"];
~~A vector of strings may be more desirable and easier to implement for this plugin. I will go for this route.~~
Regarding introducing a new group for Redirector, we can discuss it once this plugin's implementation is finished.
Edit: Using vectors is a hassle, there still is a need to know common points within the rules, using unordered maps is the way to go.
This is still a draft
The shown diagram redirects in the following fashion:
10.10.10.1 -> files.example.com
10.10.10.1/any -> www.example.com/any
10.10.10.1/files -> files.example.com (Doesn't have a strict match rule, so it redirects even if there is no subpath)
10.10.10.1/files/video -> files.example.com/video
10.10.10.1/files/video/asia.mp4 -> files.example.com/video/asia.mp4
10.10.10.1/files/map -> www.example.com/maps
10.10.10.1/files/map/asia -> www.example.com/maps/asia
10.10.10.1/files/maps -> www.example.com/maps
10.10.10.1/files/maps/asia -> Does not redirect, the rule is strict, not wildcard
10.10.10.1/files/docs -> files.example.com/docs
10.10.10.1/files/docs/drogon.docx -> files.example.com/docs/drogon.docx
10.10.10.1/files/img -> files.example.com/imgs
10.10.10.1/files/img/asia.png -> Does not redirect, the rule is strict, not wildcard
10.10.10.1/files/imgs -> files.example.com/imgs
10.10.10.1/files/imgs/asia.png -> Does not redirect, the rule is strict, not wildcard
The code of said draft
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using std::cout;
using std::string;
struct RedirectTo
{
string host, path;
};
struct RedirectPath
{
std::unordered_map<string, RedirectPath*> paths;
size_t minPathLen = SIZE_MAX;
size_t minPathDepth = SIZE_MAX;
RedirectTo* wildcard;
RedirectTo* to;
};
static void recurse(const RedirectPath* parent, size_t tabs = 1)
{
if(!parent)
return;
const size_t moreTab = tabs + 1;
for(const auto& [childPathStr, childPathObj] : parent->paths)
{
bool newline = false;
if(childPathObj->to)
{
for(size_t t = tabs; t; --t)
cout << '\t';
cout << childPathStr;
cout << " -> " << childPathObj->to->host << childPathObj->to->path;
newline = true;
}
if(childPathObj->wildcard)
{
if(newline)
cout << '\n';
for(size_t t = tabs; t; --t)
cout << '\t';
cout << childPathStr;
cout << " (*)";
cout << " -> " << childPathObj->wildcard->host << childPathObj->wildcard->path << "/*";
newline = true;
}
if(newline)
cout << '\n';
else
{
for(size_t t = tabs; t; --t)
cout << '\t';
cout << childPathStr << '\n';
}
recurse(childPathObj, moreTab);
}
}
int main()
{
std::unordered_map<string, RedirectPath> rulesFrom;
std::vector<RedirectTo> rulesTo;
rulesTo.push_back({"www.example.com"});
rulesTo.push_back({"files.example.com"});
rulesTo.push_back({"files.example.com", "/imgs"});
rulesTo.push_back({"files.example.com", "/docs"});
rulesTo.push_back({"www.example.com", "/maps"});
{
auto rp0 = rulesFrom["10.10.10.1"].paths["/"] = new RedirectPath;
rp0->wildcard = &rulesTo[0];
rp0->to = &rulesTo[1];
}
{
auto rp0 = rulesFrom["10.10.10.1"].paths["/files"] = new RedirectPath;
rp0->wildcard = &rulesTo[1];
{
{
auto rp1 = rp0->paths["/img"] = new RedirectPath;
rp1->wildcard = &rulesTo[2];
{
auto rp2 = rp1->paths[""] = new RedirectPath;
rp2->to = &rulesTo[2];
}
{
auto rp2 = rp1->paths["s"] = new RedirectPath;
rp2->to = &rulesTo[2];
}
}
{
auto rp1 = rp0->paths["/map"] = new RedirectPath;
{
auto rp2 = rp1->paths[""] = new RedirectPath;
rp2->wildcard = &rulesTo[4];
}
{
auto rp2 = rp1->paths["s"] = new RedirectPath;
rp2->to = &rulesTo[4];
}
}
}
{
auto rp1 = rp0->paths["/docs"] = new RedirectPath;
rp1->wildcard = &rulesTo[3];
}
}
for(const auto& [host, redirects] : rulesFrom)
{
cout << host << ":\n";
recurse(&redirects);
}
}
The final implementation should do some checks to determine common points and make the data structure compact, and cache some information for quick execution during runtime after the initial mapping from startup.
If anybody has a better idea, feel free to share.
Edit: Change of algorithm, the maps can be populated using inequality boundaries and length exhaustion.
// 10.10.10.1/user/file
// 10.10.10.1/user/files/*
// 10.10.10.1/user/files/audio
// 10.10.10.1/user/files/video/*
//
// 10.10.10.1/global/*
//
// 10.10.10.1/user/crocodile
rulesFrom = {
"10.10.10.1": {
maxPathLen = 6,
"/user/": {
maxPathLen = 4,
"file": {
to = ...,
maxPathLen = 1,
"s": {
wildcard = ...,
maxPathLen = 6,
"/audio": {
to = ...,
maxPathLen = 0
},
"/video": {
wildcard = ...,
maxPathLen = 0
}
}
},
"croc": {
maxPathLen = 5,
"odile": {
to = ...,
maxPathLen = 0
}
}
},
"/globa": {
maxPathLen = 1,
"l": {
wildcard = ...,
maxPathLen = 0
}
}
}
}
Through experimenting, I came up with ~~two~~ five methods of populating the nodes
1. Nearest slash separator
This splits out nodes based on the closest slash in a specific window.
Example
10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include
rulesFrom = {
"10.10.10.1": {
maxPathLen = 5,
"/user": {
maxPathLen = 5,
"/file": {
maxPathLen = 1,
"s": {
to = ...,
maxPathLen = 0
}
},
"/apps": {
to = ...,
maxPathLen = 0
}
},
"/usr/": {
maxPathLen = 3,
"lib": {
to = ...,
maxPathLen = 0
},
"inc": {
maxPathLen = 4,
"lude": {
to = ...,
maxPathLen = 0
}
}
}
}
}
Pros:
- Less tree depth, saves some memory.
- Less CPU usage, from doing less map lookups and hash calculations.
- Less overall memory usage due to lesser tree depths, meaning less maps.
Cons:
- Slightly more memory usage for data, as it may store duplicate prefixes.
- Complex to implement.
2. Common letter separator
Splits out nodes based on intersecting letters in a specific window.
Example
10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include
rulesFrom = {
"10.10.10.1": {
maxPathLen = 2,
"/us": {
maxPathLen = 2,
"er": {
maxPathLen = 1,
"/": {
maxPathLen = 4,
"file": {
maxPathLen = 1,
"s": {
to = ...,
maxPathLen = 0
}
},
"apps": {
to = ...,
maxPathLen = 0
}
}
},
"r/": {
maxPathLen = 3,
"lib": {
to = ...,
maxPathLen = 0
},
"inc": {
maxPathLen = 4,
"lude": {
to = ...,
maxPathLen = 0
}
}
}
}
}
}
Pros:
- Slightly less memory usage for data, it doesn't store any duplicates.
Cons:
- More tree depth, in cases where there are too many similar looking paths.
- More CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
- More overall memory usage due to increased tree depth.
- Complex to implement.
~~I choose the first implementation. I listed both in case others favor the 2nd implementation.~~
Edit:
3. Local length boundary separator
Splits out nodes on local length boundaries.
Example
10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include
rulesFrom = {
"10.10.10.1": {
maxPathLen = 8,
"/user/fi": {
maxPathLen = 3,
"les": {
to = ...,
maxPathLen = 0
}
},
"/user/ap": {
maxPathLen = 2,
"ps": {
to = ...,
maxPathLen = 0
}
}
"/usr/lib": {
to = ...,
maxPathLen = 0
},
"/usr/inc": {
maxPathLen = 4,
"lude": {
to = ...,
maxPathLen = 0
}
}
}
}
Pros:
- Lesser tree depth than 1st method.
- Lesser CPU usage, from doing less map lookups and hash calculations.
- Lesser overall memory usage due to lesser tree depths, meaning less maps.
Cons:
- More memory usage for data, it can store multiple duplicate prefixes.
- Less complex to implement. However, still requires recursive population.
4. Global length boundary separator
Splits out nodes on length boundaries across all rules.
Example
10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include
rulesFrom = {
"10.10.10.1": {
maxPathLen = 8,
"/user/fi": {
maxPathLen = 2,
"le": {
maxPathLen = 1,
"s": {
to = ...,
maxPathLen = 0
}
}
},
"/user/ap": {
maxPathLen = 2,
"ps": {
to = ...,
maxPathLen = 0
}
}
"/usr/lib": {
to = ...,
maxPathLen = 0
},
"/usr/inc": {
maxPathLen = 2,
"lu": {
maxPathLen = 1,
"d": {
maxPathLen = 1,
"e": {
to = ...,
maxPathLen = 0
}
}
}
}
}
}
Pros:
- Easy to implement.
Cons:
- More tree depth, in cases where there are gradual length differences (staircase).
- Slightly more CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
- More overall memory usage due to increased tree depth.
5. Sorted vector and binary search
Inserts each strict rule into an unordered map with path length as key, and has an unordered map within each one with the path as key, and its to pointer as value.
Inserts each wildcard rule into a vector in a sorted manner, and during runtime it performs binary search for length, and checks those below the request path length to see which one matches.
Example
10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/maps
10.10.10.1/usr/include
10.10.10.1/usr/include/*
10.10.10.1/usr/bin/*
rulesFrom = {
"10.10.10.1": {
8: {
"/usr/lib": ...
},
10: {
"/user/apps": ...,
"/user/maps": ...
},
11: {
"/user/files": ...
},
12: {
"/usr/include": ...
}
]
}
rulesFromWildcard = {
"10.10.10.1": [
{
"/usr/bin", (8)
wildcard = ...
},
{
"/usr/include", (12)
wildcard = ...
}
]
}
Pros:
- Easiest to implement.
Cons:
- More memory usage for data, it stores multiple duplicate prefixes.
- Two roundtrips in the worst case scenario, as it first looks in the strict rules map, if it didn't find the path, then it looks within the wildcard map. If the request path is longer than every wildcard rule's path, it ends up going through all wildcard rules to check against the request path.
Uncertainties:
- CPU usage can't be determined, the cost is the same as performing binary search on a sorted string container. Can't be compared to previous implementations, since they use unordered maps primarily.
I'm thinking of method 1, 4, or 5. Will update on which one I decide to implement.
Update: Decided on method 3, it seems the most efficient in terms of CPU usage, memory, and runtime speed.
The implementation of the third method is complete. See the standalone source code and let me know if there is a need for any changes before implementing it within this drogon plugin.
TODO: Add support for dynamic addition and removal of rules. This final feature will allow this drogon plugin to create applications like Cloudflare Pages.
We can start discussing where this plugin ends up in the drogon ecosystem.
Currently the plugin has support for normal redirects and mandated splat, if we go off based on Cloudflare Pages redirects https://developers.cloudflare.com/pages/platform/redirects/#advanced-redirects
I will add another TODO: Allow splatting to be optional, by having an asterisk added to the To rules, meaning:
/home/* -> /, will discard everything after /home and redirect to / always
/home/* -> /*, will flatten everything after /home and redirect to /{flattened}
I would argue this can replace drogon's controller routers if it uses for loops or regexes to check for matches one by one, however, we have to add support for additional things before doing so, as regex controllers must be kept, and normal controllers can rely on this trie implementation to quickly route requests, and potentially redirect if its configuration becomes part of the drogon's "app" config.