drogon Host Redirector plugin

#1783

Needs path redirection

I'm aware the pull request includes a small edit to SlashRemover, I believe it doesn't need its own PR for a small change.

Sep 25 '23 08:09 Mis1eader-dev

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler
PathRedirectHandler
PostRedirectHandler
PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

Sep 26 '23 06:09 Mis1eader-dev

I noticed the formatter doesn't move & ampersands next to the identifiers.

const string& host = ...;

Will not get formatted to:

const string &host = ...;

Is it perhaps a bug, or did formatting rules get changed to allow both styles?

Sep 26 '23 08:09 Mis1eader-dev

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

Sep 26 '23 08:09 an-tao

I noticed the formatter doesn't move & ampersands next to the identifiers.
const string& host = ...;
Will not get formatted to:
const string &host = ...;
Is it perhaps a bug, or did formatting rules get changed to allow both styles?

Both styles seem to be present in the current code, you can try the configuration of clang-format and choose the second style.

Sep 26 '23 09:09 an-tao

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

Yeah, and we don't have another choice. The only way to not introduce another group is to integrate this plugin into Redirector itself.

I have added the 3rd group and renamed the handlers for better clarity, should I revert it?

We can think of it the same as AOP callbacks, we can say they also are complex in a way without reading the documentation and seeing the picture visualization.

Sep 26 '23 09:09 Mis1eader-dev

We can add a new group to the redirector plugin, and rename the groups to something easy to remember:

PreRedirectHandler

PathRedirectHandler

PostRedirectHandler

PathForwardHandler

This HostRedirector will register itself within the 3rd group, to receive the path after clean up from SlashRemover, which sits within group 2.

This seems too complex for users.

I also think that it is kind of over-engineered for most use cases and makes the API harder to understand for new users. I’m in favor of keeping things simple.

Sep 27 '23 07:09 rbugajewski

This is a rough idea on implementing it using multiple nested unordered maps, it is complex ~~and I don't plan on implementing it that way~~, it generates a map with common characters split into nested nodes. ~~In case anybody is interested in changing the implementation in the future~~. The data structure for this is called a Trie.

It will be easy to use:

// Some nodes need to store information like how many folders are common within that node,
// for example "/common/places" has a common depth of 2, once the path substring lookup
// gets to "/common/places/open", it will substring with a depth of 2 and looks up
// ["/common/places"] within the unordered map of that node, then substring further up until "/open".
const auto &ruleTo = rulesFrom["www.example.com"]["/maps"]["/common/places"]["/open"];

~~A vector of strings may be more desirable and easier to implement for this plugin. I will go for this route.~~

Regarding introducing a new group for Redirector, we can discuss it once this plugin's implementation is finished.

Edit: Using vectors is a hassle, there still is a need to know common points within the rules, using unordered maps is the way to go.

Oct 01 '23 07:10 Mis1eader-dev

This is still a draft

The shown diagram redirects in the following fashion: 10.10.10.1 -> files.example.com 10.10.10.1/any -> www.example.com/any

10.10.10.1/files -> files.example.com (Doesn't have a strict match rule, so it redirects even if there is no subpath) 10.10.10.1/files/video -> files.example.com/video 10.10.10.1/files/video/asia.mp4 -> files.example.com/video/asia.mp4

10.10.10.1/files/map -> www.example.com/maps 10.10.10.1/files/map/asia -> www.example.com/maps/asia 10.10.10.1/files/maps -> www.example.com/maps 10.10.10.1/files/maps/asia -> Does not redirect, the rule is strict, not wildcard

10.10.10.1/files/docs -> files.example.com/docs 10.10.10.1/files/docs/drogon.docx -> files.example.com/docs/drogon.docx

10.10.10.1/files/img -> files.example.com/imgs 10.10.10.1/files/img/asia.png -> Does not redirect, the rule is strict, not wildcard 10.10.10.1/files/imgs -> files.example.com/imgs 10.10.10.1/files/imgs/asia.png -> Does not redirect, the rule is strict, not wildcard

The code of said draft

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using std::cout;
using std::string;

struct RedirectTo
{
	string host, path;
};

struct RedirectPath
{
	std::unordered_map<string, RedirectPath*> paths;
	size_t minPathLen = SIZE_MAX;
	size_t minPathDepth = SIZE_MAX;
	RedirectTo* wildcard;
	RedirectTo* to;
};

static void recurse(const RedirectPath* parent, size_t tabs = 1)
{
	if(!parent)
		return;

	const size_t moreTab = tabs + 1;
	for(const auto& [childPathStr, childPathObj] : parent->paths)
	{
		bool newline = false;
		if(childPathObj->to)
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " -> " << childPathObj->to->host << childPathObj->to->path;

			newline = true;
		}

		if(childPathObj->wildcard)
		{
			if(newline)
				cout << '\n';

			for(size_t t = tabs; t; --t)
				cout << '\t';

			cout << childPathStr;

			cout << " (*)";
			cout << " -> " << childPathObj->wildcard->host << childPathObj->wildcard->path << "/*";

			newline = true;
		}

		if(newline)
			cout << '\n';
		else
		{
			for(size_t t = tabs; t; --t)
				cout << '\t';
			cout << childPathStr << '\n';
		}

		recurse(childPathObj, moreTab);
	}
}

int main()
{
	std::unordered_map<string, RedirectPath> rulesFrom;
	std::vector<RedirectTo> rulesTo;
	rulesTo.push_back({"www.example.com"});
	rulesTo.push_back({"files.example.com"});
	rulesTo.push_back({"files.example.com", "/imgs"});
	rulesTo.push_back({"files.example.com", "/docs"});
	rulesTo.push_back({"www.example.com", "/maps"});

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/"] = new RedirectPath;
		rp0->wildcard = &rulesTo[0];
		rp0->to = &rulesTo[1];
	}

	{
		auto rp0 = rulesFrom["10.10.10.1"].paths["/files"] = new RedirectPath;
		rp0->wildcard = &rulesTo[1];
		{
			{
				auto rp1 = rp0->paths["/img"] = new RedirectPath;
				rp1->wildcard = &rulesTo[2];
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[2];
				}
			}
			{
				auto rp1 = rp0->paths["/map"] = new RedirectPath;
				{
					auto rp2 = rp1->paths[""] = new RedirectPath;
					rp2->wildcard = &rulesTo[4];
				}
				{
					auto rp2 = rp1->paths["s"] = new RedirectPath;
					rp2->to = &rulesTo[4];
				}
			}
		}
		{
			auto rp1 = rp0->paths["/docs"] = new RedirectPath;
			rp1->wildcard = &rulesTo[3];
		}
	}

	for(const auto& [host, redirects] : rulesFrom)
	{
		cout << host << ":\n";
		recurse(&redirects);
	}
}

The final implementation should do some checks to determine common points and make the data structure compact, and cache some information for quick execution during runtime after the initial mapping from startup.

If anybody has a better idea, feel free to share.

Edit: Change of algorithm, the maps can be populated using inequality boundaries and length exhaustion.

// 10.10.10.1/user/file
// 10.10.10.1/user/files/*
// 10.10.10.1/user/files/audio
// 10.10.10.1/user/files/video/*
//
// 10.10.10.1/global/*
//
// 10.10.10.1/user/crocodile

rulesFrom = {
	"10.10.10.1": {
		maxPathLen = 6,
		"/user/": {
			maxPathLen = 4,
			"file": {
				to = ...,
				maxPathLen = 1,
				"s": {
					wildcard = ...,
					maxPathLen = 6,
					"/audio": {
						to = ...,
						maxPathLen = 0
					},
					"/video": {
						wildcard = ...,
						maxPathLen = 0
					}
				}
			},
			"croc": {
				maxPathLen = 5,
				"odile": {
					to = ...,
					maxPathLen = 0
				}
			}
		},
		"/globa": {
			maxPathLen = 1,
			"l": {
				wildcard = ...,
				maxPathLen = 0
			}
		}
	}
}

Oct 02 '23 08:10 Mis1eader-dev

Through experimenting, I came up with ~~two~~ five methods of populating the nodes

1. Nearest slash separator

This splits out nodes based on the closest slash in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 5,
        "/user": {
            maxPathLen = 5,
            "/file": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            },
            "/apps": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/usr/": {
            maxPathLen = 3,
            "lib": {
                to = ...,
                maxPathLen = 0
            },
            "inc": {
                maxPathLen = 4,
                "lude": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        }
    }
}

Pros:

Less tree depth, saves some memory.
Less CPU usage, from doing less map lookups and hash calculations.
Less overall memory usage due to lesser tree depths, meaning less maps.

Cons:

Slightly more memory usage for data, as it may store duplicate prefixes.
Complex to implement.

2. Common letter separator

Splits out nodes based on intersecting letters in a specific window.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 2,
        "/us": {
            maxPathLen = 2,
            "er": {
                maxPathLen = 1,
                "/": {
                    maxPathLen = 4,
                    "file": {
                        maxPathLen = 1,
                        "s": {
                            to = ...,
                            maxPathLen = 0
                        }
                    },
                    "apps": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            },
            "r/": {
                maxPathLen = 3,
                "lib": {
                    to = ...,
                    maxPathLen = 0
                },
                "inc": {
                    maxPathLen = 4,
                    "lude": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

Slightly less memory usage for data, it doesn't store any duplicates.

Cons:

More tree depth, in cases where there are too many similar looking paths.
More CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
More overall memory usage due to increased tree depth.
Complex to implement.

~~I choose the first implementation. I listed both in case others favor the 2nd implementation.~~

Edit:

3. Local length boundary separator

Splits out nodes on local length boundaries.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 3,
            "les": {
                to = ...,
                maxPathLen = 0
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 4,
            "lude": {
                to = ...,
                maxPathLen = 0
            }
        }
    }
}

Pros:

Lesser tree depth than 1st method.
Lesser CPU usage, from doing less map lookups and hash calculations.
Lesser overall memory usage due to lesser tree depths, meaning less maps.

Cons:

More memory usage for data, it can store multiple duplicate prefixes.
Less complex to implement. However, still requires recursive population.

4. Global length boundary separator

Splits out nodes on length boundaries across all rules.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/include

rulesFrom = {
    "10.10.10.1": {
        maxPathLen = 8,
        "/user/fi": {
            maxPathLen = 2,
            "le": {
                maxPathLen = 1,
                "s": {
                    to = ...,
                    maxPathLen = 0
                }
            }
        },
        "/user/ap": {
            maxPathLen = 2,
            "ps": {
                to = ...,
                maxPathLen = 0
            }
        }
        "/usr/lib": {
            to = ...,
            maxPathLen = 0
        },
        "/usr/inc": {
            maxPathLen = 2,
            "lu": {
                maxPathLen = 1,
                "d": {
                    maxPathLen = 1,
                    "e": {
                        to = ...,
                        maxPathLen = 0
                    }
                }
            }
        }
    }
}

Pros:

Easy to implement.

Cons:

More tree depth, in cases where there are gradual length differences (staircase).
Slightly more CPU usage, from doing deep nested map lookups. (Due to increase in tree depth)
More overall memory usage due to increased tree depth.

5. Sorted vector and binary search

Inserts each strict rule into an unordered map with path length as key, and has an unordered map within each one with the path as key, and its to pointer as value. Inserts each wildcard rule into a vector in a sorted manner, and during runtime it performs binary search for length, and checks those below the request path length to see which one matches.

Example

10.10.10.1/user/files
10.10.10.1/user/apps
10.10.10.1/usr/lib
10.10.10.1/usr/maps
10.10.10.1/usr/include
10.10.10.1/usr/include/*
10.10.10.1/usr/bin/*

rulesFrom = {
    "10.10.10.1": {
        8: {
            "/usr/lib": ...
        },
        10: {
            "/user/apps": ...,
            "/user/maps": ...
        },
        11: {
            "/user/files": ...
        },
        12: {
            "/usr/include": ...
        }
    ]
}
rulesFromWildcard = {
    "10.10.10.1": [
        {
            "/usr/bin", (8)
            wildcard = ...
        },
        {
            "/usr/include", (12)
            wildcard = ...
        }
    ]
}

Pros:

Easiest to implement.

Cons:

More memory usage for data, it stores multiple duplicate prefixes.
Two roundtrips in the worst case scenario, as it first looks in the strict rules map, if it didn't find the path, then it looks within the wildcard map. If the request path is longer than every wildcard rule's path, it ends up going through all wildcard rules to check against the request path.

Uncertainties:

CPU usage can't be determined, the cost is the same as performing binary search on a sorted string container. Can't be compared to previous implementations, since they use unordered maps primarily.

I'm thinking of method 1, 4, or 5. Will update on which one I decide to implement.

Update: Decided on method 3, it seems the most efficient in terms of CPU usage, memory, and runtime speed.

Oct 03 '23 08:10 Mis1eader-dev

The implementation of the third method is complete. See the standalone source code and let me know if there is a need for any changes before implementing it within this drogon plugin.

Oct 07 '23 13:10 Mis1eader-dev

TODO: Add support for dynamic addition and removal of rules. This final feature will allow this drogon plugin to create applications like Cloudflare Pages.

We can start discussing where this plugin ends up in the drogon ecosystem.

Nov 27 '23 08:11 Mis1eader-dev

Currently the plugin has support for normal redirects and mandated splat, if we go off based on Cloudflare Pages redirects https://developers.cloudflare.com/pages/platform/redirects/#advanced-redirects

I will add another TODO: Allow splatting to be optional, by having an asterisk added to the To rules, meaning: /home/* -> /, will discard everything after /home and redirect to / always /home/* -> /*, will flatten everything after /home and redirect to /{flattened}

I would argue this can replace drogon's controller routers if it uses for loops or regexes to check for matches one by one, however, we have to add support for additional things before doing so, as regex controllers must be kept, and normal controllers can rely on this trie implementation to quickly route requests, and potentially redirect if its configuration becomes part of the drogon's "app" config.

Dec 10 '23 07:12 Mis1eader-dev

drogon drogon copied to clipboard

Host Redirector plugin

1. Nearest slash separator

Example

Pros:

Cons:

2. Common letter separator

Example

Pros:

Cons:

3. Local length boundary separator

Example

Pros:

Cons:

4. Global length boundary separator

Example

Pros:

Cons:

5. Sorted vector and binary search

Example

Pros:

Cons:

Uncertainties:

drogon
drogon copied to clipboard