data-hub icon indicating copy to clipboard operation
data-hub copied to clipboard

UUID and you may will own the world :)

Open hlohaus opened this issue 3 years ago • 0 comments

The most important practical use of UUIDs maybe will be, that the import of data do no longer depends on the results of the database. This makes it possible to build up a complete article structure without having to wait again and again for a result. The biggest mistake you can maybe make with UUIDs is, that you read or save them individually or at all from the database. Because then you probably don't know one of the biggest problem with APIs on the Internet. There is always someone waiting for someone and a lot of space and resources are wasted accordingly. The quickest way to improve the access time and performance of a website / shop is to reduce the number of database queries.

Example: A practical function for shop owners is the generation of a dynamic SEO URL. So that this does not contain an additional ID and now even a UUID, only the URL is the key for the resolution of the page. Which is not a problem in itself, since a page is only ever called up with one URL. Rather, the problem is the creation of the links on the page.

In Symfony the links are generated from templates and IDs, which can be stored well in the container as a template, but nobody really wants internal IDs in their URL.

So it might be better to read a ready-made URL from the database for your ID. But should i do it hundreds of times in a shop with a large menu and only a few items? Just because I do something similar hundreds of times, it doesn't make it any more efficient. To solve this huge performance problem, only placeholders are initially stored in the HTML result. In Shopware 5 these start with "shopware.php" and in Shopware 6 exitis a UUID for this. Instead of reading it individually for each link, the placeholders with the IDs are read from the HTML result and the shop only made one query for the links of the entire page. A "WHERE IN" query is used for this, since "OR" causes a similar problem. (to OR query) As a programmer, you know how to get better performance through the targeted use of "OR". Thanks to "WHERE IN", a quick database result with all the links can be write back into the finished page, using a regular expression. And of course, if you were just a PIM system and always had the IDs of the result, you don't need to go the detour via HTML.

There are concepts that have already given up waiting, e.g. NodeJS through async calls, but PHP / MySQL is currently not one of them. For example, async calls do not help with PimCore either. In PHP the standard HttpClient (Guzzle) implements async requests. But if you want to win something with it, you have to let do the PIM send images from the hard drive. Then there a 5.6 MiB/s instead of just 3.7 MiB/s in the data transfer. With normal database queries via Datahub, it will be with my tools only 30 KiB/s. Even if the importer starts the next query, before he processes the current result and even he loads German and English at the same time. The problem here is not the UUIDs, because if I were to use it from the PIM, it would only make it even slower. The problem here is probably the PIM does not know the important "IN" filter and apparently waits and makes others wait. And the shop, who should actually do something, cleverly lets do others the work. (He only give them the best tools and foward the requests politely.) So he may can enjoy the beautiful view and make money.

Summary:

  • Any database query that is not necessary is bad.
  • The concatenation of OR filters with always the same field in SQL should not even be possible. - Add a IN filter simply fix the main issue: https://github.com/pimcore/data-hub/issues/223 https://github.com/pimcore/data-hub/issues/189

This is maybe a core thing: - Relational databases should not be UUID validators or producers. The UUID should come from the user and stored on the object table with an index to be useable as identifier. The current implementation should be become a big warning and be removed sone as possible.

To me:

Our current customer has only 200 products. It doesn't really matter to me today. I want to help you / my colleagues. Also the company, were are i in it, are a gold partner. They want to use it in future for big customers too. The system doesn't seem too old, to come back on the bright side. So far I've only had bad experiences with PIM systems, it usually felt like a long detour. And my colleagues may doesn't read the documentation about UUID (in shopware) and also shopware forget for example to use a "IN" query in the big url sitemap.

hlohaus avatar Nov 01 '20 03:11 hlohaus