magento-lts
File IO for every category and product collection
Summary
Mage_Catalog_Helper_Flat_Abstract::isAvailable() gets called from Mage_Catalog_Model_Resource_Product_Collection::isEnabledFlat(), which is called from that class's constructor and from Mage_Catalog_Model_Category::_construct(). So every page that loads a category or a product collection calls this method.
public function isAvailable()
{
    if (is_null($this->_isAvailable)) {
        $this->_isAvailable = $this->isAccessible() && !$this->getProcess()->isLocked();
    }
    return $this->_isAvailable;
}
It loads the indexer process from the database in $this->getProcess(), then checks that the status of that process is not "running". Then it checks whether the process holds a lock in Mage_Index_Model_Process::isLocked(). Notice that it forces the lock check to use a file as well (the second argument is true):
public function isLocked()
{
    return $this->_getLockInstance()->isLockExists($this->getProcessLockName(), true);
}
protected function _isLockExistsFile($lockName)
{
    $result = true;
    $fp = $this->_getLockFile($lockName);
    if (flock($fp, LOCK_EX | LOCK_NB)) {
        flock($fp, LOCK_UN);
        $result = false;
    }
    return $result;
}

protected function _getLockFile($lockName)
{
    if (!isset(self::$_lockFileResource[$lockName]) || self::$_lockFileResource[$lockName] === null) {
        $varDir = Mage::getConfig()->getVarDir('locks');
        $file = $varDir . DS . $lockName . '.lock';
        if (is_file($file)) {
            self::$_lockFileResource[$lockName] = fopen($file, 'w');
        } else {
            self::$_lockFileResource[$lockName] = fopen($file, 'x');
        }
        fwrite(self::$_lockFileResource[$lockName], date('r'));
    }
    return self::$_lockFileResource[$lockName];
}
I always wondered why I see these files in var/locks when I don't have any crons running. While I was checking some performance issues, I saw that checking the lock file took 8ms according to New Relic.
Examples
Using the code above, put a breakpoint in any of those methods and go to a category or product page.
Proposed solution
There's a bunch of problems here.

- The locks don't really serve a purpose in a multi-server environment: they are local to the PHP host, so they would need to be done via the database. We could change the call to isLockExists() to pass false as its second parameter, which would force it to use the database.
- Maybe instead of all this checking to see whether the flat index is built and whether it is in the process of being built, we should have a config flag saying whether it can be used right now. This would be different from whether it is enabled: enabling it enables its indexing and the possibility of it being used, while this setting would say whether it can be used right now. It would be turned off while indexing and turned back on when done, and it would be off if the index is disabled. It would also mean we don't need to check whether the table has any rows (which is not a good way to check anyway).
- Do we even need to disable the flat index while it is indexing? It seems like this has been this way forever and people really don't have problems. I've never run Magento in a single-host environment and this hasn't caused issues. The fact that it doesn't block for multi-server environments means this is a bug, yet that bug doesn't seem to cause any issues with the product collection or category. If it does, we should have a way to do atomic indexing: create another table and point the application to the built index. Ultimately, the EAV system just doesn't suit e-commerce in 2020, but we won't go there.
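For reference, the first bullet is a one-argument change in Mage_Index_Model_Process::isLocked(); a sketch (not a tested patch) of what it would look like:

```php
public function isLocked()
{
    // Passing false as the second argument makes Mage_Index_Model_Lock
    // check the database lock instead of the local var/locks file.
    return $this->_getLockInstance()->isLockExists($this->getProcessLockName(), false);
}
```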
Good conversation from Discord:
zstein Today at 12:16 PM
It seems like the least breaking way of doing this would be to simply transition the file lock to be a database lock table, which wouldn't solve all of the problems, but it would help in clustered environments, and it would reduce overhead because you're not hitting the file system for every request, so in theory the database could cache that query.
and there is the index_process table that could potentially be used that way
you could probably do something like process_state enum('running','completed')
I would suspect based on things I've seen in the wild that some people (improperly) directly reference catalog_product_flat_1, etc. So the table swapping is not without breaking risks
but if you're doing it that way, maybe you deserve what you get :man_shrugging:
(With you being the theoretical person directly querying _flat_x tables)
Josh Dickerson Today at 12:31 PM
That would fix the "bug" but the bug is what people expect now. The indexer is supposed to lock that file while indexing. While it is doing that, the flat tables are supposed to not be usable. That appears to be true for single-host environments, but it's not true for multi-host environments.
So, if you expect flat tables to be accessible while it is indexing them and you're on a multi-host environment, you just broke it.
zstein Today at 12:31 PM
Yeah, I understand that, I'm suggesting fixing the bug first as a non-breaking change
Josh Dickerson Today at 12:32 PM
I think it is "breaking" for those multi-host environments - probably most people.
zstein Today at 12:32 PM
I think you should expect that your multihost environment behaves like a singlehost for the most part
Josh Dickerson Today at 12:32 PM
but who knew that's how it should behave?
zstein Today at 12:32 PM
because most people won't run their test environments on multiple hosts
well, on your multihost environment, whichever host owns the indexing process would behave inconsistently with the other hosts
which means inconsistent results at page-load time
Josh Dickerson Today at 12:33 PM
I had no idea that it was meant to not use flat tables while indexing and I've been using Magento for years.
zstein Today at 12:33 PM
and could lead to weird stuff getting cached based on which host renders the page
Josh Dickerson Today at 12:33 PM
Maybe I shouldn't say BC change. It's just a massive unexpected change.
Definitely agree with you. It's bad to do.
zstein Today at 12:34 PM
so, the bug in and of itself leads to inconsistent behavior as-is depending on which node is doing the indexing...
which is pretty bad
Josh Dickerson Today at 12:34 PM
I think this is one of those cases where we might actually need a config.
I've always run my crons on a separate server. Don't know how many people do that but if they do, there's no issue.
As in, crons don't get web traffic
zstein Today at 12:35 PM
I think it's generally a best practice to do so
but not everyone does that
Josh Dickerson Today at 12:35 PM
yeah for sure
zstein Today at 12:35 PM
admin + crons on the same server
Josh Dickerson Today at 12:35 PM
yeah, in that case there's no issue because the admin area doesn't use flat tables.
So, I am going more towards this needs to be a config to fix. "Disable flat tables during indexing"
Or enable, depending on how you want to word it. Enable allows it to be empty which would be the default now.
Then move the locks to the db. That's for sure.
I'm really surprised if it really takes 8ms to open the lock file, but it seems the bigger issue is that it is not locked effectively in a multi-node environment.
One option would be to use the cache backend to handle the locking since it should be assumed that it is multi-node compatible, but that is susceptible to all kinds of edge cases like handling PHP fatal errors or interrupted processes.
Another option would be MySQL's GET_LOCK() and IS_USED_LOCK(), which I see no major problems with, and these should be very fast. They probably didn't use these to begin with because they are specific to MySQL, but OpenMage isn't interested in supporting anything other than MySQL-flavored databases (no Oracle, Postgres, or Microsoft SQL Server) that lack these functions anyway.
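What that might look like in plain SQL (lock name illustrative; Magento would build it from the process name):

```sql
-- Indexer side: hold a named server-side lock for the whole run.
SELECT GET_LOCK('index_process_1', 0);   -- 1 = acquired, 0 = already taken
-- ... rebuild the flat tables ...
SELECT RELEASE_LOCK('index_process_1');

-- Frontend side: a non-blocking check whether the indexer is running.
SELECT IS_USED_LOCK('index_process_1');  -- NULL = free, otherwise the
                                         -- connection id of the holder
```

Because the lock lives in the MySQL server, every web node sees the same state, and the server releases it automatically if the holding connection dies.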
There would also be an option to use the Magento flag table to store the locks.
I'm really surprised if it really takes 8ms to open the lock file
It's a lot of traffic trying to get a lock on a single file (this was for a category).
There was more discussion on Discord that is useful here:
jfksk 10/01/2020 isn't it checking the process status anyway? I was under the impression that this is the reason why the whole thing works. Removing the lock check then should result in just checking the availability in the DB o0
Josh Dickerson 10/01/2020 Yes. It checks the process status.
public function isAccessible()
{
    if (is_null($this->_isAccessible)) {
        $this->_isAccessible = $this->isEnabled()
            && $this->getProcess()->getStatus() != Mage_Index_Model_Process::STATUS_RUNNING;
    }
    return $this->_isAccessible;
}
$this->_isAvailable = $this->isAccessible() && !$this->getProcess()->isLocked(); so it is checking the process status AND if it is locked. Good point. Another reason we don't need the lock.
zstein 10/01/2020 So as a consequence removing the file check would basically not change the behavior anyway, but would slightly improve performance and may even lead to better behavior when something gets hung up in indexing
Josh Dickerson 10/01/2020 Looking to see what they did in M2
zstein 10/01/2020 M2 no longer recommends a best practice using flat product and flat category because of weirdness in the process... https://docs.magento.com/user-guide/catalog/catalog-flat.html
jfksk 10/01/2020 hmm, but the status is not changed w/ update-on-save. so in worst case it changes the lock situation on the flat table - i guess.
zstein 10/01/2020 So you may not find a better answer there, just fyi
Josh Dickerson 10/01/2020 @zstein I know. I pushed hard for them to do that. They still have flat tables in the codebase though. "it changes the lock situation on the flat table" @jfksk what do you mean? Mage_Index_Model_Process::lock() is called from Mage_Index_Model_Process::safeProcessEvent(), which is called from Mage_Index_Model_Indexer::indexEvent(), which is called from Mage_Index_Model_Indexer::processEntityAction(). processEntityAction() is called from a lot of places, the main one being afterCommitCallback() for the category and product models.
jfksk 10/01/2020 @Josh Dickerson Just guessing here, but depending on your catalog update frequency and your transaction isolation level you could see more lock waits on the flat table because it receives more reads. Mage_Index_Model_Process::safeProcessEvent() is currently blocking them (if it works...). But if so (and you notice it), that's probably an edge case. If you are already running a cluster without replacing the locking, then I don't see what would change when the FS lock is removed for the FE.
Josh Dickerson 10/01/2020 I think as long as the flat table is built, you're using InnoDB, and your transaction isolation level is set to one that doesn't over-isolate, you should be good no matter what.
Is there a reason not to use the cache here?
Nice find @joshua-bn 👍 I like Colin's idea. M2 also uses locking functions from MySQL by default. https://github.com/magento/magento2/tree/2.4-develop/lib/internal/Magento/Framework/Lock
I am in favor of not locking and either removing the "feature" altogether or adding a config. If we "lock" this, it will not use flat tables and use the EAV tables. I think fixing this bug will actually break a lot of sites that are relying on flat tables all the time.
The main problem the locking solves is incomplete or non-existent flat tables. I have seen "table not found" issues in production environments because the locking did not work for non-cron servers.
Also, the cache is not a good fit for such a thing, as caches can be cleared at any time.
@spinsch I didn't see that they actually lock anything though.
@Flyingmana that's why I think a config here would be best.
A config which makes the locking mechanism configurable sounds good to me. Then people can decide whether they use "file", "none", etc., we can add other/better ones later, and we still have full backwards compatibility.
I have the feeling that the lock might just hide deeper problems in the process, e.g. in Mage_Catalog_Model_Resource_Product_Flat_Indexer::prepareFlatTable. Currently the update case seems to be broken and it's always updating the whole table. Anyway, IMHO it should use a shadow table (like M2 also does) and rename the table after the index has been successfully generated. RENAME TABLE is an atomic operation and therefore it should work perfectly fine with replication. Doesn't that also remove the need for a lock in the FE? Not sure if I'm missing something...
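The shadow-table idea sketched in SQL (table names illustrative; this mirrors what M2's indexers do):

```sql
-- Build the new index in a shadow table, then swap it in atomically.
CREATE TABLE catalog_product_flat_1_shadow LIKE catalog_product_flat_1;
-- ... populate catalog_product_flat_1_shadow from the EAV tables ...

-- RENAME TABLE is atomic: readers always see either the complete old
-- table or the complete new one, so no frontend lock check is needed.
RENAME TABLE catalog_product_flat_1 TO catalog_product_flat_1_old,
             catalog_product_flat_1_shadow TO catalog_product_flat_1;
DROP TABLE catalog_product_flat_1_old;
```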
Great comments, everyone!
- Replacing file locks with MySQL's locking functions has no major downsides in my view and would be very easy to implement as a drop-in replacement with no BC break.
- If the indexing process was updated to use a shadow table that would remove the need for checking lock status in frontend requests, but I think the lock might still be useful to avoid backend updates from conflicting. Not 100% sure on this.
- It seems this issue should be possible to solve without introducing new configuration options which then need to be announced, documented, etc. unless there is a really good reason.
- It does seem that the frontend shouldn't have to check for both the index status and the lock status. That is, if the indexing is in process then the flat tables won't be used anyway so why check the lock status? This alone would be an easy fix, improve performance and not have any BC concerns. Unless I'm missing something..
- Keep in mind behavior when the PHP process fails (out of memory, killed by user, segfault, etc). I'd like to avoid using methods that don't automatically clean up after themselves (using the core_flag table or a cache backend).
- I think the shadow table design is a big feature that should not be part of this. Would be a great feature though.
- Everything needed to use the database is already there. Just change the second parameter in the call to Mage_Index_Model_Lock::isLockExists() to false. No other changes necessary there.
- "It does seem that the frontend shouldn't have to check for both the index status and the lock status" yeah, seems like another easy solution. I think it needs to be properly researched, but it looks good from the surface.
- The config suggestion is because it will be a change for either multi-server or single-server environments. One way or another, it's a big change.
More than likely I'll remove this functionality on my site as I can't see any reason we will need it. So, I won't be creating a PR for this. Going to ask the community to take this.
If the indexing process was updated to use a shadow table that would remove the need for checking lock status in frontend requests, but I think the lock might still be useful to avoid backend updates from conflicting. Not 100% sure on this.
Of course, yes, the lock would still be required for any modification of the data. Full reindex and update-on-save.
I think the shadow table design is a big feature that should not be part of this. Would be a great feature though.
I mean, it can be done without changing the whole indexer backend. In fact I did it yesterday for a client, after I saw the problems in prepareFlatTable. We will test that in the next days or so. The problem is, it's not particularly nice and a bit hacky, but it's transparent and integrates with the existing process. Not sure if it's worth a PR. Maybe I'll come back with a gist or so to discuss it.
Everything needed to use the database is already there. Just change the call to Mage_Index_Model_Lock::isLockExists() to false as the second param. No other changes necessary there
The config suggestion is because it will be a change for either multi-server or single-server environments. One way or another, it's a big change.
The indexer should probably just honor the existing configuration, and the FS lock should be an adapter. This way one could use whatever locking mechanism suits best. There is IMHO no real easy answer for multi-server environments, as possibly nothing works out of the box. MySQL's *_LOCK functions, for example, will give you a warning when used in replication. And therein IMHO also lies a BC concern with the situation...
<config>
    <global>
        <index>
            <lock>
                <storage>
                    <model>index/lock_storage_db</model>
                </storage>
            </lock>
        </index>
    </global>
</config>
If there is a concern with ignoring the lock for reads in the FE, I'd rather fix the reason for it (e.g. shadow table and whatnot). A new configuration, frankly, would be a bit strange...
MySQL's *_LOCK functions for e.g. will give you a warning when used in replication
Can you provide some more info on this? I know for multi-master replication the *_LOCK functions are not going to provide distributed locking, but we've established before that Magento is already not suitable for multi-master databases.
@colinmollenhour yikes, my statement is invalid. :facepalm: FTR, long story short: as said, synchronous replication like XtraDB Cluster and Galera cannot handle these locks. But for asynchronous replication with RBR (SBR doesn't work anyway) it doesn't really matter... You should be fine, as you are probably splitting reads and writes so as not to spread your indexer queries all over the place, and the resource uses the write adapter.
Another pitfall: it looks like you want to use MySQL 5.7+ for this feature or it won't work properly.
SELECT GET_LOCK('lock1',10);
SELECT GET_LOCK('lock2',10);
SELECT RELEASE_LOCK('lock2');
SELECT RELEASE_LOCK('lock1');
In MySQL 5.7 or later, the second GET_LOCK() acquires a second lock and both RELEASE_LOCK() calls return 1 (success). Before MySQL 5.7, the second GET_LOCK() releases the first lock ('lock1') and the second RELEASE_LOCK() returns NULL (failure) because there is no 'lock1' to release.
https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html
And now I think here is why it was changed w/ 1.9.4.
I haven't confirmed either way, does Magento indexer currently acquire multiple locks simultaneously? If not I think this fact is not a blocker.
If it does, I still don't think we should consider it a blocker because 1) it won't completely break in MySQL <5.7, just the locking will be ineffective (as it is currently on multi-node systems without var/lock mounted on NFS) and 2) everyone should be using 5.7 or 8 by now.. MySQL 5.6 is still supported by Oracle for only a few more months but 5.7 was released 5 years ago and I think it's about time to upgrade the minimum requirement but we can save that for a separate discussion.. :)
As said, synchronous repl. like XtraDB Cluster and Galera cannot handle these locks.
If you are using these with all writes going to one node then these functions still work. Basically there is just no centralized locking but it works perfectly fine on any single node.
I haven't confirmed either way, does Magento indexer currently acquire multiple locks simultaneously? If not I think this fact is not a blocker.
Yes, you can start a index from shell and in the backend (or just in different browsers) and update on save is also concurrent.
it won't completely break in MySQL <5.7
Yes, it might not, because the index status check will protect you in most cases, but it also won't work as expected because any process would remove the lock of any other process if you are running MySQL 5.6, or am I missing something here? This can get nasty. Back with 1.5 (maybe 1.6; I never looked back after implementing a workaround) I had loads of trouble with indexer concurrency.
I think it's about time to upgrade the minimum requirements
:thumbsup:
as it is currently on multi-node systems without var/lock mounted on NFS
I am not sure about that. It very much depends on your setup. I guess that most who run Magento in a multi-node environment do so with a dedicated node for cron/API/admin panel. In that case the FS lock might be just fine for index operations, even if it's not on NFS. I am not sure about the lock check in the FE because I never experienced any problem with it. At least the behavior wouldn't be random.
If you are using these with all writes going to one node then these functions still work. Basically there is just no centralized locking but it works perfectly fine on any single node.
Yes, with Percona at least, if you set the PXC mode to PERMISSIVE. You then get a warning (that's where I was coming from ;) ). Is there any other way with PXC? One of my go-to hosters is understandably strictly opposed to doing that.
Yes, you can start a index from shell and in the backend (or just in different browsers) and update on save is also concurrent.
I think you are misunderstanding how these functions are limited in MySQL 5.6 and earlier. They can only hold one lock per connection. Any other connection calling GET_LOCK() does not break the first connection's lock, so the only case where this would be bad news is if a single Magento process was trying to obtain multiple locks at a time. That would be bad practice anyway because it introduces the likelihood of a deadlock, which would not be detected by the server, literally leaving your locks deadlocked indefinitely. I am assuming (based on reading the indexer code long ago) that Magento locks each indexer only serially, in which case I don't think this limitation will have any effect on Magento.
Yes, w/ Percona at least, if you set the PXC mode to PERMISSIVE. You get then a warning (that's were I was comming from ;) ). Is there any other way w/ PXC?
Ahh, thanks, I did not know that. That seems like a faulty design on Percona's part, because there is nothing wrong with using these functions as long as you understand and accept that they have absolutely zero effect on other nodes. Just because XtraDB Cluster is a multi-master design doesn't mean it has to be used strictly as multi-master.. Oh well. Can you just ignore the log warnings?
I think you are misunderstanding how these functions are limited in MySQL 5.6 and earlier. They can only hold one lock per-connection. [snip]
Yes, I totally missed that.
I am assuming (based on reading of indexer code long ago) that Magento locks each indexer only serially
I just tested it, and reindexEverything processes and acquires the locks serially.
So there seems to be no problem :smiley:
Can you just ignore the log warnings?
I only run it in testing, and there seemed to be no problem but I never fully validated that.
Cool, thanks for verifying!
Just found this issue https://github.com/OpenMage/magento-lts/issues/1150 which is related.
Locally I changed all of the calls to isAvailable() to isAccessible(), and moved the isAdmin() call to the front wherever it does "if Flat Data enabled then use it but only on frontend". For good measure, I also changed the isLockExists() call to remove the true second argument.
Just went live with some of these changes. You may see more deadlocks on the index_process table now; this is expected because the table is actually being used now. It just highlights the issue with locking for indexing, though.
I removed the locking well over a year ago on a production environment with more traffic and catalog updates than any M1 site you'll find and there have been no issues.
@joshua-bn can you push a PR with your changes?
I've created https://github.com/OpenMage/magento-lts/pull/2492 trying to solve this.