poedit icon indicating copy to clipboard operation
poedit copied to clipboard

PO files may be silently read partially if modified in the middle of reading

Open HadrienG2 opened this issue 7 months ago • 3 comments

I am experimenting with an inotify-based workflow in which updates to source files automatically lead to an update of the .pot template, which then propagate to individual .po files via msgmerge. This is pretty convenient, but there is one issue: for a fully seamless experience, Poedit should handle these background updates when it already has this file opened, allowing me to easily switch between original file editing and translation editing without closing and reopening Poedit constantly.

Poedit actually already partially handles this, but does so imperfectly. If I do remember to save my Poedit changes before working on the original text (which I accept as a technical prerequisite because merging two changesets is complicated), it goes as follows. As the file is updated in the background, Poedit mostly takes care of reloading it automagically. But it will occasionally complain with an error/warning dialog, that is usually harmless but not always, for reasons that I will explain next.

The problem is that Poedit reacts to the first background file write notification that it receives, and does not keep listening to all further change notifications as it reloads the file. As a result, it will often start reloading the partially-written updated PO file before msgmerge is done writing it, and will not reload it again afterwards because it missed the subsequent file update notifications. This will result in Poedit only observing the beginning of the file, not the end of it, which will then result in either PO parsing errors or (much worse) a silent discarding of previous translations that I may not notice. If I proceed to save the PO file in this state, uncommitted translations will be lost, which is bad.

I can think of a few ways in which Poedit could handle this situation better:

  1. Listen to file write notifications in a background thread, and send them to the main thread via some kind of queue as they come. The main thread checks this queue like it would otherwise watch file write notifications, and once it has new entries, it fetches all of them, while ignoring all but the latest update to each file, before taking action. In this way, Poedit can be sure that it always eventually ends up reloading the final version of the file, at the expense of possibly reloading a few non-final versions before that (which may cause errors, that should ideally not be shown to the user right away if more updates to the same file are pending).
    • To avoid crowding a queue with lots of file changes notifications from the same file that will eventually be discarded, a superior concurrent data structure would be some kind of mutex-protected hash-set of files that have been modified, which the main thread swaps with an empty hash set upon readout. But this will require more work on your side as it is a less common way to synchronize threads, and I suspect for typical use cases a queue will work well enough.
  2. Set up a timer based on some reasonable expectations of how long it should take msgmerge to update a file + some desired UX response time criteria like a typical screen frame duration (16.7ms for a 60Hz monitor). When a file write is detected, Poedit start this timer, and only proceeds to reload the file when this duration is elapsed. This will reduce the number of unnecessary reloads, and also decrease the odds that Poedit observes a partially written file. But it may not take the chance of partial write observation to zero if the file is updated unusually slowly for some reason, so some variation of strategy 1 is still needed.
  3. Do not listen to file write notifications at all, instead only listen to notifications about a file being closed after being previously opened for writing. This should more reliably ensure an optimal single-reload behavior than the above option. But it does so at the expense of assuming a "well behaved" file write pattern like that of msgmerge. For example, manual PO file editing using a text editor will not be handled correctly.

What do you think?

HadrienG2 avatar May 14 '25 08:05 HadrienG2

I can think of a few ways in which Poedit could handle this situation better:

Why don't you submit a PR? This armchair theorizing about what should be done (without realizing Poedit already does that) down to irrelevant low-level details or naive theories about "reasonable expectations of file-writing times" are not very useful when you didn't look at the code in question yet...

The actual issue is that

  1. The PO file is modified non-atomically in your case (and extremely frequently, apparently)
  2. As you point out, Poedit watches for modification operations and doesn't/can't for sudden lack of modifications
  3. Poedit watches for changes to files it has open in the UI, so there's a race condition for the duration of loading

No 1 immediately suggests a workaround: produce a temporary file, use mv to replace the current one.

No 2 is can be fixed by optimistically loading files as now, but doing that in a loop until the file's mtime doesn't change during loading (which it almost never will). No 3 is going to be tricky to get right, but squarely in "nice to have" category.

vslavik avatar May 15 '25 14:05 vslavik

First of all, apologies for the incorrect speculation. I normally try to avoid those, but this time I gave in to the temptation due to a mixture of time pressure and anger at the lost hours of work.

Sadly, for the same time pressure reasons, I don't think I'll be able to work on such a patch for a long while.

Given that this is...

  • An unfamiliar, respectably large codebase
  • For a GUI, which I don't do often these days and remember from past experience to be somewhat painful to test/get to a correct state
  • Involving a framework I'm not used to (I've only done Delphi/Qt/GTK before and that was all >13 years ago)
  • With multiplatform intents (whereas I only have 1 platform with ready-to-use C++ dev environments at easy reach)
  • And all about a bug that involves a filesystem race, which like all races is likely to be a fun debugging/regression-testing ride

...I'd be surprised if I can roll a correct fix in less than 2 weeks of continuous work, even more if the work has to be intermittent due to the all the brain context switching costs. Whereas I'm under high work pressure until at least July, and probably more beyond that as lots of stuff is piling up in the "after the July rush" bucket. So, if I'm the one to do it, it won't be until summer, probably later.


Meanwhile, though, we can discuss the proper fix if you like, as your atomic write comment gave me an idea.

In the short term, I can indeed fix the problem for myself by making my auto-update shell script a hair more complicated (make msgmerge write to a copy of the original .po file instead of updating it in-place, then atomically replace original with mv after msgmerge is done).

This will not fix the problem for other people, however, who are likely to run into this again in the future given that msgmerge is a basic gettext utility and from a quick look at its source code, its file write pattern is very much not atomic (lots and lots of fine-grained writes in there).

Which is where I have a question for you. Since you wrote Poedit, I can only assume you have quite a bit of familiarity with gettext tooling. Which is not the case for me, I picked it up this week. So in your experience, how likely are other gettext users to automatically update .po files with something other than msgmerge while Poedit is running?

  • If msgmerge is the de-facto standard and people rarely use anything else (beyond translation tools like Poedit of course, but two of these are unlikely to be opened concurrently), then I suspect it will be a lot easier to fix this at the msgmerge level, than make Poedit handle non-atomic writes correctly. In that case I can get in touch with the gettext people and see how open they would be to an atomic file write patch.
  • OTOH if there are many other .po update tools that we could reasonably expect people to run while Poedit is active, then fixing the problem on that side may not be the easiest approach, and making Poedit handle non-atomic file updates correctly may be quicker overall.

HadrienG2 avatar May 15 '25 20:05 HadrienG2

I normally try to avoid those

Yet here we go again...

Given that this is...

...with some BS excuses why you can't contribute...

Meanwhile, though, we can discuss the proper fix if you like, as your atomic write comment gave me an idea.

Sorry, but "discussing proper fix" with somebody who can't even be bothered to give the code a cursory look, has no value. PRs do have value. Even bug reports - sans armchair lectures how to write the code you're not willing to write yourself - have. But talking - not really.

This will not fix the problem for other people, however, who are likely to run into this again in the future

In the 4+ years this code exists, one person (you) encountered an issue. This is the classic fallacy of thinking that what you do is normal, but in reality, ~nobody else is "experimenting with an inotify-based workflow".

then I suspect it will be a lot easier to fix this at the msgmerge

There's nothing to "fix" in msgmerge.

I'm leaving this issue open, because there is (albeit low-impact) problem in the way files are loaded specifically for "streamable" formats like PO, but unless you have a patch, there really isn't anything left to comment about.

vslavik avatar May 20 '25 16:05 vslavik