omegat icon indicating copy to clipboard operation
omegat copied to clipboard

Add options to choose export TM folder and which TMs to export

Open npizzigati opened this issue 3 years ago • 16 comments

In Project Properties dialog, allow user to select folder where translation memories will be exported when translated documents are created, as well as which TMs will be exported (OmegaT, TMX Level 1, and/or TMX Level 2).

Also:

  • Tests added for new functionality
  • Project Properties documentation updated and new html documentation files generated

Implements: RFE 1611, RFE 1591

npizzigati avatar Mar 02 '22 01:03 npizzigati

Please don't change docs/ and propose only for doc_src. docs/ will be managed by docs maintainer.

miurahr avatar Mar 21 '22 04:03 miurahr

Question 1: How this configuration behave in team project?

When project admin change option and push changes in team repository, does the configuration affect to team members?

Question 2: Do you accept an Absolute Path for configuration?

When project admin change option to point, for example, "D:\data" on Windows, how it affects to team members who works on macOS?

miurahr avatar Mar 21 '22 04:03 miurahr

Thanks for the great comments and questions, Hiroshi. I will be looking into them and will follow up in the next couple weeks.

npizzigati avatar Mar 23 '22 21:03 npizzigati

Squashed and merged into master

miurahr avatar Mar 25 '22 17:03 miurahr

@miurahr, the [master] link above points at a 404 and the code does not seem to be in master. Do you have an idea what happened?

brandelune avatar Apr 19 '22 14:04 brandelune

Here is raw hash c94b2d5f0dd9bfcd06539cb5f05653b14b779f22 but it is not related here but for #218.

I might close here by my mistake. I'm so sorry. @brandelune thank you for finding it.

miurahr avatar Apr 19 '22 22:04 miurahr

Thank you ! Is there anything to do here to have the code in master ?

brandelune avatar Apr 19 '22 23:04 brandelune

Thank you ! Is there anything to do here to have the code in master ?

We need more test cases;

  1. when opening a project created in previous versions, does migration do properly?
  2. when user specified full-path that is outside of project folder, what happened?
  3. when team properties ( has | does not have) a tm folder property, what happened?

We should also define a behavior when team project when the property is not _DEFAULT_

miurahr avatar Apr 20 '22 00:04 miurahr

Question 1: How this configuration behave in team project?

When project admin change option and push changes in team repository, does the configuration affect to team members?

Question 2: Do you accept an Absolute Path for configuration?

When project admin change option to point, for example, "D:\data" on Windows, how it affects to team members who works on macOS?

I think the option should be translator's private preference rather than project property. When set it in OmegaT preference, it will be saved in $HOME/.omegat/omegat.prefs An UI should be in src/org/omegat/gui/preferences/views/ rather than changing gui/dialogs/ProjectProperties.java When set it as user preference, there is also no change in project properties file.

miurahr avatar Apr 20 '22 02:04 miurahr

I think the option should be translator's private preference rather than project property.

That makes perfect sense. Thank you, Hiroshi. I will rework this and submit a new pull request when I'm done. I think this pull request can be closed.

npizzigati avatar Apr 20 '22 12:04 npizzigati

There is another issue for the preference. User's preference is common among projects. It may be fine for selecting levels of exported TMs, but an absolute path of target directory depends on each projects.

I have an idea that you add checkbox to allow users to custom TM export. When user check customize the export, checkbox for levels configuration are activated. We can add more checkbox to allow users to use custom common export directory where is out of project folder. When user select it, then we activate a input field of a common folder.

When customize the export folder, user can see their exports in something like /Users/translator/custom/export/folder/(project-name)-level1.tmx When user has multiple project that has same project name, then file will be overwritten.

miurahr avatar Apr 21 '22 00:04 miurahr

I don't think there is a necessity to over-design the feature. As it is, it is designed to work exactly as the other project defined paths where generated files are overwritten every time they are generated.

The feature here is only about:

  • possibly grouping the exported TMX into a folder that is not the project folder itself (that was an historical design decision that we've had to live with for 2 decades now, and it is neither practical nor consistent with the general behavior)
  • selecting the exported file that the project needs

The contents of the exported data is project dependent, not user dependent. And each project should be able to define export paths depending on the project architecture requirements.

The feature is not about defining a global TMX export folder, just like OmegaT does not have a global source/ folder or a global target/ folder. It is about defining a project specific TMX export folder.

brandelune avatar Apr 21 '22 00:04 brandelune

@brandelune thank you for the clarification.

The feature here is only about:

  • possibly grouping the exported TMX into a folder that is not the project folder itself (that was an historical design decision that we've had to live with for 2 decades now, and it is neither practical nor consistent with the general behavior)
  • selecting the exported file that the project needs

The contents of the exported data is project dependent, not user dependent. And each project should be able to define export paths depending on the project architecture requirements.

How does a team feature treat the configuration?

  • plan A: a configuration is team common. Only allow relative path from project root.
  • plan B: a configuration is translator local, like local mapping. team feature ignore remote configuration.

I think there can be a mixture of two.

  • plan C: when remote configuration exist, all local configuration is override. otherwise respect local configuration.

miurahr avatar Apr 21 '22 02:04 miurahr

To give some background, a few months ago I suggested @npizzigati to develop this feature the way the other project folders were implemented. But it's his code and his feature so in the end it's his call. I will not interfere in design decisions.

So, as I just stated, my original idea was to implement this path selection exactly as the other project folder paths were implemented, while keeping the default as they are right now.

In light of this, I think that this property should be considered exactly as the other properties regardless of whether the project is local or remote. I hope that answers your question (as far as my comments were concerned).

brandelune avatar Apr 21 '22 02:04 brandelune

Thanks to both of you for this productive discussion. Based on the points brought up, I think we should go ahead and put the options in Project Properties. @miurahr, in regards to your earlier questions:

Question 1: How this configuration behave in team project? When project admin change option and push changes in team repository, does the configuration affect to team members?

Yes, both the location of the TM export folder and the option of which TMs to export affects team members.

Question 2: Do you accept an Absolute Path for configuration? When project admin change option to point, for example, "D:\data" on Windows, how it affects to team members who works on macOS?

Yes, an absolute path is accepted. An absolute path of "D:\data" in a master project created on Windows will become an absolute path of "/data" for team members on Linux (directory will not be automatically created, but team member can create it). For a master project created on Linux, an absolute path of "/media/disk" will become an absolute path of "C:\media\disk" for team members on Windows (directory is automatically created in this case).

Note that the above behavior is exactly the same as the behavior of the other directories set in Project Properties. With regard to testing on a Mac, unfortunately I don't have access to one, but I would expect the directories to behave the same there, too.

npizzigati avatar Apr 21 '22 17:04 npizzigati

@miurahr, in response to your other questions:

We need more test cases;

1. when opening a project created in previous versions, does migration do properly?

I think the test testMissingDirs (with the addition I have made on line 377) covers this. I have also tested this manually and migration works properly (the new fields are added with their defaults).

2. when user specified full-path that is outside of project folder, what happened?

Absolute and relative path behavior is tested in tests testNearAbsolutePaths, testNearRelativePaths, testFarAbsolutePaths and testFarRelativePaths. I have also confirmed manually that full paths outside of the project folder, even on different volumes, work in group projects just as they do for the other Project Properties directories (see above comment on team project behavior for more about this).

3. when team properties ( has | does not have) a tm folder property, what happened?

Team projects without TM export folder or TM export options properties behave like local legacy projects -- the new fields are added with their defaults on the team member's machine.

npizzigati avatar Apr 21 '22 20:04 npizzigati

I'd like to merge.

miurahr avatar Aug 19 '22 06:08 miurahr

Sorry for the delay, I was in holidays and did not see that I had received this request for review.

First about the feature itself: exactly as @miurahr I would have thought that these parameters would be user-dependent rather than project-dependent. But maybe I don't understand what is exactly the use case: I thought the goal would be to create a common directory where every project is exported (eventually with subdirectory per project). Maybe I am wrong: in which case do you need to export TMX in a non-standard location, project dependent?

Now technically, globally I approve the method since it is exactly as for other project-dependent folders. The only thing I would eventually change: for the choice of 3 booleans (whenever to create this file or not) I would use an enum, for example ExportedTmxLevel, instead of strings, except of course in the XML file itself (and use enum parsing to get them from the project xml file). That would be consistent with existing code, which already contains lot of such enums. That makes more visible the fact that property export_tm_levels does not contain arbitrary string, but only a known set of possible values. And since enums are also classes, maybe we could define some properties to factorize the code in RealProject.java to a for-each loop.

t-cordonnier avatar Aug 19 '22 07:08 t-cordonnier

First about the feature itself: exactly as @miurahr I would have thought that these parameters would be user-dependent rather than project-dependent. But maybe I don't understand what is exactly the use case: I thought the goal would be to create a common directory where every project is exported (eventually with subdirectory per project). Maybe I am wrong: in which case do you need to export TMX in a non-standard location, project dependent?

Thank you Thomas for your comments.

The use case is simple. Freelancers commonly work with different clients and need per-project settings for such data, just like project_save.tmx is project specific and not shared by all the projects a user has.

The idea here is to keep this function as close as possible to the other project specific options, where the user decides on a project to project basis where to take source/ from, where to put target/, and here where to export the TMs.

I hope that answers your question.

Now technically, globally I approve the method since it is exactly as for other project-dependent folders. The only thing I would eventually change: for the choice of 3 booleans (whenever to create this file or not) I would use an enum, for example ExportedTmxLevel, instead of strings, except of course in the XML file itself (and use enum parsing to get them from the project xml file). That would be consistent with existing code, which already contains lot of such enums. That makes more visible the fact that property export_tm_levels does not contain arbitrary string, but only a known set of possible values. And since enums are also classes, maybe we could define some properties to factorize the code in RealProject.java to a for-each loop.

I leave that part to Nick to answer and conclude.

brandelune avatar Aug 19 '22 09:08 brandelune

First about the feature itself: exactly as @miurahr I would have thought that these parameters would be user-dependent rather than project-dependent. But maybe I don't understand what is exactly the use case:

I think I can give an example of why this feature (at least the part about which TMs to export, as requested in ticket RFF 1591) must be project-specific and not user-dependent:

If I want to be able to decide which TMs are exported for a job where I dispatch OmegaT projects to my translators, that should affect only those OmegaT projects and not other projects that I might also dispatch to the same translators but with different requirements or other projects that the translators might create themselves.

Therefore, I would need to tweak this option in the project settings of the projects that I'm dispatching, and not in the user preferences of my translators (with an impact on all their projects).

msoutopico avatar Aug 19 '22 12:08 msoutopico

But maybe I don't understand what is exactly the use case:

One thing that is confusing (at least for me) is that this PR is bundling together what are actually two different things: choosing folder where to put exported TMs is one feature, choosing TMs to export is another feature, and the two features are related by independent from each other. I think having two PR's would be much clearer.

msoutopico avatar Aug 19 '22 12:08 msoutopico

I'm excited about getting to work on this again!

On the technical recommendation:

for the choice of 3 booleans (whenever to create this file or not) I would use an enum, for example ExportedTmxLevel, instead of strings, except of course in the XML file itself (and use enum parsing to get them from the project xml file).

Point taken. I will get to work on it.

On the matter of use cases and separating this into two separate PRs, I see your point, @msoutopico. I understand that your use case doesn't necessarily require setting a different folder for the export TMs, and that is a strong case for separating that part of this into its own PR.

At the same time, I also agree with @brandelune's point that the export TM folder option is consistent with how OmegaT handles other project folders, and, in my opinion, this makes it a natural and almost inevitable extension of the feature to choose export TMs.

On that basis, I would recommend against separating the PR.

npizzigati avatar Aug 20 '22 03:08 npizzigati

The use case is simple. Freelancers commonly work with different clients and need per-project settings for such data, just like project_save.tmx is project specific and not shared by all the projects a user has. The idea here is to keep this function as close as possible to the other project specific options, where the user decides on a project to project basis where to take source/ from, where to put target/, and here where to export the TMs.

@brandelune Thank you for the explanation.

I have an idea to simplify the feature.

  1. Change a default folder to target for these exports
  2. Change a default behavior is no export of these files.
  3. Add a feature to select which export TMX is enabled.

These configuration are project-oriented.

Translator can commit target files into repository for outcome of the work. It may be an option to add menu item to save export of TMX file as similar as saving target files.

And as advanced feature,

  1. User can select arbitrary export folder.

I feel it is not mandatory for the use case. Translators can copy files to place they want.

miurahr avatar Aug 20 '22 06:08 miurahr

I think the defaults should be the current behavior, and maybe change the default in the next version with a deprecation warning in the doc for ex.

I like the idea of having target/ as a default export location but I think we should have at least level2 exported by default and not nothing.

brandelune avatar Aug 20 '22 06:08 brandelune

Also, I'd like to see this feature in 5.8 if possible. See my last message on the development list.

brandelune avatar Aug 20 '22 08:08 brandelune

Also, I'd like to see this feature in 5.8 if possible. See my last message on the development list.

brandelune avatar Aug 20 '22 08:08 brandelune

Do I understand correctly that the target folder is proposed as the default location for the export TMs?

msoutopico avatar Aug 20 '22 11:08 msoutopico

No. The default is proposed to be the current behavior.

target/ could be a default export location in future versions if somebody wants to take care of the coding. And it would make some sense, since the TM contents reflect the contents of the target files.

In a workflow that I have for a client, where I deliver both target and level2, that would help a lot.

brandelune avatar Aug 20 '22 11:08 brandelune

Thanks. It would be great if the target folder could be selected as the location where export TMs are generated, but I think it would be a bad idea to make it the default, especially if it becomes a forced default that can't be changed.

I think that default would be confusing for some users who would be confused about what are the deliverable files they must hand back to the client or open for preview (and this is not speculation, it's based on my experience with subcontractors, not necessarily very clumsy ones).

In fact, my ticket to add an option in the project settings to choose which TMs to export (all 3, 2, 1, or just none) is meant to help those users (like those subcontractors) who only need to care about the target documents and nothing else.

I think the design principle by which the contents of the target folder are an exact replica of the contents of the source folder (except for the translatable text, of course) is a good one. I would advocate for sticking to it as a default behaviour.

In a nutshell: I'm totally fine with having the current behaviour as the default of new features and enhancenents. Thanks!

msoutopico avatar Aug 20 '22 11:08 msoutopico

Freelancers commonly work with different clients and need per-project settings for such data, If I want to be able to decide which TMs are exported for a job where I dispatch OmegaT projects to my translators,

OK I see. I did not see this as an "export" feature. Don't forget that it is executed when you run "create translated files": I have users who run this menu regularly during translation to have a look to rendering of native translated files ; if these files (translated, but also the exported TMs) are produced in the middle of a project and exported immediately outside the project, then the targetted user (or even worse, a post-processing tool monitoring the directory) may take the files without knowing that the project is not finished. For these reasons, I would not have implemented export feature here, but either in a dedicated menu, a script or an external workflow tool (i.e. outside OmegaT).

One thing that is confusing [...] choosing TMs to export is another feature,

One other point confusing is the term "choosing TMs to export". Generally when I read "which TM" I think that they differ by the contents (segments), but here it is not the case: In fact all these 3 files contain exactly the same segments but differ by the format (all are TMX but differ by options: with or without tags, and some other details). The correct term should be "which format" (or format options if you prefer) to export. In previous message I suggested to use enum, but I forgot to precise that this option is only valid if we consider that the specification of these 3 files is carved in stone. It is the case actually (and in this PR, we define them in a set of checkboxes) but another possible approach could be to produce only one file (I doubt there are lot of clients which need the same data in two different formats, and even if it is the case, stylesheets can be used for conversion) and have instead a set of options: with or without tags, with or without orphans, etc. and use a bitfield instead of an enum: then it is easy to have up to 63 boolean options (size of long int minus the sign bit)

t-cordonnier avatar Aug 20 '22 15:08 t-cordonnier