
FileTree export can be improved to reduce the noise

Open jecisc opened this issue 7 years ago • 3 comments

When I review code on GitHub, I often see files that show up as modified even though no content was added.

For example:

  • https://github.com/npasserini/iceberg/pull/268/files/483dd146005a7cbf26f19a9cc0cf958db9f8d41a#diff-bec44d0483e6ef1f62be1bff61483d8b
  • https://github.com/npasserini/iceberg/pull/268/files/483dd146005a7cbf26f19a9cc0cf958db9f8d41a#diff-5900259ea5dc8183b58d1e138765d725

I would like to see two things to improve it:

  • The exporter could avoid rewriting metadata that has not changed
  • The exporter could avoid changing the order of metadata between two exports
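To make the two points concrete, here is a minimal sketch in Python. It is not the FileTree or Iceberg code (those are written in Pharo), and the names `canonical_dump` and `write_if_changed` are hypothetical. It assumes the metadata is a JSON-like dictionary: sorting the keys keeps the order stable between exports, and comparing with the file already on disk avoids rewriting unchanged metadata.

```python
import json
from pathlib import Path


def canonical_dump(metadata: dict) -> str:
    """Serialize metadata with sorted keys so the text is identical
    for equal data, whatever order the keys were inserted in."""
    return json.dumps(metadata, sort_keys=True, indent=2) + "\n"


def write_if_changed(path: Path, text: str) -> bool:
    """Write text to path only if it differs from the file on disk.

    Returns True if the file was written, False if it was left untouched.
    """
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True
```

With this, two exports of the same metadata give byte-identical files, so git reports no modification for them.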

What do you think?

jecisc, Feb 02 '17 20:02

Hi @jecisc, this looks like the result of two slightly different JSON exporters producing different text output for the same data. You would have to look into how Iceberg writes to disk and what happens with the different backends Iceberg has.

This issue ties in with #186, which was discussed about a year ago with @npasserini. It is about making a better FileTree writer that changes only the files that need to be changed, by saving a diff instead of a whole package. I even created a branch for that (issue_186).
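As a rough illustration of "saving a diff instead of a package", here is a sketch in Python. It is not what the issue_186 branch does, and `plan_changes` is a hypothetical name. It assumes each package version can be seen as a mapping from relative file path to file content, and it computes which files have to be written or deleted to go from one version to the next.

```python
from typing import Dict, List, Tuple


def plan_changes(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (paths to write, paths to delete) needed to turn the
    old snapshot into the new one. Unchanged files appear in neither."""
    # A path is written if it is new or if its content differs.
    to_write = sorted(p for p, text in new.items() if old.get(p) != text)
    # A path is deleted if it no longer exists in the new snapshot.
    to_delete = sorted(p for p in old if p not in new)
    return to_write, to_delete
```

A writer built on this would touch only the files listed in the two results and leave everything else on disk as it is.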

ThierryGoubier, Feb 03 '17 15:02

Hi!

I think it would indeed be good. I would like to help, but I will not have the time for another month. :(

jecisc, Feb 03 '17 17:02

No need to hurry. The gain could be significant for large packages (writing a new version to disk could be much faster, and git operations faster too), but doing a proper diff in the first place could be difficult: do the diff in memory, for example, and you risk missing changes made on disk and messing everything up.
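One possible way around that risk, sketched in Python under the assumption that the files on disk are the source of truth: compute the baseline from the directory itself rather than from an in-memory copy of the previous version. The names `digest_tree` and `stale_paths` are hypothetical, and this is only an illustration, not Iceberg's behaviour.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file under root (relative POSIX path) to the SHA-256
    digest of its current bytes on disk."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def stale_paths(root: Path, new_files: Dict[str, bytes]) -> List[str]:
    """Paths in new_files whose on-disk content is missing or different,
    including files that were edited on disk behind the exporter's back."""
    on_disk = digest_tree(root)
    return sorted(
        path
        for path, data in new_files.items()
        if on_disk.get(path) != hashlib.sha256(data).hexdigest()
    )
```

The trade-off is that every file has to be read once per export, which gives up part of the speed gain in exchange for not missing on-disk changes.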

ThierryGoubier, Feb 03 '17 19:02