lmms icon indicating copy to clipboard operation
lmms copied to clipboard

Refactor ``DataFile`` upgrade routines

Open consolegrl opened this issue 1 year ago • 24 comments

Initial proof-of-concept for refactoring DataFile upgrade routines per #6861 . Once the style and approach are approved, I'll move ahead with refactoring the other routines. The here-fixed routines have been tested on the legacy projects bundled with LMMS sources.

consolegrl avatar Sep 14 '23 08:09 consolegrl

@sakertooth @zonkmachine @Veratil have a look at this approach, if you like it I will move on to refactor and test the other routines.

consolegrl avatar Sep 14 '23 08:09 consolegrl

As per @michaelgregorius suggestion in #6861 I have pushed a rough draft of a file-per-routine "Upgrade Routine Functor" restructuring as well as templates to assist others in writing upgrade routine functors.

This is a rough draft and as such I have not gone over the ergonomics with a fine tooth comb, but I am open to comments and review. @DomClark and @Veratil and @sakertooth what do you think about this?

I will also be rebasing the UpgradeExtendedNoteRange on my work based on @michaelgregorius suggestion.

consolegrl avatar Sep 27 '23 19:09 consolegrl

  1. I like splitting it up like this
  2. I don't like it flooding include/ and src/core/ with even more files, I'd rather they go into their own directory

Veratil avatar Sep 27 '23 21:09 Veratil

2. I'd rather they go into their own directory

@Veratil what directory name would you prefer?

consolegrl avatar Sep 27 '23 21:09 consolegrl

  1. I'd rather they go into their own directory

@Veratil what directory name would you prefer?

include/core/datafile/ and src/core/datafile/ seems good to me. This would require a change to CMakeList.txt to do a GLOB_RECURSE though for the include directory. Which this then opens up reorganizing the include directory into the same structure as src.

Veratil avatar Sep 27 '23 21:09 Veratil

Which this then opens up reorganizing the include directory into the same structure as src.

That sounds...fun. Not difficult, but sounds frought with merge conflicts and stuff. ...Also don't know how to do that in cmake

consolegrl avatar Sep 27 '23 22:09 consolegrl

but sounds frought with merge conflicts and stuff.

Fun fact: it will most definitely for any PR before that change would be made 😂

...Also don't know how to do that in cmake

Fun fact 2: I just shared how to do it and the line to change 😉 Fun fact 2.a: I've already tested that change with my midi rewrite branch and it works for new files at least 😁

Veratil avatar Sep 27 '23 22:09 Veratil

Fun fact 2.b: It was so simple I figured it out on my own just poking around. just unfamiliar.

it will most definitely for any PR before that change would be made 😂

sounds like a good way to make enemies or hurt the project :(

consolegrl avatar Sep 27 '23 22:09 consolegrl

It won't hurt, it's a good step forward in better file organization.

Veratil avatar Sep 27 '23 22:09 Veratil

@Veratil it is done. Now my new enemies shall emerge from the shadows and drag me down to Davy Jones.

consolegrl avatar Sep 27 '23 22:09 consolegrl

I always assumed that the root include directory was meant as the public interface to LMMS. Which is also the reason why I had put the include of my conversion class into the src folder.

Is this assumption correct? If yes, then the data conversion routine should go somewhere else in the long run because they are just headers for a private implementation detail.

michaelgregorius avatar Sep 28 '23 16:09 michaelgregorius

As far as I can tell, include is a dumping ground of all header files regardless.

Veratil avatar Sep 28 '23 16:09 Veratil

I always assumed that the root include directory was meant as the public interface to LMMS.

I've only been with the project for a short time, [but I gotta tell ya, these things are REAL -Winston Zedamore] but it seams to be wrong. They seem to just use the include directory for all .h files as there are no first-party .h files in src/ (only under 3rd-party)

consolegrl avatar Sep 28 '23 16:09 consolegrl

Thanks, @Veratil! Good to know. I guess it will stay like that because if they were ever moved next to their implementation files like in other projects then this would create even more enemies. 😅

michaelgregorius avatar Sep 28 '23 18:09 michaelgregorius

Thoughts:

  • I like the new inheritance hierarchy, where we have a base class DataFileUpgrade and specific upgrades can inherit that.
  • If we go with this approach:
    • I understand that we would probably want separate files. Each upgrade can vary in size and the number of functions it needs, so it makes sense. We seem to already do this for UpgradeExtendedNoteRange anyways.
    • I would add a DataFileUpgrades.h file in the include directory that includes all the necessary upgrades, which is then included within DataFile.h.
    • Can we remove the *.template files? Some of the instructions seem off too (e.g., LMMS_UPGRADEMIXERRENAME_H should really be LMMS_UPGRADE_MIXER_RENAME_H)
    • The plan is to move all the upgrades into their own files/modules, correct? We aren't changing anything about the current functions, just moving them, so it should be fine.

I will maybe do a thorough review on the specifics of the code itself soon.

sakertooth avatar Sep 30 '23 23:09 sakertooth

  • Can we remove the *.template files? Some of the instructions seem off too (e.g., LMMS_UPGRADEMIXERRENAME_H should really be LMMS_UPGRADE_MIXER_RENAME_H)

@sakertooth I think the templates will help other people adding upgrade methods and keep things consistent as I've seen no less than three "here's my PR with a side of upgrade routine" PR's in the short time I've been here. Also I was simply following the naming convention already in place, if you want me to change the convention that's fine too. Perhaps we could put them in another directory? or just rename them?

  • The plan is to move all the upgrades into their own files/modules, correct? We aren't changing anything about the current functions, just moving them, so it should be fine.
  • I would add a DataFileUpgrades.h file in the include directory that includes all the necessary upgrades, which is then included within DataFile.h.

Yes, I haven't even touched the pre 1.3 upgrade methods as per @Veratil request earlier with concerns about testing etc. But that would be the idea. If I can get the ergonmics small/smooth enough we can shove/hide the small routines (like my mixer rename fix) into DataFileUpgrades.h/cpp to save some file names and, although they are all hidden in include/datafile . It will also leave the base class room to grow.

  • I understand that we would probably want separate files. Each upgrade can vary in size and the number of functions it needs, so it makes sense. We seem to already do this for UpgradeExtendedNoteRange anyways.

Yeah, in fact the author of that class was the impetus for this change/new restructuring.

consolegrl avatar Oct 01 '23 14:10 consolegrl

@sakertooth @Veratil @michaelgregorius So the new method makes the order explicit, doesn't run any routines that don't need to be run (like the old way), uses simple templates, splits massive upgrade routines (upgrade_1_3_0) into their own class/functor files, is easy to read and comes with a template file for newbies to add their own upgrade routines as necessary.

The next phase of this work will be to get proper tests of each of the routines, which means recreating ancient files to test each routine, or one file to test them all.

consolegrl avatar Oct 12 '23 03:10 consolegrl

Thoughts and my design idea (subject to change, feel free to let me know what you think):

  • UpgradeContainer would be an abstract class that contains mainly a pure virtual upgrades() function. Inherited subclasses should define this function to return the collection of upgrades, ordered from the least to the greatest in terms of their versions. They do this by returning a list of function pointers.
  • UpgradeContainer can also have helper functions that may be used within the upgrades in the subclasses themselves.
  • DataFile contains a list of UpgradeContainer's, and it too has to be in the correct order from least to greatest.
  • DataFile::upgrade will then "flatten" these UpgradeContainer's, yielding something similar to the current UPGRADE_METHODS vector we have now. This function can then continue upgrading the project like before.

This helps in terms of:

  1. Making the upgrades more cohesive and contained in separate files, helping to give some structure and organization for upgrades.
  2. Therefore also reducing the bloat in the DataFile class.
  3. This design should also be able to support nested UpgradeContainer's, which can help when you want to put the big upgrade functions into their own class and then tack on any other remaining upgrades at the end, but still wish for the whole thing to be contained.

I would suggest having UpgradeContainer's for each minor version, which the DataFile class will use. If you want, you can split big upgrades, but I would recommend putting them in their folder so that it's clear that they belong to the same upgrade.

Example (not tested):

class UpgradeContiner
{
public:
	UpgradeContainer(QDomDocument* document);
	using Upgrade = std::function<void(void)>;
	auto upgrades() const -> std::vector<Upgrade> = 0;
	//... any helper functions that may be used within subclasses
protected:
	QDomDocument* m_document;
};

class UpgradeContainerA : public UpgradeContainer
{
public:
	auto upgrades() const override -> std::vector<Upgrade>
	{
		return {&upgradeA, &upgradeB, &upgradeC};
	}
private:
	void upgradeA();
	void upgradeB();
	void upgradeC();
};

PS: I still don't think the template files are necessary. The header file should give a good idea for how the upgrade files should be made (i.e., by making a subclass and choosing how you want to structure your upgrades). Everything else related to stuff like header guards are just conventions we should be using, and it's not directly correlated to the upgrade files themselves.

sakertooth avatar Oct 13 '23 21:10 sakertooth

I like the simplicity of the DataFileUpgrade class. It's straight forward and provides useful helper methods. It should even be possible to implement the UpgradeContainer schema with it if an upgrade wants to organize it's steps using function objects.

But is that really needed often though? Even if an upgrade can be divided into several steps then I would find it cleaner to just implement the virtual upgrade method by calling the private methods one after the other, especially if the have "speaking" names. Using lists of lists of std::function seems a bit over-engineered to me as it uses too many (technical) abstractions and the responsibilities seem to be too distributed in my opinion. The upgrade classes should just do the upgrade instead of providing their "innards" to someone else to do them.

michaelgregorius avatar Oct 13 '23 21:10 michaelgregorius

I like the simplicity of the DataFileUpgrade class. It's straight forward and provides useful helper methods. It should even be possible to implement the UpgradeContainer schema with it if an upgrade wants to organize it's steps using function objects.

The problem is that it's too much boilerplate to have to define a whole new class that only contains a simple upgrade function that runs only a few lines of code.

For instance, take the Upgrade04 file. We can turn this:

class UpgradeTo0_4_0_20080104 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080104(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-20080118
 *
 * Upgrade to version 0.4.0-20080118 from some version greater than
 * or equal to 0.4.0-20080104
 */
class UpgradeTo0_4_0_20080118 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080118(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-20080129
 *
 * Upgrade to version 0.4.0-20080129 from some version greater than
 * or equal to 0.4.0-20080118
 */
class UpgradeTo0_4_0_20080129 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080129(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-20080409
 *
 * Upgrade to version 0.4.0-20080409 from some version greater than
 * or equal to 0.4.0-20080129
 */
class UpgradeTo0_4_0_20080409 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080409(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-20080607
 *
 * Upgrade to version 0.4.0-20080607 from some version greater than
 * or equal to 0.3.0-20080409
 */
class UpgradeTo0_4_0_20080607 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080607(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-20080622
 *
 * Upgrade to version 0.4.0-20080622 from some version greater than
 * or equal to 0.3.0-20080607
 */
class UpgradeTo0_4_0_20080622 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_20080622(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-beta1
 *
 * Upgrade to version 0.4.0-beta1 from some version greater than
 * or equal to 0.4.0-20080622
 * convert binary effect-key-blobs to XML
 */
class UpgradeTo0_4_0_beta1 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_beta1(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

/*
 * Upgrade to 0.4.0-rc2
 *
 * Upgrade to version 0.4.0-rc2 from some version greater than
 * or equal to 0.4.0-beta1
 */
class UpgradeTo0_4_0_rc2 : public DataFileUpgrade
{
public:
	UpgradeTo0_4_0_rc2(DataFile& document) : DataFileUpgrade(document) {}

	void upgrade() override;
};

Into this:

class Upgrade04 : pubic class UpgradeContainer
{
// ...
private:
        void upgrade_0_4_0_20080104();
	void upgrade_0_4_0_20080118();
	void upgrade_0_4_0_20080129();
	void upgrade_0_4_0_20080409();
	void upgrade_0_4_0_20080607();
	void upgrade_0_4_0_20080622();
	void upgrade_0_4_0_beta1();
	void upgrade_0_4_0_rc2();
}

This helps to move out the upgrade_0_4_0_* functions out of DataFile and into it's own container class, making that class less bloated and more clean.

But is that really needed often though? Even if an upgrade can be divided into several steps then I would find it cleaner to just implement the virtual upgrade method by calling the private methods one after the other, especially if the have "speaking" names. Using lists of lists of std::function seems a bit over-engineered to me as it uses too many (technical) abstractions and the responsibilities seem to be too distributed in my opinion. The upgrade classes should just do the upgrade instead of providing their "innards" to someone else to do them.

I see your point. We can always allow that instead, that is, for UpgradeContainer's to have an upgrade function that runs the upgrades up to a specific point or all of them. I just figured it would be easier to flatten out everything at the end, but we don't really have to do it this way.

The overall goal is to find a way to simplify the DataFile class, and add helper methods to simplify the upgrades themselves. However, I rather not make the end result be more bloated, where we have a lot of classes that don't really have to be separate classes.

Edit: We do not even have to go with the UpgradeContainer idea. We just need these upgrades contained in some kind of space like a separate file, but not in a way that's more bloated than before. We could use a namespace, say datafile, and then just put the upgrades in separate files that way.

sakertooth avatar Oct 13 '23 21:10 sakertooth

The problem is that it's too much boilerplate to have to define a whole new class that only contains a simple upgrade function that runs only a few lines of code.

Most of these old classes will never be touched anyway but would still be organized in a concise way, i.e. they would do exactly one upgrade step from one version to the next.

[...] Into this:

class Upgrade04 : pubic class UpgradeContainer
{
// ...
private:
        void upgrade_0_4_0_20080104();
	void upgrade_0_4_0_20080118();
	void upgrade_0_4_0_20080129();
	void upgrade_0_4_0_20080409();
	void upgrade_0_4_0_20080607();
	void upgrade_0_4_0_20080622();
	void upgrade_0_4_0_beta1();
	void upgrade_0_4_0_rc2();
}

But how would the user of this class, i.e. DataFile, know where to start in the list of the returned functions? Let's say my stored file is in the state of upgrade_0_4_0_20080409. How would the framework know that it has to start at upgrade_0_4_0_20080607()? My understanding is that the list of upgrades, i.e. the set of individual classes, represent legitimate starting points or file states for upgrades.

michaelgregorius avatar Oct 13 '23 22:10 michaelgregorius

But how would the user of this class, i.e. DataFile, know where to start in the list of the returned functions? Let's say my stored file is in the state of upgrade_0_4_0_20080409. How would the framework know that it has to start at upgrade_0_4_0_20080607()? My understanding is that the list of upgrades, i.e. the set of individual classes, represent legitimate starting points or file states for upgrades.

To be completely honest, I see your point. I feel like if we had to delegate UpgradeContainter to do the upgrades, it would be very complicated, which is why I proposed flattening it in DataFile::upgrade, but that can seem like a responsibility issue between the classes. It would also be hard to answer the question "what does it mean to upgrade a project using a UpgradeContainer?")

The goal for this PR it seems, as previously mentioned, is to just reduce the size of the DataFile class, and introduce helper methods that future upgrades can use to produce simpler code. I just think we can solve those problems without having to define a new class for each upgrade, which I find to be unnecessarily verbose.

Perhaps we can still put the upgrade functions in their own files, but put each file under a datafile namespace? Maybe something like:

namespace lmms::datafile::v04
{
        void upgrade_0_4_0_20080104();
	void upgrade_0_4_0_20080118();
	void upgrade_0_4_0_20080129();
	void upgrade_0_4_0_20080409();
	void upgrade_0_4_0_20080607();
	void upgrade_0_4_0_20080622();
	void upgrade_0_4_0_beta1();
	void upgrade_0_4_0_rc2();
}

And then it can be easily referenced in DataFile, but we still achieve the nice code split.

sakertooth avatar Oct 13 '23 22:10 sakertooth

The advantage of a base class is that the code already contains the full context of the upgrade, i.e. the XML data as well as all available helper methods. If you implement an upgrade in this context you for example don't have to pass around the data as a parameter to some other "loose" functions. You can just use it because the data that's being worked on is a member of the base class because it is meaningful in the upgrade context.

IMHO it structures the code nicely and makes it easier to understand what functionality is already there. As mentioned above it could also have a textual description of the upgrade. It would all be in one place and it would also define a clean interface between the upgrade functionality and the framework that manages applying the updates according to version information.

I would not mind about a few more lines that I'd have to write. The upgrades are relatively slowly evolving code anyway and I personally would set the focus on maintainability, understandability and comfort rather than minimal lines of code.

michaelgregorius avatar Oct 14 '23 07:10 michaelgregorius

  • DataFile::upgrade will then "flatten" these UpgradeContainer's, yielding something similar to the current UPGRADE_METHODS vector we have now. This function can then continue upgrading the project like before.

This 'list of function pointers' is fundamentally incompatible with the use of functors (function objects). C++ doesn't allow us to take pointers to bound methods, i.e. member functions of an object. Some upgrade procedures, such as UpgradeTo1_3_0 require a massive amount of state and nesting which is a clear code smell that screams "I should be a functor".

The problem is that it's too much boilerplate to have to define a whole new class that only contains a simple upgrade function that runs only a few lines of code.

I agree that some of these small procedures have a boilerplate ratio that is too high, so maybe we can combine them into sub-procedures of a single functor, one for each minor version, such as UpgradeTo04. However, in that case, they would need to some how account for their version number internally which is not really that hard, but it decentralizes the list of version numbers which may confuse those that come after, shouldn't be too bad though. My concept for this would prob be, something like

in DataFile.h

	template<unsigned int i, unsigned int j, class Up>
	void upgrade() {
		if (m_fileVersion >= i && m_fileVersion <= j) {
			Up{*this}(m_fileVersion);
		}
	}

in DataFile.cpp

upgrade<0, 5, UpgradeTo04>();
upgrade<6, 9, UpgradeTo05>();

in the minor version class

class UpgradeTo04 : public DataFileUpgrade {
   upgrade (unsigned int v) override {
      if (v <= 0) { upgrade_0_4_0_20080104(); }
      if (v <= 1) { upgrade_0_4_0_20080118(); }
      ...etc...
   }
}

This way each sub-procedure can use the minor-version object to store or share state data during an upgrade should it be required (such as in the case of UpgradeTo1_3_0)

I don't love the 'if (v <= ...)' mechanism, doesn't feel clean, but it's not wrong or bad either. This is ofc a mockup and I'll implement it with actual names if you like the idea.

consolegrl avatar Oct 26 '23 18:10 consolegrl

Closing this since the author has taken some time off from this project, and it doesn't seem like much progress will be made. If there are objections to closing, I would suggest merging at the time @DomClark approved it. At that point, the PR was much simpler and was a safer improvement. Further improvements to DataFile can come later if necessary.

sakertooth avatar Jul 05 '24 00:07 sakertooth