OpenQL icon indicating copy to clipboard operation
OpenQL copied to clipboard

Restructuring

Open jvanstraten opened this issue 4 years ago • 6 comments

Abusing the issue system for a restructuring proposal, which is a bit of a dependency for freezing the modular interface, since the naming inside and outside of OpenQL would preferably be and remain the same.

C++ namespace tree

(root)                          - Only SWIG sources, if that's even necessary (never tried). NOTHING else. Macros
 |                                sort of live here, so all macros need a QL_ prefix.
 '- ql                          - Main namespace for OpenQL. Everything lives in here, but nothing should live in
     |                            here directly.
     |- utils                   - Utilities: extensions to (standard) libraries, wrappers, etc. not specific to
     |                            OpenQL or compilers in any way.
     |- plat                    - Platform: data structures and management of target platform description.
     |                            Includes resource management and non-arch-specific resources, gate
     |                            classes/semantics etc. Configured by platform configuration file.
     |- ir                      - Internal representation: data structures representing an algorithm within the
     |                            context of some platform. "Configured" through Python API or a cQASM file.
     |- com                     - Common: OpenQL/compiler-specific code reusable for various passes. For example
     |                            DFG or CFG construction might live here.
     |- driv                    - Driver: pass management and other orchestration code. Configured by compiler
     |                            configuration file.
     |- pass                    - Passes: passes that are applicable to any platform.
     |   '- <category>          - Pass category namespaces.
     |       '- <pass>          - Namespace for each pass.
     '- arch                    - Architecture-specific code.
         '- <name>              - Architecture name.
             |- plat            - Architecture-specific platform configuration logic or data structures, such as
             |                    architecture-specific resources or platform configuration desugaring.
             '- pass            - Architecture-specific passes.
                 '- <category>  - Pass category namespaces.
                     '- <pass>  - Namespace for each pass.

...
 '- tests                       - Any namespace may have a "tests" namespace for unit tests.

This is a rather nested structure, but I would always argue that it's better to use more categories and do some extra typing now, than it is to have to revise the structure again later because things exploded into a million files again.

using namespace only allowed in C++ files or private header files (in the src directory). You can abbreviate types and namespaces within your own namespace to save typing, however. Also, you should always use relative names when possible. I keep seeing ql:: in the code, which makes no sense, because everything is already in the ql namespace.

Directory tree and #include

Sources and private header files go in src, public header files go in include. Within these directories, the subdirectories mimic the namespace trees.

A public header file is any header file that either is itself an API header or a header #include'd by an API header. That's almost all the current header files, aside from one of the visualizer headers I think. Private header files are exclusively #included by source files.

The distinguishment and file structure is important: if we ever want to be able to install OpenQL as a proper library, we can't just spam something like ir.h in /usr/local/include, which would be necessary with the current structure. With the suggested structure,

  • only the contents of include are installed in /usr/local/include (or whatever else the install prefix is) instead of also private source files, and
  • due to the namespace structure, everything be namespaced within the ql subdirectory.

That means you'd get /usr/local/include/ql/ir.h, which is fine (as long as ql is unique; it's a bit short, but fine).

Code should use the #include "ql/...h" syntax. Relative paths are also okay but frowned upon. #include <ql/...> must NEVER be used, as it would prefer the installed version of OpenQL (if any) over the source tree you're trying to build; <> syntax is for the standard library and external libraries only.

Ideally, a header file defines only one (big) class, preferably named the same as the class it defined but lower_case. If you need more than one class to do something, split the classes up into multiple files. Preferably, implementation detail classes should be used such that they don't need to be defined in public header files; this reduces compilation time and strengthens abstraction. You can do this by forward-declaring a class using just class Name; and wrapping them in Ptr or Opt when used as a private class member in a public class.

Pass naming and driver semantics

Passes are categorized using the following intentionally short names, reflected in the namespace tree.

pass                - All namespaces that define passes are inside a namespace named "pass".
 |- pre             - Preprocessing: passes that operate on the platform description rather than on the
 |                    algorithm, for instance to error-check or desugar platform-specific stuff.
 |- io              - Input/output: I/O passes that load the IR from a file or save (parts of the IR) to a file
 |                    without significant transformation. Mostly cQASM, but would also include conversion of the
 |                    IR to different formats (OpenQASM? QuantumSim?).
 |- ana             - Analysis: passes that leave the IR and platform as is (save for annotations, once those
 |                    become a thing), and only analyze the content of the platform/IR. For example statistics
 |                    reporting, visualization, error checking, consistency checking for debugging, etc.
 |- dec             - Decomposition: passes that decompose code (instructions, gates, etc) to more primitive
 |                    components or otherwise lower algorithm abstraction level. Includes of course gate
 |                    decomposition passes, but something like reduction of structured control flow to only
 |                    labels and goto's would also go here.
 |- map             - Mapping: passes that map qubits or classical storage elements to something closer to
 |                    hardware. Right now that would be "the mapper" (shouldn't that be called qmap?), but would
 |                    also include a hypothetical pass that automatically applies some error correction code to
 |                    the user-specified algorithm, mapping variables to classical registers and memory,
 |                    reduction to single-static-assignment form, etc.
 |- opt             - Optimization: optimization passes, i.e. passes that do not lower the abstraction level of
 |                    algorithms, but instead transmute them to a more efficient representation with the same
 |                    behavior.
 |- sch             - Scheduling: passes that shuffle instructions around and add timing information.
 |- gen             - Code generation: passes that internally convert the common IR into their own IR to reduce
 |                    it further, to eventually generate architecture-specific assembly or machine code.
 |- misc            - Any passes that don't fit in the above categories, for example a Python pass wrapper if
 |                    we ever make one (which could logically be any kind of pass).
 '- dnu             - Do Not Use: code exists only for compatibility purposes, only works in very particular
     |                cases, is generally unfinished, or is so old that we're not sure if it even works anymore.
     '- io..misc    - The other categories reappear as namespaces within dnu.

Within this, each pass has its own namespace with the name of the pass, named using lower_case (like all other namespaces). In this, the pass class is defined as UpperCasePass, where UpperCase is the same set of words as the namespace, with a using Pass = UpperCasePass typedef for just Pass as well.

The name of the pass should be in verb/"do" form, for example "Optimize" or "Map". This is both shorter and more generically usable than the "doer" noun alternative form that's also used (and was suggested by Hans earlier). I tried coming up with names for all passes in "doer" form, but that required quite a bit of creativity, because for example "CCLPrepCodeGeneration" has no proper "doer" form ("CodeGenerationPreparer"? That's not a word) eventually became "ConsistencyCheck" because it was the best I could come up with.

Pass classes derive from one of:

  • driv::PreprocessingPass: should override run(Ptr<Platform>), allowing the platform to be modified.
  • driv::TransformationPass: should override run(Ptr<const Platform>, Ptr<Program>) or run(Ptr<const Platform>, Ptr<Kernel>), allowing the program or kernel to be modified while keeping the platform constant.
  • driv::AnalysisPass: should override run(Ptr<Platform>, Ptr<const Program>) or run(Ptr<const Platform>, Ptr<const Kernel>), allowing access to the program or kernel and platform, keeping both constant.
  • driv::PassGroup: should override get_passes() -> List<Ptr<driv::AbstractPass>> to get the list of sub-passes it expands to, based only on options (not on platform).

These are thin shells around driv::AbstractPass, which defines the common get_passes() and run() entry points for the driver, handles pass option management, pass-specific logging facilities, etc.

Instead of logging macros, move toward using logging objects, that allow scoped specification of loglevels, allow teeing to files, etc. They may also gather timing information based on users entering and exiting logging scopes. The pass classes will provide a root object for this, preconfigured using common pass options.

Pass implementations have to be registered with the driver such that it can match the pass type names to the respective classes. Eventually this should be done automatically using a simple code generator, but for they should be registered within the constructor of Driver. Registration includes stringified documentation that can be printed from Python. The type names must be the same as the namespace inside pass, but using . instead of :: and UpperCase for the final namespace (i.e. the pass name) to be more familiar for Python devs (it follows Python naming conventions, where modules are lower_case and classes are UpperCase) and to be slightly shorter. For example, the resource-constrained scheduler would in C++ have the full name ql::pass::sch::rc_scheduler::RcSchedulerPass, abbreviated to ql::pass::sch::rc_scheduler::Pass via the typedef, and would have the full name sch.RcSchedule in Python.

However, the driver does the following desugaring/checking for names when passes are added:

  • The dnu namespace gets special treatment. Instead of being visible with the dnu. prefix, do-not-use passes must be individually enabled using a global option. Once enabled, they are pulled into the non-dnu namespace, potentially overriding a normal pass with the same name. For instance, dnu::opt::optimize_rotation would be simply opt.OptimizeRotation once enabled. That way we can keep either an older/deprecated version of a pass in dnu in case someone still wants to use it, or experiment with a new version of it, without affecting code that uses the recommended version.
  • The arch::<name> namespaces also get special treatment: based on an "architecture" key in the driver configuration file, the architecture-specific passes for the selected architecture are pulled into the toplevel namespace as well, potentially overriding generic passes with the same name. This saves a bit of typing and allows architectures to (temporarily) override generic passes to add features they need until these are merged into the generic passes (again, without needing to change configuration files when this is done). However, arch.<name>. prefixes can also be used.

The API for the driver/pass manager/compiler, whatever you want to call it, should include methods for inspecting which passes exist (along with documentation of what they do), their options, etc., and methods to introspect or procedurally modify the pass list (actually a tree due to PassGroups). For example, it should be easy to wrap one or more pass instances in a group that includes cQASM writers before and after the target passes.

The driver ensures that preprocessing passes come before any other passes, but all bets are off otherwise.

Driver interface (without const& C++ stuff):

  • template <class Pass> static void register_pass(Str type_name): registers a pass class.
  • static Bool pass_type_exists(Str target): whether there is a pass with the given instance name.
  • static Map<Str, Str> list_pass_types(): returns a map from all registered pass types not marked as do-not-use to descriptions of their functionality.
  • static Map<Str, Str> list_pass_options(Str type_name): returns a map from available option names for the given pass type to descriptions of their functionality.
  • Driver(): create a driver with an empty pass list.
  • Driver(Str filename): create a driver with a pass list from a configuration file.
  • PassRef append_pass(Str type_name, Str instance_name="", Map<Str, Str> options={}): appends a pass to the end of the pass list.
  • PassRef insert_pass_before(Str target, Str type_name, Str instance_name="", Map<Str, Str> options={}): inserts a pass immediately before the given pass (named by instance).
  • PassRef insert_pass_after(Str target, Str type_name, Str instance_name="", Map<Str, Str> options={}): inserts a pass immediately after the given pass (named by instance).
  • PassRef get_pass(Str target): returns a reference to the pass with the given instance name.
  • Bool pass_exists(Str target): whether there is a pass with the given instance name.
  • List<PassRef> list_passes(): lists all current passes.
  • remove_pass(Str target): removes the pass with the given instance name.
  • remove_all_passes(): removes all passes.
  • set_option(Str target, Str option, Str value): sets an option for a pass, referred by name.
  • compile(Ptr<Platform>, Ptr<Program>): compiles a program using the pass list.

PassRef interface:

  • contains a private Ptr<AbstractPass> member to safely refer to a pass without ownership problems
  • Str get_type(): returns the type name of the pass.
  • Str get_name(): returns the instance name of the pass.
  • void set_option(const Str &option, const Str &option): returns the current value of an option.
  • PassRef with_option(const Str &option, const Str &option): as above, but returns itself to allow usage of the builder pattern.
  • const Option &get_option(const Str &option): returns the current value of an option.
  • void dump_options(bool only_set = false, std::ostream &os = std::cout): writes the current values of all options or only those that were explicitly set to the given stream.
  • void help(std::ostream &os = std::cout): writes a description of the pass and its options to the given stream.

Inconsistencies/problems

  • In this model, the platform is modified or possibly even constructed during compilation. That doesn't mesh with needing the platform to even start making a program/kernel in the API.
  • The PassGroup thing doesn't work if it's allowed to programmatically make passes based on its settings, while there are also pass list introspection/modification methods before compilation, and options can be set after construction. One of these has to go (at least for pass groups) or an alternative solution is needed.
  • It'd be quite elegant if Driver (or whatever it would then be called) implements PassGroup, to make a properly recursive structure. But above needs to be solved first.

jvanstraten avatar Mar 26 '21 16:03 jvanstraten

I mainly like this proposal, but would like to make a few notes:

  • to me the namespace hierarchy may lead to overly verbose code (like the ir:: we already have) and I'm not sure whether we need all the protection within what's basically a single project. Also see https://abseil.io/tips/130
  • the name driv for "driver" is heavily overloaded, so I'd suggest "passMngr" or something like that
  • apart from adding proper logging, I think we also need a proper mechanism for reporting errors to the user (and share that with libqasm)
  • currently we have a separate .JSON file to configure the passes, I would prefer to be able to put this information in the main .JSON

wvlothuizen avatar Mar 29 '21 19:03 wvlothuizen

to me the namespace hierarchy may lead to overly verbose code (like the ir:: we already have) and I'm not sure whether we need all the protection within what's basically a single project. Also see https://abseil.io/tips/130

  • We've had the problem before where two people independently made a class with the same name in different header files that never included each other, causing a weird naming conflict after linking that broke things spectacularly with no good way to tell why.
  • Since people in the quantum world seem adamant about using standard quantum gate names in programming, ql::t was literally the type for a T gate IIRC. I'd rather have that be in ql::ir::gates::T; if someone really wants to just use T they can write using namespace ql::ir::gates in whatever scope they want to use it in. See also the original cQASM spec, where people decided to make b, h, i, q, s, t, x, y, and z case-insensitive keywords, because who uses those letters for anything else, right?
  • Once everything has its own namespace, names actually become shorter, because I don't really care what aliases you make within a private namespace. If you want to pull ql::utils and ql::ir (or preferably the relevant parts of those rather than the whole thing) into ql::arch:cc via your types_cc.h file then go for it, but I don't want them littered all over ql, or worse, the global namespace due to overzealous use of using namespace. Furthermore, because you can be selective in what you pull into your namespace, you can actually use shorter names. For example, rather than CCLightDecomposePreSchedule you might have arch::cc_light::dec::PreSchedule, which is about as long, except if you're in a context where you would want to make it shorter, you can either use a namespace alias for the pass or a type alias. I use this all over the place in libqasm, and IMO it's much cleaner than the alternative.
  • This is more personal preference than anything else, but in general I much prefer long descriptive names over cryptic abbreviations or other short names. Sure, with an IDE you can ctrl+click to the definition and hope the type explains what it is and/or there's a comment, but with an IDE like CLion you can also tab-complete after like two or three characters (thanks, machine learning!), so I don't really see what the problem is. Condensed code might be slightly easier to write, but I find verbose code much easier to read/understand.
  • tree-gen and libqasm heavily rely on nested namespaces for the tree structures, so if we would decide not to use nested namespaces I would basically have to rewrite those again. (Granted, this isn't really an argument as to whether it's right or not.)

As for the problem abseil sketches, I'm going to need an example for why this is a problem in practice, because I think I'm missing the point. If I understand them correctly, their argument against nested namespaces is that if someone defines std::unique_ptr or whatever inside the nested namespace (or something else that's using namespace'd into the current scope) then it will use that one rather than ::std::unique_ptr. But the whole point is that no one else should be touching the nested namespace you're in. I would actually argue that this is more likely to happen when you just have a single, in the end shorter root namespace. If their point is that a malicious user can use this to break things, well, so can #define true false, and if their point is that an oblivious user can break things by reusing common names for namespaces somehow, then I raise them reusing common type names, because the latter doesn't even break until after successful runtime linking the compiler-generated prologues and epilogues start allocating the wrong amount of bytes on the stack (this was the symptom of the problem I was referring to that actually happened). C++ in general lets you break everything implicitly on a whim unless you know exactly what you're doing. I would then rather have a clear structure that I might break than the lack of a structure with everything on the same pile that I might break.

So unless I'm missing something shockingly fundamental (which I might), I would argue that this is simply a matter of taste, except:

  • as mentioned, we've had problems before with name conflicts caused by everything being in ql (or actually in the global namespace back then, due to various using namespace ql statements in headers); and
  • once refactoring is complete, the nested namespace style doesn't preclude locally writing in the non-nested style (by pulling everything you need into the nested namespace and then forgetting about it), but the other way around doesn't work.

the name driv for "driver" is heavily overloaded, so I'd suggest "passMngr" or something like that

I'm not sure why this is a problem per above. If you mean that someone might make a variable named driv, that just shadows the namespace without causing issues outside the scope it's declared in. I'm also not really sure what driv would be commonly used for other than as an abbreviation for driver.

apart from adding proper logging, I think we also need a proper mechanism for reporting errors to the user (and share that with libqasm)

Beyond an error loglevel for recoverable things and exceptions for unrecoverable things? In case of libqasm, it keeps track of errors in a simple string vector and fails if it's nonempty after compilation.

currently we have a separate .JSON file to configure the passes, I would prefer to be able to put this information in the main .JSON

I personally don't really care where what is, but we have settled on this within the compiler team already;

  • they describe very different things: one is a description of a piece of hardware, the other is a description of a strategy for getting there; and
  • the pass list might change based on user preferences (especially after all the desugaring is implemented that would allow you to do something akin to "optimize with level 3" instead of needing to know what the actual passes are), while users should ideally not be touching the platform unless the actual chip changes (users as in normal compiler users, not experimentalists who are still trying to figure out how their chip behaves and actually want an assembler rather than a compiler). The platform JSON file should not be something that everyone needs to spam their own copy of.

Actually, if the keys in the outer objects are designed not to conflict, I see no reason why you wouldn't be able to define both platform and compiler configuration in the same file. You'd just be loading the same file in both contexts. Although that would have to become a properly documented use case then, because I want to at some point make it so that unrecognized keys in the JSON file cause errors rather than that they're silently ignored.

But ultimately I'm not sure why you want everything in the same file.

jvanstraten avatar Mar 29 '21 22:03 jvanstraten

namespace:

you convinced me :-)

driver:

I don't foresee any C+ problems here, but only that the term 'driver' is rather implicit, you cannot tell from its name what it is driving

error reporting:

we currently don't have a real distinction between errors in the user-provided input (program, .JSON) and internal errors. As an end user, it can be very difficult to interpret errors and solve them. I introduced QL_JSON_FATAL as a sort of a first step to at least convey that a problem is in the JSON. And it would also make sense to return warnings that relate to user input. Maybe we should just use stdout for things returned to the user, and stderr for internal trouble, but I don't know whether that works with Python.

JSON:

the 'problem' with having the extra file is that it introduces an implicit extra dependency, and that it precludes having several configuration files in a single directory that use a different pass setup (not sure whether that makes sense, and of course you use separate directories). So not a big deal, but having a single file that contains the full picture is easier to manage.

wvlothuizen avatar Mar 30 '21 08:03 wvlothuizen

I don't foresee any C+ problems here, but only that the term 'driver' is rather implicit, you cannot tell from its name what it is driving

I was under the impression that "driver" is pretty well-accepted terminology within a compiler, but I guess that's just me (Hans also voiced concerns/confusion about its naming). If we do go for the PassManager name as it is right now I would use pm for the namespace though. I like to make acronyms about 4 characters in length, so it's on the short side, but pass_mgr or pass_mngr or something is quite long for a namespace IMO (namespaces are typically lower_case).

ETA: or pmgr, actually. That'd work for me.

As an end user, it can be very difficult to interpret errors and solve them.

IMO that's down to the fact that error detection is poor (leading to internal inconsistency errors rather than user errors), and the actual error and log messages that do exist aren't that great. Ultimately, the user should get a Python exception either way, because that's how a Python programmer would normally expect to get errors. Python isn't made to receive multiple errors at a time. Although maybe its warning system can be used, I've never used it myself and don't know how it works but I know it exists. Would probably be difficult/annoying with SWIG though.

Maybe we should just use stdout for things returned to the user, and stderr for internal trouble, but I don't know whether that works with Python.

Python has nothing to do with stderr/stdout, and in fact couldn't even capture OpenQL's stderr/stdout even if it wanted to, because it's not a subprocess but a library (unless it would use system calls to reopen the stdout/stderr streams to a pipe in POSIX or whatever Windows has for this on Windows for every OpenQL library call, and reopen back to the original streams when the call completes, because otherwise Python's print would also not work anymore). Also, mixing stdout and stderr (as is already done now actually) is a terrible idea because there is nothing in the OS that synchronizes these two streams; even if you flush after every line there is no guarantee that the messages actually appear in the sent order. 2>&1 does sort of do this because in the OS it makes both stderr and stdout references to the same pipe, but then you lose any benefit it might have had. Common practice is that all logging and other spurious information goes to stderr, while stdout (analogous to stdin) is reserved for explicitly requested program I/O.

I'm also not sure whether there is any synchronization guarantees between sys.out and stdout (or sys.err and stderr), because Python's file classes can do their own buffering on top of OS buffering.

In DQCsim I built a whole system for logging that allows Python to set a callback function for log messages, such that the messages can be piped to Pythonic logging modules without polluting stdout/stderr. But that logging system is half of DQCsim, in part because callbacks from C++ into Python are a royal pain because you have to hold a reference to the PyObject while the function pointer is live somewhere, so you need to make a class for every callback function, and so on.

the 'problem' with having the extra file is that it introduces an implicit extra dependency, and that it precludes having several configuration files in a single directory that use a different pass setup

Huh? The name for the pass configuration file is not fixed. Maybe you got confused because it's derived from the eqasm_compiler key of the platform JSON file when using the legacy API? That's just a compatibility layer for Python stuff written before pass management was a thing.

jvanstraten avatar Mar 30 '21 10:03 jvanstraten

Some additional ideas for the API, to keep things as close to the current situation as possible, and to resolve the remaining inconsistencies (I've already solved the group stuff with an explicit call that freezes the options and subsequently allows the sub-pass list (if any) to be modified):

  • Wrap the pass manager into ql.Platform, so users don't need to make a new structure. ql.Program and ql.Kernel are already constructed with these, so ql.Program.compile() can just use the compiler from the platform. Inside C++, the compiler and platform structures would still be distinct, however. It might be a good idea to make a ql2 module or something at some point, with a properly engineered API, and without backward having to wiggle around corners and through loopholes all the time to maintain backward compatibility.
  • The ql.Platform constructor receives an optional additional string argument that may be used to specify a compiler configuration filename. If used, the eqasm_compiler key in platform is ignored, and the compiler's configured architecture key is to be used instead wherever eqasm_compiler is still used.
  • When the second argument is not specified, (ab)use the now largely defunct eqasm_compiler key in the platform structure to configure the compiler:
    • if it's a string with one of the previously supported values, desugar those into standard compiler configuration files;
    • if it's an unrecognized string, treat it as a compiler configuration filename;
    • if it's an object, treat it as an inlined compiler configuration structure;
    • if it's not specified, the user must add passes programmatically after ql.Platform construction, or it will be empty.
  • Users have direct access to the compiler structure via ql.Platform, so they can mess around with the compilation strategy, pass options, etc.
  • As soon as a program or kernel is constructed using the platform, the compiler structure is completely frozen. This means that the strategy can contain passes that modify/initialize the platform structure based on target-specific stuff, without needing a special mechanism for this, thus resolving the first inconsistency listed in my first post.

jvanstraten avatar Apr 01 '21 08:04 jvanstraten

sounds great, makes the transition a lot smoother

wvlothuizen avatar Apr 01 '21 10:04 wvlothuizen