support DSLs inside MATLAB

Open acristoffers opened this issue 3 years ago • 7 comments

What kind of feature is this?

  • Support a non-supported MATLAB or Octave construct

MISS_HIT component affected

  • Style checker

Describe the solution you'd like

The CVX project (http://cvxr.com/cvx) has its own "mini-language" inside MATLAB. When the style checker is set to fix the file, it breaks the indentation. On the example page, the code:

m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
    variable x(n)
    minimize( norm( A * x - b, 2 ) )
    subject to
        C * x == d
        norm( x, Inf ) <= e
cvx_end

becomes

m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
variable x(n)
minimize( norm( A * x - b, 2 ) )
subject to
C * x == d
norm( x, Inf ) <= e
cvx_end

The solution would be to treat cvx_begin, subject to and cvx_end the same way that if, else and end are treated, with one added quirk: subject to stays at the current level and the lines after it are indented one level deeper, whereas else drops back to the level of if and its body stays at the same depth. If the particular indentation of subject to is too complicated to implement, treating it the same as else would already be better than the current behaviour.

acristoffers avatar May 03 '21 12:05 acristoffers

Oh dear. OK, so I'll be really honest: I do not see how I can reasonably do this.

Mainly because MISS_HIT is actually based on a full MATLAB lexer and parser. This means I do not just have a list of "if", "case", etc. and then indent. There is a full understanding of the semantics of the code. For example, see how an if statement is parsed:

  • https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_parser.py#L1258-L1272
  • https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_parser.py#L1908-L1942

In addition this is turned into an AST, see here again for the if example:

  • https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_ast.py#L1946-L1968

So to support this I would need to fully understand and implement this mini-language.

It is not impossible; it could be done as an extra language addition (e.g. we already have Octave as a language, and Simulink to some extent, so we could add CVX too...)

But the effort for this for me would not be reasonable.

That said, I will keep this open. Maybe there is a way to do this, even if it's just a hack. But I will think about it because clearly the use-case is there. Perhaps a user-defined list of functions, that when called produce extra indent and extra outdent.

florianschanda avatar May 03 '21 12:05 florianschanda

@acristoffers again, I can't promise anything fast, but this problem intrigues me :)

I don't really have time to learn all about CVX, so I will need your help! I will need to see more examples besides that one, especially real-world ones if you have any. If you could send me as much example code as you can, indented the way you'd like it, that would be really, really helpful. Either:

  • email me a bag of code and I'll promise to keep it secret; but I will use it to derive some public tests with anonymised names but obviously keeping some of the structure intact
  • make a PR with a new test directory in test/style/dsl/cvx with as many well-indented examples as you can find. Ideally there is less MATLAB/Octave code there and mostly just CVX stuff

From this I can try to reverse engineer some useful patterns and features. I think I will have a dsl { ... } section in the config file, where you can give special treatment to some identifiers.

florianschanda avatar May 04 '21 05:05 florianschanda

So far I can see these rules:

  • cvx_begin: indent by 1, and no terminating ;
  • cvx_end: kill all custom indent, and no terminating ;
  • variable: special function with no terminating ;
  • minimize: special function with no terminating ;
  • subject: indent by 1, and no terminating ;
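
The rules listed so far can be sketched as a small standalone re-indenter in Python. This is only an illustration of the rules, not MISS_HIT code: the names `reindent_cvx`, `OPEN`, `CLOSE` and `NESTED` are hypothetical, and it assumes that the first identifier on a line is enough to recognise it, which a real lexer would not rely on.

```python
import re

# Hypothetical rule table taken from the list above (not MISS_HIT code).
OPEN = "cvx_begin"      # lines after it are indented by 1
CLOSE = "cvx_end"       # kills all custom indent
NESTED = {"subject"}    # lines after it are indented by 1 more

_IDENT = re.compile(r"[A-Za-z_]\w*")


def reindent_cvx(source, unit="    "):
    """Re-indent lines between cvx_begin and cvx_end; leave the rest alone."""
    out = []
    depth = 0   # custom indent levels currently open
    base = ""   # leading whitespace of the enclosing cvx_begin line
    for line in source.splitlines():
        stripped = line.strip()
        match = _IDENT.match(stripped)
        word = match.group(0) if match else ""
        if depth == 0:
            # Outside a CVX block: keep the line exactly as it is.
            out.append(line)
            if word == OPEN:
                base = line[: len(line) - len(line.lstrip())]
                depth = 1
        elif word == CLOSE:
            # cvx_end returns to the indentation of its cvx_begin.
            depth = 0
            out.append(base + stripped)
        else:
            out.append((base + unit * depth + stripped) if stripped else "")
            if word in NESTED:
                depth += 1
    result = "\n".join(out)
    return (result + "\n") if source.endswith("\n") else result
```

Calling `reindent_cvx` on the flattened example from the issue description restores the nesting under cvx_begin and subject to, since nothing but cvx_end lowers the depth again.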

I note that there is no way to get out of the +1 indent from subject, is that really the case in CVX? Or is there something that closes the subject to thing?

florianschanda avatar May 04 '21 05:05 florianschanda

That is really the case. It is so because it mimics how you would write the minimization on paper, where people put the s.t. (subject to) below minimize (maximize/arg min/arg max) and the set of restrictions below the cost function (the norm in the example). It is like a table, but without borders.

Also, there is the maximize special function too.

acristoffers avatar May 04 '21 06:05 acristoffers

This is a list of all special cvx_* functions:

cvx_begin
cvx_clear
cvx_end
cvx_expert
cvx_pause
cvx_power_warning
cvx_precision
cvx_profile
cvx_quiet
cvx_save_prefs
cvx_solver
cvx_solver_settings
cvx_tic
cvx_toc
cvx_where

and this is a list of keywords inside a cvx_begin/end block:

In
binary
dual
epigraph
expression
expressions
hypograph
integer
maximise
maximize
minimise
minimize
subject
variable
variables

I'm not an expert in CVX either, but I've built an examples folder with indented code. The files are minimal, having only the CVX blocks. I've extracted the snippets from the examples folder, which are all real-world examples. From what I could see, only cvx_begin, cvx_end and subject have indentation implications; all other functions/keywords are used normally. The keywords I listed above usually have no semicolon at the end of the line, but it does not hurt to add one either.
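
As a rough sketch of how the keyword list could feed a "no terminating semicolon" exemption, the Python snippet below checks whether a line starts with one of the identifiers named in this thread. The function name `may_omit_semicolon` and the set name are hypothetical, and the check is only meaningful for lines inside a cvx_begin/cvx_end block; it ignores that context for brevity.

```python
import re

# Identifiers this thread says are written without a terminating ';':
# cvx_begin, cvx_end, and the keywords used inside a CVX block.
NO_SEMICOLON = {
    "cvx_begin", "cvx_end",
    "In", "binary", "dual", "epigraph", "expression", "expressions",
    "hypograph", "integer", "maximise", "maximize", "minimise",
    "minimize", "subject", "variable", "variables",
}

_IDENT = re.compile(r"[A-Za-z_]\w*")


def may_omit_semicolon(line):
    """Return True if the line starts with one of the listed identifiers."""
    match = _IDENT.match(line.strip())
    return match is not None and match.group(0) in NO_SEMICOLON
```

Matching the whole leading identifier means a name such as `variables_count` is not confused with the `variables` keyword, and comment lines never match because they do not start with an identifier.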

acristoffers avatar May 04 '21 08:05 acristoffers

I have merged PR #214, thank you for the examples.

Again, just to set expectations, please do not expect anything soon. The code that deals with indentation is somewhat complex, so adding something like this will be hard, and it may turn out to be impractical after analysis. In addition, since this will be user-configurable and the configuration mechanism has no existing scheme I could reuse for it, it will take a fair bit of design work to come up with something that can cope with at least the CVX mini-language.

florianschanda avatar May 05 '21 11:05 florianschanda

I won't expect anything, don't worry. When you replied showing you have a full parser/lexer, I realized how hard it will be to implement the change. So I already hacked together a small (and dirty) Python script to fix the indentation after I run mhstyle. That way I get everything that miss_hit offers plus the correct indentation, so the problem is solved for me. Anyway, having it built in could be nice, if it's not too hard. Thank you for really considering the issue, miss_hit is a great project.

acristoffers avatar May 05 '21 12:05 acristoffers