miss_hit
support DSLs inside MATLAB
What kind of feature is this?
- Support a non-supported MATLAB or Octave construct
MISS_HIT component affected
- Style checker
Describe the solution you'd like
The CVX project (http://cvxr.com/cvx) has its own "mini-language" inside MATLAB. The style checker, when set to fix the file, breaks the indentation. On the example page, the code:
m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
    variable x(n)
    minimize( norm( A * x - b, 2 ) )
    subject to
        C * x == d
        norm( x, Inf ) <= e
cvx_end
becomes
m = 20; n = 10; p = 4;
A = randn(m,n); b = randn(m,1);
C = randn(p,n); d = randn(p,1); e = rand;
cvx_begin
variable x(n)
minimize( norm( A * x - b, 2 ) )
subject to
C * x == d
norm( x, Inf ) <= e
cvx_end
The solution would be to treat `cvx_begin`, `subject to` and `cvx_end` the same way that `if`, `else` and `end` are treated, with the added quirk that the `subject to` block is actually one indentation level more, not less/the same like `else`. If the particular indentation of `subject to` is too complicated to implement, having it the same as `else` is already better than the current behaviour.
Oh dear. OK, I'll be really honest: I do not see how I can reasonably do this.
Mainly because MISS_HIT is actually based on a full MATLAB lexer and parser. This means I do not just have a list of "if", "case", etc. and then indent. There is a full understanding of the semantics of the code. For example, see how an if statement is parsed:
- https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_parser.py#L1258-L1272
- https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_parser.py#L1908-L1942
In addition this is turned into an AST, see here again for the if example:
- https://github.com/florianschanda/miss_hit/blob/ad94e23b0a011eda1f468acb2976033cf549848c/miss_hit_core/m_ast.py#L1946-L1968
So to support this I would need to fully understand and implement this mini-language.
It is not impossible: it could be done as an extra language addition (e.g. we already have Octave as a language, and Simulink to some extent, so we could add CVX too...)
But the effort would not be reasonable for me.
That said, I will keep this open. Maybe there is a way to do this, even if it's just a hack. I will think about it, because clearly the use-case is there. Perhaps a user-defined list of functions that, when called, produce an extra indent or an extra outdent.
@acristoffers again, I can't promise anything fast, but this problem intrigues me :)
I don't really have time to learn all about CVX, so I will need your help! I will need to see more examples besides that one, especially real-world ones if you have any. If you could send me as much example code as you can, indented the way you'd like it, that would be really helpful. Either:
- email me a bag of code, and I promise to keep it secret; I will use it to derive some public tests with anonymised names, but obviously keeping some of the structure intact
- make a PR with a new test directory in `test/style/dsl/cvx` with as many well-indented examples as you can find. Ideally there is less MATLAB/Octave code there and mostly just CVX stuff
From this I can try to reverse engineer some useful patterns and features. I think I will have a `dsl { ... }` section in the config file, where you can give special treatment to some identifiers.
So far I can see these rules:
- `cvx_begin`: indent by 1, and no terminating `;`
- `cvx_end`: kill all custom indent, and no terminating `;`
- `variable`: special function with no terminating `;`
- `minimize`: special function with no terminating `;`
- `subject`: indent by 1, and no terminating `;`
I note that there is no way to get out of the +1 indent from `subject`. Is that really the case in CVX? Or is there something that closes the `subject to` thing?
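The rules listed above could be written down as a small table that maps a leading identifier to an indentation effect. The following Python sketch only illustrates that idea: `DSL_RULES`, `first_identifier` and `indent_levels` are hypothetical names, and nothing here reflects MISS_HIT's actual parser or configuration format.

```python
# Hypothetical rule table: leading identifier -> effect on indentation.
# "open" raises the indent for the lines that follow; "close_all" drops
# every custom indent that was added. This is not MISS_HIT API.
DSL_RULES = {
    "cvx_begin": "open",
    "subject": "open",
    "cvx_end": "close_all",
}


def first_identifier(line):
    """Return the identifier a line starts with, or "" if there is none."""
    stripped = line.strip()
    end = 0
    while end < len(stripped) and (stripped[end].isalnum() or stripped[end] == "_"):
        end += 1
    return stripped[:end]


def indent_levels(lines, rules=DSL_RULES):
    """Compute the custom indent level of each line under the rule table."""
    levels = []
    depth = 0
    for line in lines:
        effect = rules.get(first_identifier(line))
        if effect == "close_all":
            depth = 0  # cvx_end itself goes back to the outer level
        levels.append(depth)
        if effect == "open":
            depth += 1  # only the lines after the opener are indented
    return levels
```

Applied to the seven lines of the CVX block from the example, this gives the levels 0, 1, 1, 1, 2, 2, 0. The sketch looks only at the first identifier of each line, so a MATLAB variable that happens to be called `subject` outside a CVX block would be misread; a real implementation would have to restrict the rules to lines between `cvx_begin` and `cvx_end`.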
That is really the case. It is so because it mimics how you would write the minimization on paper, where people put the s.t. (subject to) below minimize (maximize/arg min/arg max) and the set of restrictions below the cost function (the norm in the example). It is like a table, but without borders.
There is also the `maximize` special function.
This is a list of all special cvx_* functions:
cvx_begin
cvx_clear
cvx_end
cvx_expert
cvx_pause
cvx_power_warning
cvx_precision
cvx_profile
cvx_quiet
cvx_save_prefs
cvx_solver
cvx_solver_settings
cvx_tic
cvx_toc
cvx_where
and this is a list of keywords inside a cvx_begin/end block:
In
binary
dual
epigraph
expression
expressions
hypograph
integer
maximise
maximize
minimise
minimize
subject
variable
variables
I'm not an expert in CVX either, but I've built an examples folder with indented code. The files are minimal, having only the CVX blocks. I've extracted the snippets from the examples folder, which are all real-world examples. From what I could see, only `cvx_begin`, `cvx_end` and `subject` have indentation implications; all other functions/keywords are used normally. The keywords I listed above usually have no semicolon at the end of the line, but it does not hurt if you add one.
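As a rough illustration of how the two lists could feed a "no terminating semicolon" exemption, here is a small Python sketch. The names `CVX_KEYWORDS` and `is_semicolon_exempt` are made up for this example, and the check covers only lines that start with a `cvx_*` command or one of the listed keywords; constraint lines such as `C * x == d` are deliberately left out of scope.

```python
# Keywords as listed in this thread; the set name is hypothetical.
CVX_KEYWORDS = {
    "In", "binary", "dual", "epigraph", "expression", "expressions",
    "hypograph", "integer", "maximise", "maximize", "minimise",
    "minimize", "subject", "variable", "variables",
}


def is_semicolon_exempt(line):
    """Return True if the line starts with a cvx_* command or a CVX keyword."""
    # Treat "minimize(" like "minimize (" so the keyword is split off.
    words = line.strip().replace("(", " ").split()
    if not words:
        return False
    head = words[0].rstrip(";")
    return head.startswith("cvx_") or head in CVX_KEYWORDS
```

A style rule could consult such a predicate before complaining about a missing semicolon, while still accepting lines that do end in one.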
I have merged PR #214, thank you for the examples.
Again, just to set expectations, please do not expect anything soon. The code that deals with indentation is somewhat complex, and adding something like this will be hard; it may turn out to be impractical after analysis. In addition, since this will be user-configurable and there is no existing scheme in the configuration mechanism that I could use, it will take a fair bit of design work to come up with something that can cope with at least the CVX mini-language.
I won't expect anything soon, don't worry. When you replied showing you have a full parser/lexer, I realized how hard it would be to implement the change. So I already hacked together a small (and dirty) Python script to fix the indentation after I run mhstyle. That way I get everything that miss_hit offers plus the correct indentation, so the problem is solved for me. Anyway, having it built in would be nice, if it is not too hard. Thank you for really considering the issue, miss_hit is a great project.
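The script itself is not shown in this thread. Purely as an illustration of what such a post-processing pass might look like, here is a minimal Python sketch. The function name `reindent_cvx` and the 4-space indent are assumptions, and it expects input in which mhstyle has already flattened the CVX lines to the indentation of the `cvx_begin` line.

```python
def reindent_cvx(text, indent="    "):
    """Re-indent cvx_begin ... cvx_end blocks in already formatted code.

    Lines inside a block get one extra level, lines after "subject to"
    get two. Lines outside a block are returned unchanged.
    """
    out = []
    base = None  # indentation of the current cvx_begin line; None if outside
    extra = 0    # extra levels added inside the current block
    for line in text.split("\n"):
        stripped = line.strip()
        words = stripped.split()
        head = words[0].rstrip(";") if words else ""
        if base is None:
            out.append(line)
            if head == "cvx_begin":
                base = line[: len(line) - len(line.lstrip())]
                extra = 1
        elif head == "cvx_end":
            out.append(base + stripped)
            base = None
        elif not stripped:
            out.append("")
        else:
            # Keep any indentation relative to cvx_begin (e.g. a nested loop).
            rel = line[len(base):] if line.startswith(base) else stripped
            out.append(base + indent * extra + rel)
            if head == "subject":
                extra = 2
    return "\n".join(out)
```

Run on the flattened example from the top of this issue, it restores one level after `cvx_begin` and two after `subject to`. It is not idempotent: feeding it code that is already indented for CVX would indent it again, so it only makes sense directly after an mhstyle run.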