
Integrating add_codegen_annotations to conv_pipe_fwd

Open dinhv opened this issue 7 years ago • 16 comments

When running a CNN with "run_cnet" mode, Boda generates each convolution function of the CNN one by one, and OpenCL compiles and runs them one by one. The generated functions are saved in fwd_calls in struct conv_pipe_fwd_t. So my first attempt to integrate auto-tuning is to add a call to add_codegen_annotations in the functions gen_apply_func_to_var and gen_call in rtc_fwd.cc, since that's where the convolution_fwd pipe invokes the code generation. I added a NESI argument autotune, so when autotune is true it tries to annotate the code.

string conv_pipe_fwd_t::gen_apply_func_to_var( string const & in_an, string const & in_var, string const & ret_an, dims_t const & ret_dims, string const & func_name, p_conv_op_t const & oi ){
  if( autotune ) {
    long a;
    long b;
    get_cl_device_info( &a, &b );

    op_tune_t op_tune;
    op_tune.use_be = "ocl";
    op_tune.MNb.d[0] = 8;
    op_tune.MNb.d[1] = 16;
    op_tune.MNt.d[0] = 8;
    op_tune.MNt.d[1] = 8;
    op_tune.Kb = 8;
    op_tune.use_local_mem = 1;
    op_tune.prof_variant = 0;
    op_tune.vw = 8;
    op_tune.k1conv = 0;
    op_tune.tconv = 0;
    op_tune.tconv_max_ksz.d[0] = 11;
    op_tune.tconv_max_ksz.d[1] = 11;
    op_tune.ipconv = 0;

    try { add_codegen_annotations( oi, op_tune, 0 ); }
    catch( unsup_exception const & us_exp ) {
      printf( "annotation failure\n" );
    }
  }

  p_rcg_func_call_t rfc = codegen.gen_func_override_func_name( func_name, *oi, map_str_rtc_arg_t() );
  string const ret_var = in_var + "__" + rfc->rcg->gen_fn;
  bool const did_ins = inxp_names.insert( ret_var ).second;
  if( did_ins ) { // newly-seen/used ret_var, so create and calc it here
    must_insert( rfc->arg_map, in_an,  in_var  );
    must_insert( rfc->arg_map, ret_an, ret_var );
    add_fwd_call( rfc, in_var + "__inxp" );
    rtc->create_var_with_dims( ret_var, ret_dims );
  }
  return ret_var;
}

When calling Boda with run_cnet --model-name=nin_imagenet --in_dims="(img=1)" --conv_fwd="(mode=rtc,write_op_sigs=1)" autotune=1, it leads to an assertion failure. The error occurs in cnn_op.cc at the line op->set_dims("in_ref",in_dims); // tconv needs the standard input dims for reference. What's the reason for this failure? Is it not possible to call add_codegen_annotations with the given parameter p_conv_op_t const & oi, or is op_tune incorrect? Or is it because p_conv_op_t const & oi doesn't have the same parameters as a given operation, e.g. one from conv-ops-debug.txt?

dinhv avatar Jun 12 '17 08:06 dinhv

remember that add_codegen_annotations() is just a slightly generalized version of add_cnn_codegen_annotations(), which is already called for all the operations in the input graph in rtc_fwd.cc here:

https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L479

so, it doesn't make sense to be trying to call it again later at some point in the flow. the key issue is that the op_tune that it is currently called with is fixed for all operations, rather than being selected per-operation (i.e. by some auto-tuning process). and of course, as discussed in #22, there's the deeper/more-important issue that the existing tuning/codegen process was designed so that a single op_tune could be sufficient, so it's not well suited to the case where the ability to try multiple points is needed.

in terms of the particular error, the function gen_apply_func_to_var() is used in particular cases, not in general. as we can see by the comment above it and inspecting the calls to it:

// decl: https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L226

// the (only) three calls, all near each other, all for generating xpose operations: https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L311 https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L318 https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L323

this function is used to unique the application of stateless unary functions, in particular functions that transform data into different layouts. these 'xpose' functions are simple and don't require tuning/annotation, and thus don't go through the add_cnn_codegen_annotations() function.

in general, such xpose operations are 'sub functions' of their parent conv operation. so, codegen is directly called on them, passing the (already annotated) parent conv operation for reference, so that the sizes/types of the relevant input/output variables to transform can be determined. in gen_apply_func_to_var(), note the use of gen_func_override_func_name() so that, when calling codegen on the op_info, the xpose operation is generated instead of the parent conv operation:

https://github.com/moskewcz/boda/blob/master/src/rtc_func_gen.cc#L624

so, in short, the error you're seeing is due to trying to call add_cnn_codegen_annotations() on an already-annotated operation (since the parent conv was annotated in init() as per my first link above). if you dump the op_info, you should be able to clearly see if it's in a pre- or post- annotation state. you didn't give the exact error, but i guess it's as simple as that the "in_ref" dims are already set, and you can't set them again (as an error check).
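
to illustrate the kind of check involved, here is a minimal standalone sketch (hypothetical types and names, not boda's actual code) of a set-once dims map that asserts if the same arg's dims are set twice:

// minimal sketch of a set-once dims map; illustrative only, not boda's real dims handling.
#include <cassert>
#include <map>
#include <string>

struct dims_t { int n = 0; }; // stand-in for boda's real dims_t

struct op_sketch_t {
  std::map<std::string, dims_t> dims_vals;
  bool has( std::string const & an ) const { return dims_vals.count( an ) != 0; }
  void set_dims( std::string const & an, dims_t const & d ) {
    // setting dims for the same arg twice indicates a flow error, so fail loudly;
    // this is the shape of the check that fires when an op gets annotated twice.
    assert( !has( an ) );
    dims_vals[an] = d;
  }
};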

moskewcz avatar Jun 12 '17 15:06 moskewcz

Thank you for your quick response. After reading it, my current idea, adapting the code at https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L479 , looks like this:

for( map_str_p_conv_op_t::iterator i = cp->convs->begin(); i != cp->convs->end(); ++i ) {
  p_conv_op_t const & oi = must_find( *op_infos, i->first );
  op_tune_t best_opt = op_tune;

  if( autotune ) {
    list<op_tune_t> op_tune_list = getting_some_op_tunes_to_try_out();
    for( op_tune_t const & opt : op_tune_list ) {
      add_cnn_codegen_annotations( oi.get(), opt, 0 );
      float runtime = generate_and_run_annotated_function();
      if( best_time( runtime ) ) {
        best_opt = opt;
      }
    }
  }

  add_cnn_codegen_annotations( oi.get(), best_opt, 0 );
}

Regardless of whether this approach is inefficient or takes too much time, is it feasible? It would require annotating oi.get() multiple times, which leads to the failure you mentioned above. Is there a way to bypass or solve this issue?

dinhv avatar Jun 12 '17 17:06 dinhv

in short, yes, that's the general idea. but, what you've sketched is (in some ways) 'the easy part' -- the harder part is making sure, in general, that proper transformations are inserted between operations if needed, depending on the variants that are selected and their IO formats. there are only a few cases to handle currently, and you can see how they are handled manually in the existing rtc_fwd flow.

at a high level, though, remember that (for this particular issue/task) you're trying to integrate/unify the flows in rtc_fwd.cc and the existing 'autotuner' in ops_prof. so all the code in the ops_prof flow is relevant, and there's quite a bit more than in your above sketch. pretty much all of it is needed. among other things, it certainly profiles multiple implementations of the same operation by calling add_codegen_annotations multiple times, so looking at the flow in the tuner near the relevant call should be illustrative for that part of the issue:

https://github.com/moskewcz/boda/blob/master/src/rtc_prof.cc#L287
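
as a rough sketch of the shape of that loop (simplified, with invented helper names; the real code in rtc_prof.cc also handles data generation, correctness checking, and result reporting):

// sketch only, not boda's actual code: profile one op under several candidate
// tunes by annotating a fresh copy of the un-annotated op per candidate, then
// keep the fastest. generate_and_time_secs() is a placeholder for the codegen /
// compile / run / timing steps that rtc_prof actually performs.
op_tune_t pick_best_tune( p_conv_op_t const & op, vector<op_tune_t> const & tunes ) {
  op_tune_t best_tune;
  double best_secs = std::numeric_limits<double>::max();
  for( op_tune_t const & tune : tunes ) {
    p_conv_op_t op_copy = make_shared<conv_op_t>( *op ); // never annotate the same op twice
    add_codegen_annotations( op_copy, tune, 0 );
    double const secs = generate_and_time_secs( op_copy ); // placeholder helper
    if( secs < best_secs ) { best_secs = secs; best_tune = tune; }
  }
  return best_tune;
}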

mwm

moskewcz avatar Jun 12 '17 19:06 moskewcz

To keep the integrated tuning with rtc_prof simple, is it advisable to call ops-prof with just the following parameters: ops-prof --kg-tune-tag=ocl-def --func-mrd-toler="(cudnn_conv=4e-4)" --ops-fn="%(boda_test_dir)/sgemm-ops-debug.txt" --gen-data="(str_vals=(type=gen_data),nda_vals=(vi=(tn=float,v=0.0),mode=(tn=uint32_t,v=5)))" --op-tunes="(ocl-def=(use_be=ocl,),ocl-4-16-4-lm0=(use_be=ocl,MNt=4:4,MNb=16:16,Kb=4,use_local_mem=0))"

dinhv avatar Jun 15 '17 11:06 dinhv

i'm not sure what you're asking or where you came up with that command line; can you explain your idea/reasoning?

moskewcz avatar Jun 15 '17 14:06 moskewcz

I would like to call the ops_prof_t::main function from the conv_pipe_fwd_t::init function to profile a convolution function. I used to call ops-prof with the following parameters for profiling:

ops-prof
--out-fn="%(boda_output_dir)/cnn_op_info.txt"
--kg-tune-tag=ocl-def
--func-mrd-toler="(cudnn_conv=4e-4)"
--wisdom-out-fn="%(boda_output_dir)/wisdom.wis"
--ops-fn="%(boda_test_dir)/sgemm-ops-debug.txt"
--gen-data="(str_vals=(type=gen_data),nda_vals=(vi=(tn=float,v=0.0),mode=(tn=uint32_t,v=5)))"
--wisdom-in-fn="%(boda_test_dir)/good_tr/sgemm-gen5/wisdom.wis"
--op-tunes="(ocl-def=(use_be=ocl,),ocl-4-16-4-lm0=(use_be=ocl,MNt=4:4,MNb=16:16,Kb=4,use_local_mem=0),ocl-4-16-4-lm2-vw4=(use_be=ocl,MNt=4:4,MNb=16:16,Kb=4,use_local_mem=2,vw=4), ocl-4-16-4-lm3-vw4=(use_be=ocl,MNt=4:4,MNb=16:16,Kb=4,use_local_mem=3,vw=4))"

Now, to keep things easy, I left out some parameters when calling from conv_pipe_fwd_t::init, resulting in the command I posted above, simply because I don't have those parameters, e.g. wisdom-in, wisdom-out, out-fn. But if I leave these parameters out, does the profiling still work properly? Furthermore, when integrating the ops-prof flow into conv_pipe_fwd_t::init there is no ops-fn parameter available; without it, how should ops-prof know which operation, and which parameters, it should profile?

dinhv avatar Jun 15 '17 15:06 dinhv

okay, i think i understand your question somewhat.

  1. i can't think of a fundamental reason that input/output wisdom files would be needed for profiling. but, i'd also not be surprised if the code might need some modifications to work well in this somewhat-different-usage-mode. in particular, it may be that the existing numerical correctness checking that ops_prof currently performs may not be too sensible in a tuning context (although maybe the best-selected implementation for any operation should still be correctness-checked against some default/fallback).

  2. for the overall task of integrating the autotuning with the full-net-running, there's no reason to assume that the existing architecture and/or interfaces are well suited for what you want to do, so various refactoring and experimentation may be needed. and, going deeper, once an initial integration is done, the various interfaces can/should/may be altered to achieve a better result.

for example, while it's perfectly sensible for ops_prof to read operations from a file, that makes less sense as an interface for online or cached autotuning as used by the full-net-running flow. so, one might naturally expect that ops_prof might need to be refactored into multiple modules, where one module presents the existing UI (which includes reading/writing wisdom and ops files), and another uses some more programmatic interface for the profiling itself (to some degree, such internal interfaces may already be present). then, other code could use lower-level interfaces as appropriate. however, as i've noted, i'd think that reading/writing wisdom files is probably something that's a good idea (or even required) for practically usable flows.
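
to make the refactoring idea a bit more concrete, here is a purely hypothetical interface sketch (none of these names exist in boda; it's just the general shape one might factor out of the existing ops_prof flow):

// hypothetical programmatic profiling interface; all names invented for illustration.
struct op_prof_result_t {
  op_tune_t tune;   // the tuning point that was profiled
  double    secs;   // measured runtime for this tune
  bool      ok;     // whether generation/compilation/execution succeeded
};

// profile a single operation across a set of candidate tunes, returning per-tune
// results; callers (the existing ops_prof CLI, or the full-net-running flow) can
// then pick the best result and read/write it from/to wisdom files as appropriate.
vector<op_prof_result_t> profile_op( p_conv_op_t const & op,
                                     vector<op_tune_t> const & tunes,
                                     p_rtc_compute_t const & rtc );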

in the long run, i expect that there needs to be some programmatic back-and-forth interface between the code generation and the tuner as well, but that would all happen on the tuner-side of the tuner/full-net-runner interface.

moskewcz avatar Jun 15 '17 15:06 moskewcz

That's what I was asking for. Thank you!

dinhv avatar Jun 15 '17 15:06 dinhv

did you delete/figure-out the answer to your prior comment/question about nia? i got an email about it, but i don't see a comment now.

On Thu, Jun 15, 2017 at 10:52 AM, dinhv [email protected] wrote:

One more question regarding this code line: https://github.com/moskewcz/boda/blob/master/src/rtc_prof.cc#L212

In this line you create a variable ops_be. The third argument nia is the value for ops_be.codegen according to the struct https://github.com/moskewcz/boda/blob/master/src/rtc_prof.cc#L130 . But then in https://github.com/moskewcz/boda/blob/master/src/rtc_prof.cc#L223 you overwrite nia. So why do you first initialize it with nia? More importantly, is it also possible to pass nothing, so I don't need the nesi_init_arg_t * nia for back-end initialization in rtc_prof? That would make the integration into rtc_fwd easier.

moskewcz avatar Jun 15 '17 18:06 moskewcz

Yes, I deleted it because I could answer it myself. I simply looked at the wrong code line, but now I have another question, also about the same code line.

void ops_prof_t::main( nesi_init_arg_t * nia ) requires the argument nia for https://github.com/moskewcz/boda/blob/master/src/rtc_prof.cc#L212 . So somehow rtc_fwd.cc has to pass this argument to ops_prof. Looking at https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L517 , I assume I can use the same nia for both modes?

dinhv avatar Jun 15 '17 20:06 dinhv

In short, yes. I think passing a null nia (or an empty one, if null is not allowed) is generally okay too. The purpose of the nia arg is to pass down options from the parent environment, which is generally optional.

moskewcz avatar Jun 15 '17 20:06 moskewcz

to add a bit more detail: the 'nia' that is passed into main contains information about options passed on the command line. so, it's not as if it's 'required' in a black-box sense, but it's needed exactly if you need to pass some NESI-specified options down to that function.

however, it may or may not actually make sense to call the main() function of some NESI class from another one. in the case where it does make sense to call it, it might or might not make sense to pass down the top-level nia. you might instead make a new nia and populate it with some specific data, or you might pass null, and so on.

main() functions in NESI classes are usually focused on implementing some CLI-level UI for some functionality, rather than being something callable by other code. so, i might expect that, in general, it makes more sense to refactor things so that both ops_prof::main() and whatever other code wants to do profiling call some core functions that are factored out of the existing ops_prof flow.

to illustrate, one method to 'integrate' ops_prof into the rtc_fwd flow would be to construct a command line and use system() to run it, reading and writing data from files. however, this is probably neither necessary nor sensible, since you can just call main(). but by the same logic, calling main() may be neither necessary nor sensible either, since you can directly call the underlying profiling function, after some refactoring and adding the proper glue/wrappers.

anyway, in general, you'll just need to investigate, gain understanding, and experiment to explore various possible approaches.

moskewcz avatar Jun 15 '17 22:06 moskewcz

so, in short, the error you're seeing is due to trying to call add_cnn_codegen_annotations() on an already-annotated operation (since the parent conv was annotated in init() as per my first link above). if you dump the op_info, you should be able to clearly see if it's in a pre- or post- annotation state. you didn't give the exact error, but i guess it's as simple as that the "in_ref" dims are already set, and you can't set them again (as an error check).

I still get this error, even if I put the call to the rtc_prof mode before https://github.com/moskewcz/boda/blob/master/src/rtc_fwd.cc#L479 . In my solution I copied the add_cnn_codegen_annotations function into my new class auto_tuner and refactored the code a little bit. I simply put a condition around the statements which set dimensions or function names, e.g.:

if( !op->has("in_ref") ) { op->set_dims("in_ref", in_dims); } // tconv needs the standard input dims for reference

Could this solution lead to false profiling results? Should I rather use op->reset_dims("in_ref", in_dims) or figure out another solution?

dinhv avatar Jun 19 '17 01:06 dinhv

Well, without looking at your code, it's hard to say what might be going on exactly. But I think I stand by my earlier statement: the assertion is an error check that indicates you're calling add_cnn_codegen_annotations() twice on the same op, which you should never do. Disabling the assertion without understanding why that might be sensible certainly doesn't seem like a good idea to me. As I said, dumping the op in the various cases (i.e. on each call) should be illustrative, and it's not clear if you tried that or what you learned if you did.

moskewcz avatar Jun 19 '17 01:06 moskewcz

So this is the code snippet:

for( map_str_p_conv_op_t::iterator i = cp->convs->begin(); i != cp->convs->end(); ++i ) {
  p_conv_op_t const & oi = must_find( *op_infos, i->first );

  // integration of profiling and auto-tuning
  if( autotune ) {
    long a, b;
    p_conv_op_t op_copy = std::make_shared<conv_op_t>( *oi );

    if( op_copy->is( Convolution_coi ) ) {
      p_conv_node_t no = cp->must_get_node( op_copy->get_arg("out") ); // aka oi->coi->top_an(0) ...
      bool const conv_has_relu = (no->in_place_ops.size() > 0) && (no->in_place_ops[0]->is(ReLU_coi));
      // mark relu as fused-away; mark conv as having fused-on relu // NOTE/FIXME(?): relu may be not-init()-yet here ...
      if( conv_has_relu ) {
        must_find( *op_infos, no->in_place_ops[0]->tag )->set_u32( "fused", 1 );
      }
      op_copy->set_u32( "conv_has_relu", conv_has_relu );
    }

    get_cl_device_info( &a, &b );
    auto_tuner_t auto_tuner;
    auto_tuner.init();
    auto_tuner.auto_tuning( nia, op_copy );
  }

  add_cnn_codegen_annotations( oi.get(), op_tune, 0 );
}

The integration of gen_ops_rec is still missing and leads to another error, but for now I'm interested in how to solve the issue of applying codegen annotations multiple times to the same operation. I first thought one could solve this problem by making a copy of the original un-annotated p_conv_op_t.

dinhv avatar Jun 19 '17 09:06 dinhv

I found the bug. auto_tuner.auto_tuning(nia, op_copy); calls add_cnn_codegen_annotations twice on op_copy, once for the known-good tune and once for the tuning parameters I've set. I solved it by making a second copy of op_copy in the auto_tuning function.
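
For reference, this is the shape of the fix inside auto_tuning, simplified from my code; op_tunes_to_try stands for the known-good tune plus my own tuning parameters, and the generate/run step is only indicated by a comment:

// simplified sketch of the fix: each add_cnn_codegen_annotations() call gets its
// own copy of the un-annotated op, so no op is ever annotated twice.
for( op_tune_t const & opt : op_tunes_to_try ) {
  p_conv_op_t per_tune_copy = std::make_shared<conv_op_t>( *op_copy );
  add_cnn_codegen_annotations( per_tune_copy.get(), opt, 0 );
  // ... generate, run, and time per_tune_copy here ...
}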

dinhv avatar Jun 19 '17 12:06 dinhv