
Redundant fastTrainer?

Open adityakusupati opened this issue 4 years ago • 6 comments

fastTrainer and fastcell_example on the harsha/reorg branch seem to be out of date and need to be updated or removed.

@harsha-simhadri I think these files need to be removed once your fastmodel is robust.

adityakusupati avatar Aug 21 '19 10:08 adityakusupati

@adityakusupati The first link looks broken

harsha-simhadri avatar Aug 21 '19 20:08 harsha-simhadri

It was because of the reorg - https://github.com/microsoft/EdgeML/blob/harsha/reorg/pytorch/edgeml_pytorch/trainer/fastTrainer.py

adityakusupati avatar Aug 22 '19 05:08 adityakusupati


@adityakusupati Can you test the fastmodel in PR 123 and see if you are happy with it? If everything you need is there, I will remove the fastTrainer and fastcell_example files.

harsha-simhadri avatar Aug 22 '19 11:08 harsha-simhadri

@harsha-simhadri thanks for the response. I will do the needful and @SachinG007 will assist me. I think we need to change the names of the new files (both the trainer and example) for Fastcell just to be consistent.

adityakusupati avatar Aug 22 '19 11:08 adityakusupati

  • [ ] Argparse should contain at least: dataDir, cellType, inputDims, hiddenDims, numEpochs, batchSize, learningRate, wRank, uRank, wSparsity, uSparsity, gate and update non-linearities, decayStep, decayRate, and outputFile. Please follow one of this or this for the argparse default values.
  • [ ] Mean-var normalization
  • [ ] Create Model directory to save .npy models and .ckpt
  • [ ] Save Mean and Std in model directory
  • [ ] Save the command which has been executed in the model directory for future reference
  • [ ] Provide support for all the Custom RNN cells in rnn.py to be used as part of our trainer and example
  • [ ] Save params in .npy format for the best model in the trainer which is updated as the epochs go by.
  • [ ] Aggregate all the results in one txt file per cell type in the dataset directory. Use this as reference. This dump contains the model size, model accuracy, and the absolute path to the model directory, and is very useful for grid searches and budget tuning.
  • [ ] Complete dense training (for no sparsity)
  • [ ] IHT for the 2nd phase with interleaved fixed support training (generally 1 IHT step for 15 fixed support train batches)
  • [ ] 3rd phase with fixed support training
  • [ ] Print the final best model numbers and the current model number along with the model directory at the end of the training.
  • [ ] Please support my version of quantSigm along with others to ensure stable quantization.
  • [ ] Support batch_first=True and False for RNN unrolling.
  • [ ] Support tuple input to RNN in case the cell is LSTM
  • [ ] Make num_matrices a property so that people can access it outside for some other things.
  • [ ] Support multi-layer RNNcells for all the custom RNN cells in rnn.py
  • [ ] Support quantizeModels.py for the RNN Cell models.
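To make the first checklist item concrete, here is a minimal sketch of what such an argparse interface could look like. All flag spellings and default values below are illustrative guesses, not the actual EdgeML interface:

```python
import argparse

def get_args():
    """Build an argparse parser covering the flags from the checklist.
    Flag names and defaults are assumptions for illustration only."""
    parser = argparse.ArgumentParser(description='FastCell training (sketch)')
    parser.add_argument('-dir', '--dataDir', required=True,
                        help='Directory containing train/test data')
    parser.add_argument('-c', '--cellType', default='FastGRNN',
                        help='One of [FastGRNN, FastRNN, UGRNN, GRU, LSTM]')
    parser.add_argument('-id', '--inputDims', type=int, default=32)
    parser.add_argument('-hd', '--hiddenDims', type=int, default=32)
    parser.add_argument('-e', '--numEpochs', type=int, default=300)
    parser.add_argument('-b', '--batchSize', type=int, default=100)
    parser.add_argument('-lr', '--learningRate', type=float, default=0.01)
    parser.add_argument('-wr', '--wRank', type=int, default=None)
    parser.add_argument('-ur', '--uRank', type=int, default=None)
    parser.add_argument('-ws', '--wSparsity', type=float, default=1.0)
    parser.add_argument('-us', '--uSparsity', type=float, default=1.0)
    parser.add_argument('-g', '--gateNonlinearity', default='sigmoid')
    parser.add_argument('-u', '--updateNonlinearity', default='tanh')
    parser.add_argument('-ds', '--decayStep', type=int, default=200)
    parser.add_argument('-dr', '--decayRate', type=float, default=0.1)
    parser.add_argument('-o', '--outputFile', default=None)
    return parser
```

The linked scripts in the thread should be treated as the source of truth for the actual default values.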

I will add if I find any other thing, but this seems complete.
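The mean-var normalization and save-mean/std items above amount to computing statistics on the training split only and persisting them with the model so inference can apply the identical transform. A minimal sketch (the function name and epsilon guard are assumptions, not EdgeML's actual API):

```python
import numpy as np

def mean_var_normalize(x_train, x_test):
    """Normalize both splits using statistics from the training split only."""
    mean = np.mean(x_train, axis=0)
    std = np.std(x_train, axis=0)
    std[std < 1e-8] = 1.0  # guard against zero-variance features
    return (x_train - mean) / std, (x_test - mean) / std, mean, std

# Per the checklist, mean and std would then be saved in the model
# directory (e.g. as mean.npy and std.npy) for future reference.
```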

adityakusupati avatar Aug 26 '19 11:08 adityakusupati
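The three training phases in the checklist (dense training, IHT with interleaved fixed-support steps, then fixed-support fine-tuning) rest on a hard-thresholding projection that keeps only the largest-magnitude weights. A rough sketch of the two pieces; the equal thirds phase split and function names are assumptions for illustration, not the trainer's actual schedule:

```python
import numpy as np

def hard_threshold(w, sparsity):
    """Keep the largest-magnitude entries of w (a `sparsity` fraction);
    zero out the rest. This is the IHT projection step."""
    k = max(1, int(round(sparsity * w.size)))
    if k >= w.size:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[-k]
    out = w.copy()
    out[np.abs(out) < thresh] = 0.0
    return out

def training_phase(epoch, total_epochs):
    """Three-phase schedule from the checklist: dense warm-up, IHT with
    interleaved fixed-support batches (roughly one IHT step per 15
    fixed-support batches), then fixed-support training. The 1/3-1/3-1/3
    split is an assumption for illustration."""
    if epoch < total_epochs // 3:
        return 'dense'
    elif epoch < 2 * total_epochs // 3:
        return 'iht'
    return 'fixed_support'
```

During the fixed-support phases, gradients are applied only on the support (the nonzero positions) selected by the last hard-threshold step.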

Just saw this comment. Adding my suggestions:

  • [ ] Making all the examples consistent (as much as possible) in terms of usage. This means including a notebook as well as a command-line-argument-based script for FastRNN. The notebook is a good thing to have as a reference, since GitHub renders the code and sample outputs online. The command-line script is always useful. If you really want to incorporate 'train_config', please do so by merging it with argparse. This can be done by setting the default parameter of argparse to pick values from train_config, as below:
parser.add_argument('-c', '--cell', type=str, default=train_config.CellProperties.cellType,
                        help='Choose between [FastGRNN, FastRNN, UGRNN' +
                        ', GRU, LSTM], default: FastGRNN')
  • [ ] The edgeml.graph is supposed to contain only the forward computation graph. This means that the sparsification methods need to move out.

metastableB avatar Aug 27 '19 16:08 metastableB
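The snippet above can be made self-contained as follows. The `train_config` object here is a hypothetical stand-in built with `SimpleNamespace`; the real config module's layout may differ:

```python
import argparse
from types import SimpleNamespace

# Hypothetical stand-in for a train_config module (illustrative only).
train_config = SimpleNamespace(
    CellProperties=SimpleNamespace(cellType='FastGRNN'))

parser = argparse.ArgumentParser()
parser.add_argument('-c', '--cell', type=str,
                    default=train_config.CellProperties.cellType,
                    help='Choose between [FastGRNN, FastRNN, UGRNN' +
                         ', GRU, LSTM], default: FastGRNN')

# With no CLI flags, the config value is used; an explicit flag overrides it.
args = parser.parse_args([])
```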