
docs: contributing: maintainers: gsoc: How to apply as our suborg

Open pdxjohnny opened this issue 2 years ago • 2 comments

This issue is work in progress docs for how our suborg applies to PSF every year. This is NOT for students applying to this org, although you may find it interesting.

pdxjohnny, Mar 09 '22 14:03

  • Go to https://python-gsoc.org/deadlines.html to check deadlines
  • The GSoC team wants to see a single ideas page; this page can reference other pages for the individual ideas
    • Copy last year's GSoC pages into a new directory
    • https://github.com/intel/dffml/blob/master/docs/contributing/gsoc/2021/
  • Ensure we have at least 3 ideas
    • Things a project cannot be about, per guidelines from Google/PSF
      • Purely docs
        • Writing examples that are used in docs is different, since that work is about the code; the output of the GSoC project must be executable code
    • We can copy structure from previous years
      • https://github.com/intel/dffml/issues?q=is%3Aissue+is%3Aopen+label%3Agsoc
    • Must ensure we have all sections filled out with enough information for someone to perform exploratory work (write some code)
    • Now that there are hour allotments for projects (175/350), each idea must state its estimated time
    • Create static markdown / rST files within the new year's directory exported from issues
      • $ gh issue view https://github.com/intel/dffml/issues/919 --json body | jq -r '.body' | tr -d '\r' | sed -e 's/[[:space:]]*$//' -e 's/^#/\n\n#/g'
    • Must have at least one Beginner level project
  • Short description of the project
    • DFFML aims to democratize machine learning. It makes feature engineering, dataset storage, model training, and model deployment simple and easy. Its wide selection of model and dataset storage plugins enables developers of all backgrounds to play with and add machine learning to their projects.
  • Previous feedback received
    • warthog9: "Minor nit on your ideas page, having the students jump to a sub-page to get the gist might dissuade some of them from looking. I tend to recommend more info on one page and making it as easy to read as possible. If you want to keep the extra pages (they are quite detailed, I wouldn't blame you), maybe some quick summary data for the items? [ name | level | summary ] might encourage some more click throughs at least"
  • Use the below logo for upload to https://blogs.python-gsoc.org/en/suborg/application/new/
dffml-loops
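
When exporting several issues, the cleanup half of the `gh` pipeline above can also be done in Python. This is a minimal sketch, not part of DFFML: `clean_issue_body` is a hypothetical helper name, and it assumes the issue body has already been fetched as a string (for example from the `gh issue view ... --json body` output).

```python
def clean_issue_body(body: str) -> str:
    """Apply the same cleanup as the tr/sed steps to an issue body.

    Hypothetical helper for illustration; not part of DFFML or gh.
    """
    # Equivalent of: tr -d '\r'
    body = body.replace("\r", "")
    cleaned = []
    for line in body.split("\n"):
        # Equivalent of: sed -e 's/[[:space:]]*$//'
        line = line.rstrip()
        # Equivalent of: sed -e 's/^#/\n\n#/' (blank lines before headings)
        if line.startswith("#"):
            line = "\n\n" + line
        cleaned.append(line)
    return "\n".join(cleaned)
```

Like the `sed` expression, this treats every line starting with `#` as a heading, so comment lines inside fenced code blocks in the issue body also get blank lines inserted before them. Review the exported file before committing it.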

pdxjohnny, Mar 09 '22 14:03

Idea template:

## Project Description

AutoML, or Automated Machine Learning, as the name suggests, automates the
process of solving problems with Machine Learning. AutoML is generally helpful
for people who are not familiar with either Machine Learning or the programming
involved. AutoML aims to improve the efficiency of any task involving
Machine Learning.

The primary objective we are trying to achieve is to create a model that
takes as a property of its config a set of models to be used for hyperparameter
tuning. Another property of its config is the set of models which we should
attempt to tune (via the first set). The default values for these properties
result in using all installed models to try to tune all installed model plugins.

- To start, we should define a reduced set of models (not all the ones we have).
  We'll implement AutoML supporting only this reduced set. The first phase of
  this project will be to make sure that one model can be used to tune
  hyperparameters of another model.

- The next phase will be to tune two models using the same tuning model. This
  is followed by tuning two models using two tuning models, which amounts to
  doing the previous task twice, with a different tuning model the second time.

- The following phase will be to go through each model in each model plugin we
  have and see which ones have issues being tuned using the approach taken in the
  previous phase. This phase will help us determine which properties or methods
  we may need to add to models to help them self-identify and thereby indicate
  their requirements for hyperparameter tuning, or maybe their inherent lack of
  support for it.

- The final phase will be to implement hyperparameter tuning for N by N models,
  after implementing what we found to be gaps in the previous phase.
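
The config described above and the way the phases scale up can be sketched as follows. This is only an illustration under stated assumptions: `AutoMLConfig`, `tuning_pairs`, and the `INSTALLED_MODELS` names are hypothetical and are not existing DFFML APIs, and models are represented by plain strings rather than real model plugins.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple

# Hypothetical stand-in for "all installed models"; not a DFFML API.
INSTALLED_MODELS = ["model_a", "model_b", "model_c"]


@dataclass
class AutoMLConfig:
    # Models used to perform hyperparameter tuning.
    tuners: List[str] = field(default_factory=lambda: list(INSTALLED_MODELS))
    # Models whose hyperparameters we attempt to tune.
    targets: List[str] = field(default_factory=lambda: list(INSTALLED_MODELS))


def tuning_pairs(config: AutoMLConfig) -> List[Tuple[str, str]]:
    """Return every (tuner, target) combination, i.e. the N by N case."""
    return list(product(config.tuners, config.targets))


# Phase 1: one tuning model, one model being tuned.
phase_one = tuning_pairs(AutoMLConfig(tuners=["model_a"], targets=["model_b"]))
# Final phase: defaults give all installed models against all installed models.
all_pairs = tuning_pairs(AutoMLConfig())
```

With the defaults, a model is also paired with itself. Whether that pairing makes sense is one of the questions the third phase would need to answer.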

Due to the shortened GSoC cycle, we may end up not doing all of these phases.
Which phase we aim to reach will be decided as we approach the selection process.

## Skills

- Python
- Intermediate Machine Learning
- Experience with various machine learning frameworks (AutoML frameworks would
  be a plus)

## Difficulty

Intermediate/Hard

## Related Readings

- https://github.com/intel/dffml/blob/master/docs/contributing/gsoc/2021.md
- https://scikit-learn.org/stable/model_selection.html#model-selection
- https://www.automl.org/automl/

## Getting Started

- Read the contributing guidelines
  - https://intel.github.io/dffml/master/contributing/index.html
- Go through the quickstart
  - https://intel.github.io/dffml/master/quickstart/model.html
- Go through the model tutorials
  - https://intel.github.io/dffml/master/tutorials/models/
- Go through the model plugins
  - https://intel.github.io/dffml/master/plugins/dffml_model.html
  - You don't need to go through all of them. Just get a feel for running a few

## Potential Mentors

- [John Andersen](https://github.com/pdxjohnny)
- [Yash Lamba](https://github.com/yashlamba)
- [Saksham Arora](https://github.com/sakshamarora1)

pdxjohnny, Mar 09 '22 15:03