Idea: new PyPI classifiers and packaging everything up with `pip` as the standard way of sharing architectures

Open SamuelMarks opened this issue 2 years ago • 5 comments

What is your opinion on this idea, which I originally posted almost 3 years ago? https://github.com/keras-team/keras/issues/15762

There are a huge number of new statistical, machine-learning and artificial intelligence solutions being released every month.

Most are open-source and written in a popular Python framework like TensorFlow, JAX, or PyTorch.

In order to 'guarantee' you are using the best [for given metric(s)] solution for your dataset, some way of automatically adding these new statistical, machine-learning and artificial intelligence solutions to your automated pipeline needs to be created.

(additionally: useful for testing your new optimiser, loss function, &etc. across a zoo of datasets)

Ditto for transfer learning models. A related problem is automatically putting ensemble networks together. Something like:

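```python
import keras

import some_broke_arch  # pip install some_broke_arch
import other_neat_arch  # pip install other_neat_arch
import horrible_v_arch  # builtin to keras

# The three modules above are placeholders for installable architecture packages.
model   = some_broke_arch.get_arch(   **standard_arch_params  )
metrics = other_neat_arch.get_metrics(**standard_metric_params)
loss    = horrible_v_arch.get_loss(   **standard_loss_params  )

# compile() expects an optimizer instance (or its string name), not the class itself
model.compile(loss=loss, optimizer=keras.optimizers.RMSprop(), metrics=metrics)
model.summary()  # prints the summary itself and returns None
# &etc.
```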

In summary, I am petitioning for standard ways of:

0. exposing algorithms for consumption;

1. combining algorithms;

2. comparing algorithms.

To that end, I would recommend encouraging the PyPI folk to add a few new classifiers, and having a bunch of us trawl through GitHub every month sending PRs to random repositories (those associated with academic papers), linking them up with CI/CD so that they become installable with `pip install` and searchable by classifier on PyPI.
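As a rough illustration of what "searchable by classifier" could mean on the consumer side, here is a minimal sketch that filters locally installed distributions by Trove classifier. It uses only the standard-library `importlib.metadata`. The helper names `has_classifier` and `installed_with_classifier` are made up for this example, and the prefix shown is an existing classifier standing in for whatever new ones might be added.

```python
from importlib.metadata import distributions


def has_classifier(classifiers, prefix):
    """Return True if any Trove classifier string starts with `prefix`."""
    return any(c.startswith(prefix) for c in classifiers)


def installed_with_classifier(prefix):
    """Sorted names of installed distributions advertising a matching classifier."""
    names = set()
    for dist in distributions():
        # get_all returns None when the distribution declares no classifiers
        classifiers = dist.metadata.get_all("Classifier") or []
        name = dist.metadata.get("Name")
        if name and has_classifier(classifiers, prefix):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Existing classifier, used here as a stand-in for the proposed new ones.
    prefix = "Topic :: Scientific/Engineering :: Artificial Intelligence"
    for name in installed_with_classifier(prefix):
        print(name)
```

This only inspects what is already installed; searching the PyPI index itself by classifier would go through the PyPI website or its APIs, which this sketch does not touch.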

Related, my open-source multi-ML meta-framework:

  • uses the builtin `ast` and `inspect` modules to traverse the module, class, and function hierarchy for 10 popular open-source ML/AI frameworks;

  • will enable experimentation with the entire 'search-space' of all these ML frameworks (every transfer learning model, optimiser, loss function, &etc.)

[…]with a standard way of sharing architectures will be able to expand the 'search-space' with community contributed solutions.
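To give a flavour of the `ast` and `inspect` traversal mentioned above, here is a minimal sketch. It is not how the meta-framework or cdd-python actually does it. The stdlib `json` module stands in for an ML framework so the example runs without extra installs, and the helper names `public_api` and `function_parameters` are hypothetical.

```python
import ast
import inspect
import json  # stand-in for an ML framework module


def public_api(module):
    """Map public class and function names in `module` to 'class' or 'function'."""
    found = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private and dunder members
        if inspect.isclass(obj):
            found[name] = "class"
        elif inspect.isfunction(obj):
            found[name] = "function"
    return found


def function_parameters(source):
    """Map each top-level function in `source` to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


if __name__ == "__main__":
    print(public_api(json))
    # Requires the module's source file to be available on disk.
    print(function_parameters(inspect.getsource(json)))
```

`inspect` works on live, imported objects, whereas `ast` works on source text without importing it, which is why a tool might want both.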

Related:

IMHO there are a number of advantages to using existing approaches to finding and installing components of machine-learning models (and ensemble-able models).

Would appreciate your perspective (@bhack referenced your project)

SamuelMarks • Apr 18 '22 13:04

Thank you @SamuelMarks for your idea. It aligns with what we would like to achieve with CM (CK2).

arjunsuresh • Apr 19 '22 08:04

Hi @SamuelMarks. Thank you for your notes: very interesting, and indeed related to our project, as mentioned by @arjunsuresh! We plan to have a prototype of a portable ML pipeline using our new CK2 (CM) framework within a few weeks. Would you be interested in checking it out and discussing your ideas at some point? We would be glad to get your feedback! Thanks!

gfursin • Apr 20 '22 09:04

Great to hear.

Sure thing, just @ tag me when ready.

PS: At some point I'll finish my own multi-ML meta-framework also (been building it with the aforementioned ast module in cdd-python) which should also benefit greatly from a deployment of this [meta] architecture. When ready I'll probably CC0 it.

SamuelMarks • Apr 21 '22 02:04

Hi again @SamuelMarks. We have released the next generation of the CK framework (CM), and we are now creating a new open workgroup in MLCommons to simplify the MLPerf inference benchmark and make it easier to plug in any real-world model, data set, framework, compiler and hardware. Please feel free to join us at https://github.com/mlcommons/ck/blob/master/docs/mlperf-education-workgroup.md - I think your experience is very relevant and your feedback will be much appreciated! Thanks!

gfursin • Sep 19 '22 15:09

@gfursin Great, I replied to another thing you tagged me in. I'll try and make one of your meetings to discuss further. My Python compiler library—that I'm using to generate my multi-ML meta-framework and contribute strong types to major frameworks including TensorFlow—is about to gain some new features and fixes of old whitespace-related bugs. Watch this space!

In terms of the subject of this thread, what do you think about the PyPI-centric solution?

  • Should we start a mailing-list thread or something with them?

  • Petition Google to ask them for the new classifiers?

I think my multi-ML meta-framework needs to finish its Proof-of-Concept phase before proceeding. Unless you have other ideas?

SamuelMarks • Oct 15 '22 23:10