cerebros-core-algorithm-alpha
120: Add use examples: EfficientNet fine-tuning on CIFAR-100
Added EfficientNet (v2, small model) fine-tuning on CIFAR-100 using Cerebros, as both an ipynb notebook and a .py script.
I added a CICD test for this benchmark. Let's pray that it will run on the GitHub test server in a workable time. If not, we may need to make a miniaturized version of it for the CICD demos. https://github.com/david-thrower/cerebros-core-algorithm-alpha/pull/123/files#diff-cc8c65daed8907e6bb50ac1769d49c05f5f48bdbe8b5cfd3b24b7c5e56ceb8dc
Thanks! Should we try to update / add other CNNs? By the way, in one EfficientNet CIFAR-10 notebook I can see an orphaned computation (interrupted, never finished, and never made it to testing).
Alex
- Needs tensorflow_datasets to be installed; otherwise the check fails. My bad! Updated requirements.txt, rerunning.
- It also complains about no CUDA. Please advise on the course of action.
Here is what I did: I added tensorflow-datasets to a separate requirements file, which I should also do later with tensorflow-text and other ancillary requirements ... I want to avoid bloating the core, keeping use-case-specific packages separate from the core packages.
Thanks! I'm gonna follow this structure in all future pull requests.
To reply to your question ("Also complains no CUDA. Please instruct the course of action."): this should be only a warning, which TensorFlow throws whenever it runs on a CPU-only machine. By default, Cerebros will JIT-compile (except on text classification). This speeds it up on CPUs almost as much as an inexpensive GPU would, by leveraging XLA: it compiles tandem linear algebra operations so they complete in one step (essentially an arrangement of transistors such that a multiply-add operation is done with one pulse of current, a single register taking both the add and multiply operands concurrently).
https://www.tensorflow.org/xla
Since we are poor, this approach is preferable to GPUs anyway.
https://keras.io/api/models/model_training_apis/
jit_compile: If True, compile the model training step with [XLA](https://www.tensorflow.org/xla). XLA is an optimizing compiler for machine learning. jit_compile is not enabled by default. Note that jit_compile=True may not necessarily work for all models. For more information on supported operations please refer to the [XLA documentation](https://www.tensorflow.org/xla). Also refer to [known XLA issues](https://www.tensorflow.org/xla/known_issues) for more details.
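In Keras terms, enabling XLA is a one-line change at compile time. A minimal sketch with a toy model and random data (not the Cerebros benchmark itself):

```python
import numpy as np
import tensorflow as tf

# Toy model: the point is only the jit_compile=True flag below.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# jit_compile=True asks Keras to compile the train step with XLA,
# which also helps on CPU-only machines.
model.compile(optimizer="adam", loss="mse", jit_compile=True)

# Random placeholder data just to exercise one training step.
x = np.random.rand(32, 8).astype("float32")
y = np.random.rand(32, 1).astype("float32")
history = model.fit(x, y, epochs=1, verbose=0)
```

As the Keras docs note above, jit_compile=True may not work for all models, which matches the "except on text classification" caveat.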
@sashakolpakov
No problem, I was loading everything into the same requirements.txt as well. This commit just happened to be the one where I caught on to the fact that I need to stop adding more and more to it. Once I package this and put it on PyPI, I think the requirements will install automatically with a pip install; for that reason, I need to separate them ... and I need to package this for PyPI ...
It did fail; I'm not sure whether by a narrow margin or not. I think I will do the following: prepare a smaller dataset where each class has a small, fixed number of images (I need to determine what number), and I will make it balanced, too. Taking just a random subset could leave very few or no samples in some categories (even though subsampling 15-20k entries out of the 50k total training images, with no regard to balance, already gives us reasonably good accuracy!)
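Drawing a balanced subset amounts to sampling a fixed number of indices per class. A sketch with a hypothetical helper (the function name and the counts are illustrative, not code from the repo):

```python
import numpy as np

def balanced_subsample(labels, per_class, seed=0):
    """Return indices of a class-balanced subset: up to `per_class`
    samples drawn uniformly without replacement from each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n = min(per_class, idx.size)
        chosen.append(rng.choice(idx, size=n, replace=False))
    return np.concatenate(chosen)

# Toy usage: 10 classes with 100 samples each -> 20 kept per class.
labels = np.repeat(np.arange(10), 100)
idx = balanced_subsample(labels, per_class=20)
counts = np.bincount(labels[idx])  # every class contributes exactly 20
```

For CIFAR-100 the same idea would apply with 100 classes and whatever per-class count makes the CICD run fit its time budget.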
@sashakolpakov, this is what I had to do on the EfficientNet CIFAR-10 example. For showcase examples, full scale is definitely awesome, but a CICD test must complete in a timeframe that fits.
What I think is a good solution to this problem: I should introduce an environment variable like CICD_TEST, then make all the Python scripts look for it, defaulting to False if the variable does not exist.
If the execution environment sets CICD_TEST to true, the training jobs run on a small subset of the data; if the variable is absent or set to false, the full dataset runs.
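The convention proposed above could look something like this at the top of each script (CICD_TEST and the subset size are the proposed/hypothetical names, not code that exists in the repo yet):

```python
import os

# Proposed convention: scripts check the CICD_TEST environment
# variable and default to False when it is unset.
CICD_TEST = os.environ.get("CICD_TEST", "False").lower() in ("1", "true")

# Hypothetical knob: a small per-class subset in CI, the full
# dataset otherwise (None meaning "use all samples").
SAMPLES_PER_CLASS = 20 if CICD_TEST else None
```

Then the GitHub Actions workflow would just export CICD_TEST=true before invoking the scripts, and local full-scale runs need no changes at all.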
Approved to merge this use case in, but given the scale of compute required, it may be infeasible to have as a routine CICD test for now.