Arno Candel

Search results: 35 issues by Arno Candel

```
from xgboost import XGBRegressor
import numpy as np
import time

for cols in [500000, 600000]:
    shape = (1000, cols)
    print("shape: %s" % str(shape))
    X = np.random.rand(*shape)
    y = np.random.rand(shape[0])
    ...
```
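The snippet above allocates very wide dense matrices before XGBoost even starts. A quick back-of-the-envelope check (a pure-arithmetic sketch, no xgboost required) shows why memory is the first bottleneck at these shapes:

```python
# Sketch: estimate the memory footprint of the dense float64 matrices
# generated by np.random.rand in the snippet above.
def dense_bytes(rows, cols, itemsize=8):  # float64 = 8 bytes per element
    return rows * cols * itemsize

for cols in [500_000, 600_000]:
    gb = dense_bytes(1000, cols) / 1e9
    print(f"shape=(1000, {cols}): {gb:.1f} GB")  # 4.0 GB and 4.8 GB
```

So each test matrix alone is several gigabytes before XGBoost makes its own internal copies.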

type: bug

Fixes https://github.com/tloen/alpaca-lora/issues/69, but might still not be balanced.

```
arno@rippa:/nfs4/llm/h2o-llmstudio(docker)$ docker build -t h2o-llmstudio .
```

```
arno@rippa:/nfs4/llm/h2o-llmstudio(docker)$ docker run --runtime=nvidia --shm-size=64g -p 10101:10101 --rm h2o-llmstudio
2023/04/23 04:55:13 #
2023/04/23 04:55:13 # ┌────────────────┐ H2O Wave
2023/04/23 04:55:13 #...
```

Add quantization logic #88
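As background for the quantization issue above, a minimal symmetric per-tensor int8 weight round trip (a generic sketch, not the actual h2o-llmstudio implementation) looks like:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Reconstruct an approximate float32 tensor from int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Round-trip error is bounded by half a quantization step (scale / 2)
assert np.abs(w - w_hat).max() <= s / 2 + 1e-6
```

Real LLM quantization (e.g. bitsandbytes or GPTQ) is per-channel or block-wise and handles outliers, but the scale/round/clip structure is the same.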

Related to https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard but with more meaningful scores: https://github.com/h2oai/h2ogpt/blob/ba6cad3207f8319b5c5f4b1e9099d7b909fdb661/generate.py#L1328-L1347

In order from best to worst, using 500 evals with the above test, choosing only the correct prompt type for each model, everything else...
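The best-to-worst ordering described above can be sketched as ranking models by their mean score over the eval set. The model names and values below are invented placeholders, not the actual results:

```python
# Hypothetical sketch: rank models by mean score over their evals.
# Names and numbers are illustrative only.
evals = {
    "model_a": [0.62, 0.71, 0.66],
    "model_b": [0.55, 0.49, 0.60],
    "model_c": [0.80, 0.77, 0.83],
}

def mean(xs):
    return sum(xs) / len(xs)

ranking = sorted(evals, key=lambda m: mean(evals[m]), reverse=True)
print(ranking)  # best to worst: ['model_c', 'model_a', 'model_b']
```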

https://github.com/h2oai/h2ogpt/blob/8e09f951ba3b9330f21b5581586f5dfd044446cb/finetune.py#L480

This is called 8x on an 8-way system, each call using 16 threads, for a total of 128 threads on a 128-vcore system. That is fine, but still inefficient and too I/O-bound. Takes mere...
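One common way to keep workers × threads within the vcore budget described above is to divide the core count by the number of workers (a generic sketch, not the code in finetune.py):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def threads_per_worker(n_workers, total_vcores=None):
    """Split the vcore budget evenly so n_workers * threads <= vcores."""
    total = total_vcores or os.cpu_count() or 1
    return max(1, total // n_workers)

# e.g. 8 workers on a 128-vcore box -> 16 threads each, as in the issue
assert threads_per_worker(8, total_vcores=128) == 16

# Each worker would then size its own thread pool accordingly
with ThreadPoolExecutor(max_workers=threads_per_worker(8, 128)) as ex:
    results = list(ex.map(lambda x: x * x, range(10)))
```

Avoiding oversubscription helps, but if the work is I/O-bound (as noted above), batching reads or using async I/O matters more than the thread split.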

### Try existing fine-tuned h2oGPT 12B model with 'summarize' prompt_type

https://github.com/h2oai/h2ogpt/blob/9c2bc937ff72c6e82fb195c4cd713f018e1d8cad/finetune.py#L851-L852

`CUDA_VISIBLE_DEVICES=0,1 python generate.py --base_model=h2oai/h2ogpt-oasst1-512-12b --infer_devices=False --prompt_type=summarize`

For fine-tuning, can choose from several others that have permissive license: https://github.com/h2oai/h2ogpt/blob/938b69ff5cd36830f1b77291d60a977b46c7632e/create_data.py#L854-L874

https://github.com/triton-inference-server/

- [x] Build Triton Docker image with support for FasterTransformer backend for Fusion etc.
- [x] convert h2oGPT models to format that Triton understands https://github.com/NVIDIA/FasterTransformer/pull/569
- [x] run h2oGPT...
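Serving a converted model through Triton's FasterTransformer backend requires a per-model `config.pbtxt`. A rough sketch of the shape such a config might take follows; the model name, paths, and parameter values here are illustrative assumptions, not the actual h2oGPT deployment config:

```
name: "h2ogpt"
backend: "fastertransformer"
max_batch_size: 8
parameters {
  key: "tensor_para_size"
  value: { string_value: "1" }
}
parameters {
  key: "model_checkpoint_path"
  value: { string_value: "/models/h2ogpt/1" }
}
```

The conversion step in the checklist above (the FasterTransformer PR) produces the checkpoint format this config points at.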

https://github.com/Digitous/GPTQ-for-GPT-NeoX