Vulcan

Results 52 comments of Vulcan

Info from run.py:
```
grok_1_model = LanguageModelConfig(
    vocab_size=128 * 1024,
    pad_token=0,
    eos_token=2,
    sequence_len=8192,
    embedding_init_scale=1.0,
    output_multiplier_scale=0.5773502691896257,
    embedding_multiplier_scale=78.38367176906169,
    model=TransformerConfig(
        emb_size=48 * 128,
        widening_factor=8,
        key_size=128,
        num_q_heads=48,
        num_kv_heads=8,
        num_layers=64,
        attn_output_multiplier=0.08838834764831845,
        shard_activations=True,
        # MoE.
        num_experts=8,...
```
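
The scale constants in the config above look like simple square-root factors of the listed sizes. The sketch below is only an arithmetic check on the values quoted in the comment: the variable names mirror the config fields, while the `derived` dictionary and the reading of each constant as a square root are assumptions made for illustration, not something the snippet itself states.

```python
import math

# Values copied from the quoted run.py snippet.
vocab_size = 128 * 1024
emb_size = 48 * 128
key_size = 128
num_q_heads = 48
num_kv_heads = 8
output_multiplier_scale = 0.5773502691896257
embedding_multiplier_scale = 78.38367176906169
attn_output_multiplier = 0.08838834764831845

# Hypothetical helper dictionary: quantities derived purely by arithmetic.
derived = {
    "vocab_size": vocab_size,                          # 128 * 1024
    "emb_size": emb_size,                              # 48 * 128
    "q_heads_per_kv_head": num_q_heads // num_kv_heads,
    "sqrt_emb_size": math.sqrt(emb_size),
    "inv_sqrt_key_size": 1.0 / math.sqrt(key_size),
    "inv_sqrt_3": 1.0 / math.sqrt(3.0),
}

# Compare the quoted constants against the square-root candidates.
checks = {
    "embedding_multiplier_scale ~ sqrt(emb_size)":
        math.isclose(embedding_multiplier_scale, derived["sqrt_emb_size"]),
    "attn_output_multiplier ~ 1/sqrt(key_size)":
        math.isclose(attn_output_multiplier, derived["inv_sqrt_key_size"]),
    "output_multiplier_scale ~ 1/sqrt(3)":
        math.isclose(output_multiplier_scale, derived["inv_sqrt_3"]),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
print(derived)
```

Under these assumptions, the embedding width equals `num_q_heads * key_size`, and each key/value head is shared by six query heads.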

https://github.com/ggerganov/whisper.cpp/issues/10#issuecomment-1264695986 Noise removal, such as with rnnoise (https://jmvalin.ca/demo/rnnoise/), also increases the accuracy of whisper, even if the audio is sped up a bit. I also had a good experience with "audio companding / compression" in...

@pablogranolabar Is there a noticeable difference in quality of output of GPT-JT compared to GPT-J?

> So for canned general tasks like causal LM it's potentially worse in whatever you would consider precision and accuracy, but with quality prompt engineering all of these additional tasks...

@pablogranolabar Thanks for sharing the great idea about using GPT-JT. @ggerganov Thanks for the fix. I uploaded the model to Hugging Face so that it's easy for people to get hold...

Tried converting: The error starts with: ``` python3 convert-h5-to-ggml.py galactica-1.3b/ ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /mnt/TRIPFS/SPACE/ggml/build/examples/gpt-j/convert-h5-to-ggml.py:58 in │ │ │ │ 55 dir_model = sys.argv[1] │ │...

@ggerganov I get similar errors as above when trying to convert neox 20b. How do I create the added_tokens.json file? ``` python3 convert-h5-to-ggml.py gpt-neox-20b/ ╭─────────────────────────────── Traceback (most recent call last)...

> Every model can be ported to ggml, but it requires some work. I guess it would be better if I try to make the codebase easier to understand and...

Closing, as svgo can be used in a pipeline instead. Also, this is probably not relevant for the current version.

I can confirm that this happens on Oracle Cloud. The arch is aarch64, not x86_64. I just git cloned and ran.