Xiao-Yong Jin
The vicuna v1.1 model used a different setup. See https://github.com/lm-sys/FastChat/blob/f85f489f2d5e48c37cceb2f00c3edc075c5d3711/fastchat/conversation.py#L115-L124 and https://github.com/lm-sys/FastChat/blob/f85f489f2d5e48c37cceb2f00c3edc075c5d3711/fastchat/conversation.py#L37-L44 IIUC, the prompt as a Bourne shell string is `"$system USER: $instruction ASSISTANT:"`. Their doc says this: https://github.com/lm-sys/FastChat/blob/f85f489f2d5e48c37cceb2f00c3edc075c5d3711/docs/weights_version.md#example-prompt-weight-v11 I...
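A minimal sketch of that v1.1 layout, assuming the default system message and the `" "` separator from the linked FastChat source; the helper name here is mine, not FastChat's:

```python
# Hypothetical helper illustrating the vicuna v1.1 prompt layout.
# System message and roles ("USER"/"ASSISTANT") follow the linked
# FastChat conversation.py; only a single user turn is shown.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_vicuna_v11_prompt(instruction: str, system: str = SYSTEM) -> str:
    # Equivalent to the shell string: "$system USER: $instruction ASSISTANT:"
    return f"{system} USER: {instruction} ASSISTANT:"

print(build_vicuna_v11_prompt("Hello!"))
```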
Downloading `LICENSE`, `USE_POLICY.md`, `tokenizer.model`, and `tokenizer_checklist.chk` is fine, but downloading any model-specific files gives 403 Forbidden.
What is the status of the MPI_Test blocking issue with inter-node comms?
I thought this line controls the size, no? https://github.com/lattice/quda/blob/aa2ea419ce0f6f78f842f85f40cb2a607944c957/include/targets/sycl/target_device.h#L196
In their code, the chat format is here: https://github.com/meta-llama/llama3/blob/299bfd8212fec65698c2f8c7b5970cbbb74c2a4f/llama/tokenizer.py#L202
The instruct models need `tokenizer.ggml.eos_token_id` to be 128009, i.e. `<|eot_id|>`.
It seems the model generates `<|eot_id|>` with the official chat template. Otherwise it may generate `<|end_of_text|>`.
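A plain-string paraphrase of the chat format at the linked `tokenizer.py` line; the real `ChatFormat.encode_dialog_prompt` works on token ids, and the function names below are just illustrative:

```python
# Sketch of the Llama 3 instruct chat format, paraphrased as strings.
def encode_message(role: str, content: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"

def encode_dialog_prompt(dialog: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in dialog:
        prompt += encode_message(msg["role"], msg["content"])
    # Prime the model to answer; a well-behaved instruct model then ends
    # its turn with <|eot_id|> (id 128009), not <|end_of_text|> (id 128001).
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(encode_dialog_prompt([{"role": "user", "content": "Hi"}]))
```

This is why the eos token matters: a runtime that only stops on `<|end_of_text|>` will blow straight past the model's `<|eot_id|>` turn terminator.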
This breaks backward compatibility. If this is acceptable, we need to double check our unit tests and fix the non-FUELCompat tests that uses Gaussian random numbers.
There is a no-speech token that whisper.cpp currently ignores: https://github.com/ggerganov/whisper.cpp/blob/447d49530c9af41fe24f2ae510f452903dba330d/whisper.cpp#L4592 Actually implementing a no-speech threshold similar to openai/whisper's might help.
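For reference, a minimal sketch of the gating openai/whisper applies in `transcribe.py`: the no-speech token's probability is read at the first decode step, and a segment is only skipped as silence when that probability is high *and* the decoded text's average log-probability is low. Threshold values are the openai/whisper defaults; the function name is mine:

```python
# Sketch of openai/whisper-style no-speech gating.
NO_SPEECH_THRESHOLD = 0.6   # openai/whisper default
LOGPROB_THRESHOLD = -1.0    # openai/whisper default

def is_silent_segment(no_speech_prob: float, avg_logprob: float) -> bool:
    if no_speech_prob > NO_SPEECH_THRESHOLD:
        # Only trust the no-speech signal when the decoder is also
        # unconfident about the text it produced for this segment.
        if avg_logprob < LOGPROB_THRESHOLD:
            return True
    return False

print(is_silent_segment(no_speech_prob=0.8, avg_logprob=-1.5))  # True: skip segment
```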
Most of the time we don't really want much output, especially when called from a third-party app. Depending on the usage, I would be happy if we could have...