
Feature Request: Support for iFlytek Spark 13B

**Open** · raulbalmez opened this issue 1 month ago · 2 comments

Prerequisites

  • [X] I am running the latest code. Mention the version if possible as well.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I am attempting to run the iFlytek Spark model with llama.cpp, as I consider it a new and relevant model. I have built the computation graph and implemented everything else the model needs to run, but despite these efforts the model does not work as expected and consistently produces garbled output. I would appreciate any guidance or suggestions for tracking down the problem.

Motivation

iFlytek Spark is one of the most prominent Chinese models currently available on the market. Enabling access to iFlytek Spark through llama.cpp would be a valuable contribution to the community.

Possible Implementation

I have implemented functions such as build_{model}(), llm_load_tensors(), and the other necessary plumbing. However, I am having trouble debugging the build_{model}() method. The only resources available for this model are the weights (base/chat, in float32) and the code. Since there is no paper or architecture diagram, and the code is highly configurable, it has been difficult to deduce the exact architecture of the model. I would appreciate guidance in resolving these issues.
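One approach that might help isolate where the graph goes wrong: run the same prompt through the reference MindSpore implementation and through the llama.cpp port, dump the hidden states after each transformer block from both, and compare them layer by layer. The first layer where they diverge usually points at the bug (a RoPE variant, norm placement, or a QKV weight layout/transpose mismatch). A minimal comparison helper, assuming you have already dumped the per-layer activations to arrays by whatever means (the dumping mechanism itself is not shown and is not a llama.cpp API):

```python
import numpy as np

def first_divergence(ref_acts, test_acts, atol=1e-3):
    """Compare per-layer activations from the reference implementation
    against those dumped from the port.

    ref_acts / test_acts: lists of numpy arrays, one per layer, in
    forward order. Returns (layer_index, reason) for the first layer
    whose outputs diverge beyond `atol`, or None if all layers match.
    """
    for i, (r, t) in enumerate(zip(ref_acts, test_acts)):
        if r.shape != t.shape:
            return i, f"shape mismatch {r.shape} vs {t.shape}"
        # Compare in float64 so fp16/fp32 dumps are on equal footing.
        err = np.max(np.abs(r.astype(np.float64) - t.astype(np.float64)))
        if err > atol:
            return i, f"max abs error {err:.3g}"
    return None
```

If layer 0 already diverges, the problem is likely in the embedding, weight loading, or pre-attention norm rather than in the block logic itself.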

  • Code: https://gitee.com/mindspore/mindformers/blob/r1.0/research/iflytekspark/iflytekspark_layers.py
  • Weights: https://gitee.com/iflytekopensource/i-flytek-spark-13-b-model-gpu/blob/master/iFlytekSpark_13B_base_fp32/mp_rank_00_model_states.pt
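When no paper or diagram exists, the checkpoint itself is often the most reliable architecture reference: the tensor names and shapes pin down the layer count, hidden size, and how QKV/FFN weights are packed. A small sketch that summarizes any state dict (the `.pt` file above would first need to be loaded with `torch.load`, which is not shown here; the regex assumes layer indices appear as `.<digits>.` in tensor names, which is common but should be verified against the real file):

```python
import re
from collections import defaultdict

def summarize_state_dict(sd):
    """Group checkpoint tensors by layer index and report their shapes.

    sd: mapping of tensor name -> tensor-like object with a .shape.
    Returns a dict with the inferred layer count, per-layer tensor
    shapes, and the non-layer ("other") tensors such as embeddings.
    """
    layers = defaultdict(dict)
    other = {}
    for name, t in sd.items():
        m = re.search(r"\.(\d+)\.", name)
        if m:
            layers[int(m.group(1))][name] = tuple(t.shape)
        else:
            other[name] = tuple(t.shape)
    return {"n_layer": len(layers), "layers": dict(layers), "other": other}
```

Comparing the per-layer shapes against what llm_load_tensors() expects is a quick way to catch a transposed or fused weight before it ever reaches the graph.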

raulbalmez · Jan 14 '25 10:01