Intelli feat: Add Offline DeepSeek Model

This PR implements an offline DeepSeek model loader and inference wrapper fulfilling all requirements in issue. It provides a lightweight, memory-efficient, dependency-minimal way to load and run DeepSeek models from HuggingFace which supports all official DeepSeek R1 & Distill variants using HuggingFace + safetensors. Dynamic model config parsing and deeper memory optimizations (e.g., Triton/offloading) can be addressed in a follow-up issue if needed.

NOTES TO REVIEWERS

Tested on DeepSeek-R1-Distill-Qwen-7B that works offline and follows low-level-only policy and here's o/p in local testing :


> python -m unittest intelli.test.integration.test_deepseek_wrapper

----------------------------------------------------------------------
Ran 1 test in 129.257s

OK
Downloading model.safetensors.index.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B...
Index downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/91[6](https://github.com/varshith257/Intelli/pull/1/checks#step:6:7)b56a44061fd5cd7d6a8fb63255[7](https://github.com/varshith257/Intelli/pull/1/checks#step:6:8)ed4f724f60/model.safetensors.index.json
Downloading config.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B...
Config downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/916b56a44061fd5cd7d6a[8](https://github.com/varshith257/Intelli/pull/1/checks#step:6:9)fb632557ed4f724f60/config.json
Downloading model.safetensors.index.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B...
Index downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/[9](https://github.com/varshith257/Intelli/pull/1/checks#step:6:10)16b56a44061fd5cd7d6a8fb632557ed4f724f60/model.safetensors.index.json
Downloading model-00002-of-000002.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/916b56a44061fd5cd7d6a8fb632557ed4f724f60/model-00002-of-000002.safetensors
Downloading model-00001-of-000002.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-7B/snapshots/916b56a44061fd5cd7d6a8fb632557ed4f724f60/model-00001-of-000002.safetensors
Model weights loaded successfully from split safetensors.
Inference successful, output shape: torch.Size([1, 16, 152064])

DETAILED SETUP AND TESTING IS DOCUMENTED IN README

/claim #82

Apr 05 '25 18:04 varshith257

@intelligentnode @Barqawiz Let me know if I am missing anything and it's ready for review

Apr 05 '25 19:04 varshith257

@intelligentnode Any reviews on this :)

Apr 11 '25 14:04 varshith257

Thanks for the attempt, it is good you used only low level dependencies.

Kindly answer the following questions to help with the review:

There is no tokenizer implementation. How will text be converted to token IDs ?
While the code tested using Qwen-7B, can the same code run deepseek R1 ?
Does this implementation work on both CPU and GPU devices ? How to test in GPU and which device to use ?

Apr 20 '25 11:04 intelligentnode

@intelligentnode Thanks for taking time to review this PR.

There is no tokenizer implementation. How will text be converted to token IDs ?

Yeah, thanks for reminding. I have added support for tokenization with minimal overhead and deps

While the code tested using Qwen-7B, can the same code run deepseek R1 ?

Yes, the implementation is architecture-agnostic, we rely on config.json + safetensors.index.json it can load any DeepSeek-R1 variant including the full 671B model when provided sufficient RAM/VRAM. R1 is very large (670B total / ~37B active parameters). So we need a lot of RAM + VRAM which most Macs also don’t have unless on a high-end M3 Ultra with unified memory

Does this implementation work on both CPU and GPU devices ? How to test in GPU and which device to use ?

Yes it works both with the GPU and CPU and to test with GPU you can use

CUDA_VISIBLE_DEVICES=0 python -m unittest intelli.test.integration.test_deepseek_wrapper

For full DeepSeek-R1, the recommend testing could be on A100 80GB or A6000 48GB class GPUs. Quantized versions can fit in 35–40GB VRAM and are loadable via our implementation using safetensors.index.json split loading

( My desktop has access for RTX 3050 GPU, so tested Qwen-7B and other variations on both cpu and GPU env's

Apr 20 '25 17:04 varshith257

Let me know if anything to add or iterate on this MR

Apr 20 '25 17:04 varshith257

Deepseek use Byte-level BPE (BBPE), not whitespace word tokenizer.

Please adjust.

Apr 27 '25 10:04 intelligentnode

Feedback:

There are no test cases to validate the output generate valid output. add test cases to generate test and validate the string readable.
When I switch the model name to "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" and error printed. The code should support all deepseek models. Entry Not Found for url: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/resolve/main/model.safetensors.index.json.

May 01 '25 07:05 intelligentnode

When I print the output, it does not print string but the tokens!

output tokenize: tensor([[[-0.4666, 0.2092, -0.5391, ..., 0.5234, -0.0457, 0.3164], [-0.6572, 0.4009, -0.6992, ..., 0.4370, -0.1760, 0.3025], [-0.4568, 0.0883, -0.8530, ..., 0.4050, -0.0996, 0.5566], ..., [-0.6572, 0.4009, -0.6992, ..., 0.4370, -0.1760, 0.3025], [-0.5752, 0.4082, -0.5093, ..., 0.4324, -0.1279, 0.1831], [-0.3921, 0.2646, -0.7368, ..., 0.5825, -0.1241, 0.3960]]],

Kindly do full testing for multiple deepseek models, and make sure the output string printed and readable before next review iteration.

May 01 '25 12:05 intelligentnode

@intelligentnode Thanks for detailed review and yes some variants aren't using shared indices, I have fixed them now and as we discussed earlier i implemented the basic BPE tokenizer and also implemented decode fn to make it more redable now

And testing command has been updated and tested on listed models in the issue of R1 variants that supported CPU. Kindly use thie following cd to test them now it solves dynamically for diff variants

DEEPSEEK_MODEL= <model_name> python -m unittest intelli.test.integration.test_deepseek_wrapper

May 03 '25 19:05 varshith257

@intelligentnode @Barqawiz Can you review this, please?

May 12 '25 14:05 varshith257

There is conflicts and the outputs not working as expected.

May 31 '25 20:05 intelligentnode

@intelligentnode Can you paste the logs you are getting? I tested it and got fully decode roundtrip string

May 31 '25 21:05 varshith257

51s
Run python -m unittest intelli.test.integration.test_deepseek_wrapper
....
----------------------------------------------------------------------
Ran 4 tests in 49.457s

OK

--- Running test: test_bpe_tokenization ---
Downloading config.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Config downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/config.json
Downloading model.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/model.safetensors
Loaded single safetensors file.
Loaded model from single safetensors file.

--- Running test: test_encode_decode_roundtrip ---
Downloading config.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Config downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/config.json
Downloading model.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/model.safetensors
Loaded single safetensors file.
Loaded model from single safetensors file.
Decoding tokens: [1986, 128, 128244, 285, 128, 128244, 64, 128, 128244, 874, 52899, 128, 128244, 1944, 128, 128244, 917, 128, 128244, 1958, 128, 128244, 5839, 2022, 13]
Raw tokens: ['This', 'Ä', 'is', 'Ä', 'a', 'Ä', 'com', 'prehensive', 'Ä', 'test', 'Ä', 'string', 'Ä', 'for', 'Ä', 'token', 'ization', '.']
Round-trip decoded  : This is a comprehensive test string for tokenization.

--- Running test: test_load_and_infer ---
Downloading config.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Config downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/config.json
Downloading model.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/model.safetensors
Loaded single safetensors file.
Loaded model from single safetensors file.
Inference successful, output shape: torch.Size([1, 16, 151936])

--- Running test: test_tokenize_and_infer_from_text ---
Downloading config.json from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Config downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/config.json
Downloading model.safetensors from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B...
Model downloaded to /home/runner/.cache/deepseek/models--deepseek-ai--DeepSeek-R1-Distill-Qwen-1.5B/snapshots/ad9f0ae0864d7fbcd1cd905e3c6c5b069cc8b562/model.safetensors
Loaded single safetensors file.
Loaded model from single safetensors file.
Token IDs: [1986, 128, 128244, 285, 128, 128244, 64, 128, 128244, 5839, 2022, 128, 128244, 1944, 13]
Inference successful on text input, output shape: torch.Size([1, 15, 151936])

May 31 '25 21:05 varshith257

Hey @intelligentnode, the issue was closed citing time-sensitivity, but no clear deadline was mentioned upfront. The bounty was also reduced mid-way . Contributors put in genuine effort based on the original scope and that should be acknowledged and fairly rewarded. Could get some clarity on this?

May 31 '25 21:05 varshith257

The issue is that there was no testing conducted during this contribution, and there seemed to be limited understanding of DeepSeek’s capabilities and how tokenizers work! The process of testing and providing feedback became quite lengthy. Since DeepSeek is a time-sensitive topic, this delay impacted our ability to proceed efficiently.

Look to the history I gave fair chance for this change. Please also note that the bounty platform does not support setting explicit deadlines and this create to catch event which is passed without ready code.

For the effort you did I can send you small reward contact me here: https://www.intellinode.ai/contact

Oct 19 '25 20:10 intelligentnode