no module llama_cpp
D:\Projects>git clone https://github.com/QiuYannnn/Local-File-Organizer.git
Cloning into 'Local-File-Organizer'...
remote: Enumerating objects: 170, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 170 (delta 24), reused 20 (delta 20), pack-reused 130 (from 1)
Receiving objects: 100% (170/170), 27.91 MiB | 2.75 MiB/s, done.
Resolving deltas: 100% (75/75), done.
D:\Projects>cd lo*
D:\Projects\Local-File-Organizer>conda activate py
Error while loading conda entry point: conda-libmamba-solver (initialization failed)
EnvironmentNameNotFound: Could not find conda environment: py
You can list all discoverable environments with `conda info --envs`.
Terminate batch job (Y/N)? conda activate pyg
Terminate batch job (Y/N)? y
D:\Projects\Local-File-Organizer>conda activate pyg
(pyg) D:\Projects\Local-File-Organizer>python
Python 3.12.8 (tags/v3.12.8:2dc476b, Dec 3 2024, 19:30:04) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
KeyboardInterrupt
>>> exit()
(pyg) D:\Projects\Local-File-Organizer>pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir
Looking in indexes: https://nexaai.github.io/nexa-sdk/whl/cpu, https://pypi.org/simple
Collecting nexaai
Downloading https://github.com/NexaAI/nexa-sdk/releases/download/v0.1.1.0/nexaai-0.1.1.0-cp312-cp312-win_amd64.whl (5.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 8.2 MB/s eta 0:00:00
Collecting cmake (from nexaai)
Downloading cmake-4.0.0-py3-none-win_amd64.whl.metadata (6.3 kB)
...
Successfully installed altair-5.5.0 audioread-3.0.1 cmake-4.0.0 coloredlogs-15.0.1 ctranslate2-4.6.0 diskcache-5.6.3 fastapi-0.115.12 faster_whisper-1.1.1 flatbuffers-25.2.10 gitdb-4.0.12 gitpython-3.1.44 humanfriendly-10.0 librosa-0.11.0 llvmlite-0.44.0 modelscope-1.25.0 mpmath-1.3.0 msgpack-1.1.0 narwhals-1.35.0 nexaai-0.1.1.0 numba-0.61.2 onnxruntime-1.21.0 pooch-1.8.2 pyarrow-19.0.1 pydeck-0.9.1 pyreadline3-3.5.4 python-multipart-0.0.20 scikit-learn-1.6.1 smmap-5.0.2 soundfile-0.13.1 soxr-0.5.0.post1 starlette-0.46.2 streamlit-1.44.1 streamlit-audiorec-0.1.3 sympy-1.13.3 tabulate-0.9.0 threadpoolctl-3.6.0 toml-0.10.2 uvicorn-0.34.1
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
(pyg) D:\Projects\Local-File-Organizer>pip install -r requirements.txt
Requirement already satisfied: cmake in c:\python312\lib\site-packages (from -r requirements.txt (line 1)) (4.0.0)
Requirement already satisfied: pytesseract in c:\python312\lib\site-packages (from -r requirements.txt (line 2)) (0.3.13)
..
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
(pyg) D:\Projects\Local-File-Organizer>python main.py
--------------------------------------------------
**NOTE: Silent mode logs all outputs to a text file instead of displaying them in the terminal.
Would you like to enable silent mode? (yes/no): yes
Enter the path of the directory you want to organize: C:\Users\Admin\Downloads\Telegram Desktop
Enter the path to store organized files and folders (press Enter to use 'organized_folder' in the input directory):
Please choose the mode to organize your files:
1. By Content
2. By Date
3. By Type
Enter 1, 2, or 3 (or type '/exit' to exit): 1
model-q4_0.gguf: 100%|████████████████████████████████████████████████████████████| 3.56G/3.56G [08:14<00:00, 7.73MB/s]
Verifying download: 100%|██████████████████████████████████████████████████████████| 3.56G/3.56G [00:14<00:00, 265MB/s]
projector-q4_0.gguf: 100%|██████████████████████████████████████████████████████████| 596M/596M [01:18<00:00, 7.99MB/s]
Verifying download: 100%|████████████████████████████████████████████████████████████| 596M/596M [00:02<00:00, 281MB/s]
Traceback (most recent call last):
File "D:\Projects\Local-File-Organizer\main.py", line 337, in <module>
main()
File "D:\Projects\Local-File-Organizer\main.py", line 222, in main
initialize_models()
File "D:\Projects\Local-File-Organizer\main.py", line 51, in initialize_models
image_inference = NexaVLMInference(
^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 155, in __init__
self._load_model()
File "C:\Python312\Lib\site-packages\nexa\utils.py", line 312, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 168, in _load_model
self.projector_handler(
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\llama_chat_format.py", line 2693, in __init__
import llama_cpp.llava_cpp as llava_cpp
ModuleNotFoundError: No module named 'llama_cpp'
I followed the exact steps given in this README, but it failed with the error above. After installing llama-cpp-python with pip, running it again gives:
--------------------------------------------------
**NOTE: Silent mode logs all outputs to a text file instead of displaying them in the terminal.
Would you like to enable silent mode? (yes/no): yes
Enter the path of the directory you want to organize: D:\tg\
Enter the path to store organized files and folders (press Enter to use 'organized_folder' in the input directory):
Please choose the mode to organize your files:
1. By Content
2. By Date
3. By Type
Enter 1, 2, or 3 (or type '/exit' to exit): 1
⠸ 2025-04-16 21:22:22,747 - ERROR - Failed to load model: Failed to load model from file: C:\Users\Admin\.cache\nexa\hub\official\llava-v1.6-vicuna-7b\model-q4_0.gguf. Falling back to CPU.
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 181, in _load_model
self.model = Llama(
^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\llama.py", line 372, in __init__
internals.LlamaModel(
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\_internals_transformers.py", line 56, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: C:\Users\Admin\.cache\nexa\hub\official\llava-v1.6-vicuna-7b\model-q4_0.gguf
Traceback (most recent call last):
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 181, in _load_model
self.model = Llama(
^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\llama.py", line 372, in __init__
internals.LlamaModel(
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\_internals_transformers.py", line 56, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: C:\Users\Admin\.cache\nexa\hub\official\llava-v1.6-vicuna-7b\model-q4_0.gguf
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Projects\Local-File-Organizer\main.py", line 337, in <module>
main()
File "D:\Projects\Local-File-Organizer\main.py", line 222, in main
initialize_models()
File "D:\Projects\Local-File-Organizer\main.py", line 51, in initialize_models
image_inference = NexaVLMInference(
^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 155, in __init__
self._load_model()
File "C:\Python312\Lib\site-packages\nexa\utils.py", line 312, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 194, in _load_model
self.model = Llama(
^^^^^^
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\llama.py", line 372, in __init__
internals.LlamaModel(
File "C:\Python312\Lib\site-packages\nexa\gguf\llama\_internals_transformers.py", line 56, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: C:\Users\Admin\.cache\nexa\hub\official\llava-v1.6-vicuna-7b\model-q4_0.gguf
Please use history (Linux), doskey /history (Windows), or a similar command and post the full list of commands you executed during installation. Please also indicate whether you are installing for CPU or GPU, and include as much information about your system as possible:
OS
CPU version
GPU version ...
Also, this looks like a problem with Nexaai or llama_cpp_python, so please check the issues in those two repos in case there is already a solution for you.
Ran into the same issue. Trying to troubleshoot myself, but so far no dice. Here is my current error after running main:
Local-File-Organizer>python main.py
--------------------------------------------------
**NOTE: Silent mode logs all outputs to a text file instead of displaying them in the terminal.
Would you like to enable silent mode? (yes/no): yes
Enter the path of the directory you want to organize: C:\Users\terra\Coding\Emma Old Flashdrive
Enter the path to store organized files and folders (press Enter to use 'organized_folder' in the input directory):
Please choose the mode to organize your files:
1. By Content
2. By Date
3. By Type
Enter 1, 2, or 3 (or type '/exit' to exit): 1
Traceback (most recent call last):
File "C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama_cpp.py", line 70, in _load_shared_library
return ctypes.CDLL(str(_lib_path), **cdll_args) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\terra\anaconda3\Lib\ctypes\__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\terra\Coding\Local-File-Organizer\main.py", line 337, in <module>
main()
File "C:\Users\terra\Coding\Local-File-Organizer\main.py", line 222, in main
initialize_models()
File "C:\Users\terra\Coding\Local-File-Organizer\main.py", line 51, in initialize_models
image_inference = NexaVLMInference(
^^^^^^^^^^^^^^^^^
File "C:\Users\terra\anaconda3\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 155, in __init__
self._load_model()
File "C:\Users\terra\anaconda3\Lib\site-packages\nexa\utils.py", line 312, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\terra\anaconda3\Lib\site-packages\nexa\gguf\nexa_inference_vlm.py", line 168, in _load_model
self.projector_handler(
File "C:\Users\terra\anaconda3\Lib\site-packages\nexa\gguf\llama\llama_chat_format.py", line 2693, in __init__
import llama_cpp.llava_cpp as llava_cpp
File "C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\__init__.py", line 1, in <module>
from .llama_cpp import *
File "C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama_cpp.py", line 83, in <module>
_lib = _load_shared_library(_lib_base_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama_cpp.py", line 72, in _load_shared_library
raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library 'C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama.dll': Could not find module 'C:\Users\terra\anaconda3\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.
And here is the doskey /history output of everything leading up to running main.py again:
python main.py
pip uninstall fitz
pip install pymupdf
python main.py
pip install docx
python main.py
pip install exceptions
pip uninstall docx
pip install python-docx
python main.py
pip install pptx
pip install -r requirements.txt
python main.py
pip install llama_cpp
pip install llama-cpp-python
pip install --no-cache-dir llama-cpp-python==0.2.77 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
python main.py
print "At this point installing CUDA Toolkit 11.7"
pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu121
.\python.exe -m pip install llama-cpp-python --prefer-binary --no-cache-dir --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu121
doskey /history
python main.py
doskey /history
The next step I'm attempting is installing CUDA 12.9, based on this thread.
I'm having this exact same issue. The file organizer works when selecting date or type, but not content. Has anyone tried just running a model with nexa to make sure inference is working on your local machine? I tried running llava 1.6 and the program came back with an error for no module named 'llama_cpp'. I'll gather more information today to see if this is related, but I'm definitely having the same issue. It is quite possibly related to the nexa install.
This was my bash output when trying to run llava 1.6 with nexa:
Model llava-v1.6-vicuna-7b:model-q4_0 already exists at /home/radmin/.cache/nexa/hub/official/llava-v1.6-vicuna-7b/model-q4_0.gguf
Model llava-v1.6-vicuna-7b:projector-q4_0 already exists at /home/radmin/.cache/nexa/hub/official/llava-v1.6-vicuna-7b/projector-q4_0.gguf
Error running ggml inference: No module named 'llama_cpp'
Please refer to our docs to install nexaai package: https://docs.nexaai.com/getting-started/installation
Also, I am able to run other models via nexa such as Qwen 2.5. Just not llava 1.6.
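For anyone who wants to isolate this from main.py, the failure can be reproduced in a few lines of Python using the same class and model string that main.py uses at line 51 (just a sketch: I'm assuming model_path is the constructor keyword, so check the nexaai docs if it differs in your version):

# Minimal repro of the VLM initialization that main.py performs.
# If the llama_cpp backend is broken, this raises the same
# "No module named 'llama_cpp'" (or failed-to-load) error.
from nexa.gguf import NexaVLMInference

image_inference = NexaVLMInference(model_path="llava-v1.6-vicuna-7b:q4_0")
print("llava VLM initialized successfully")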
After a lot of exploration, it seems like llava vicuna will not use llama-cpp unless run through Hugging Face. I think this error is related to how the model is initialized and which arguments are applied. I was only able to move past the error by editing the code to import OmniVLM as the inference model, at which point main.py was able to execute to completion. However, the directory I used was renamed, but its contents were left out. See below for the completed execution:
(local_file_organizer) radmin@Inquiry:~/Local-File-Organizer$ python main.py
--------------------------------------------------
**NOTE: Silent mode logs all outputs to a text file instead of displaying them in the terminal.
Would you like to enable silent mode? (yes/no): yes
Enter the path of the directory you want to organize: /home/radmin/unknown
Enter the path to store organized files and folders (press Enter to use 'organized_folder' in the input directory):
Please choose the mode to organize your files:
1. By Content
2. By Date
3. By Type
Enter 1, 2, or 3 (or type '/exit' to exit): 1
model-q4_K_M.gguf: 100%|█████████████████████████████████████████████████████████████| 379M/379M [00:04<00:00, 90.6MB/s]
Verifying download: 100%|████████████████████████████████████████████████████████████| 379M/379M [00:00<00:00, 1.09GB/s]
projector-q4_K_M.gguf: 100%|█████████████████████████████████████████████████████████| 913M/913M [00:13<00:00, 68.6MB/s]
Verifying download: 100%|████████████████████████████████████████████████████████████| 913M/913M [00:00<00:00, 1.15GB/s]
⠋ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
⠙ llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) - 11697 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 290 tensors from /home/radmin/.cache/nexa/hub/official/omniVLM/model-q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Nano Vlm Processor
llama_model_loader: - kv 3: general.size_label str = 494M
llama_model_loader: - kv 4: qwen2.block_count u32 = 24
llama_model_loader: - kv 5: qwen2.context_length u32 = 131072
llama_model_loader: - kv 6: qwen2.embedding_length u32 = 896
llama_model_loader: - kv 7: qwen2.feed_forward_length u32 = 4864
llama_model_loader: - kv 8: qwen2.attention.head_count u32 = 14
llama_model_loader: - kv 9: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 10: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 12: general.file_type u32 = 15
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
⠹ llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 151645
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type q5_0: 132 tensors
llama_model_loader: - type q8_0: 13 tensors
llama_model_loader: - type q4_K: 12 tensors
llama_model_loader: - type q6_K: 12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 373.71 MiB (6.35 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151672 '<|placeholder_6|>' is not marked as EOG
load: control token: 151669 '<|placeholder_3|>' is not marked as EOG
load: control token: 151667 '<|placeholder_1|>' is not marked as EOG
load: control token: 151666 '<|ocr_end|>' is not marked as EOG
load: control token: 151665 '<|ocr_start|>' is not marked as EOG
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151674 '<|placeholder_8|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151673 '<|placeholder_7|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151668 '<|placeholder_2|>' is not marked as EOG
load: control token: 151670 '<|placeholder_4|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151675 '<|placeholder_9|>' is not marked as EOG
load: control token: 151671 '<|placeholder_5|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: special tokens cache size = 33
⠸ load: token to piece cache size = 0.9312 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 896
print_info: n_layer = 24
print_info: n_head = 14
print_info: n_head_kv = 2
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 7
print_info: n_embd_k_gqa = 128
print_info: n_embd_v_gqa = 128
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 4864
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 494.03 M
print_info: general.name = Nano Vlm Processor
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151645 '<|im_end|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device CPU
load_tensors: layer 1 assigned to device CPU
load_tensors: layer 2 assigned to device CPU
load_tensors: layer 3 assigned to device CPU
load_tensors: layer 4 assigned to device CPU
load_tensors: layer 5 assigned to device CPU
load_tensors: layer 6 assigned to device CPU
load_tensors: layer 7 assigned to device CPU
load_tensors: layer 8 assigned to device CPU
load_tensors: layer 9 assigned to device CPU
load_tensors: layer 10 assigned to device CPU
load_tensors: layer 11 assigned to device CPU
load_tensors: layer 12 assigned to device CPU
load_tensors: layer 13 assigned to device CPU
load_tensors: layer 14 assigned to device CPU
load_tensors: layer 15 assigned to device CPU
load_tensors: layer 16 assigned to device CPU
load_tensors: layer 17 assigned to device CPU
load_tensors: layer 18 assigned to device CPU
load_tensors: layer 19 assigned to device CPU
load_tensors: layer 20 assigned to device CPU
load_tensors: layer 21 assigned to device CPU
load_tensors: layer 22 assigned to device CPU
load_tensors: layer 23 assigned to device CPU
load_tensors: layer 24 assigned to device CPU
load_tensors: tensor 'token_embd.weight' (q8_0) (and 290 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/25 layers to GPU
load_tensors: CPU_Mapped model buffer size = 373.71 MiB
.................................................................
q3_K_M.gguf: 100%|█████████████████████████████████████████████████████████████████| 1.57G/1.57G [00:17<00:00, 96.6MB/s]
Verifying download: 100%|██████████████████████████████████████████████████████████| 1.57G/1.57G [00:01<00:00, 1.18GB/s]
⠼ llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
**----------------------------------------------**
** Image inference model initialized **
** Text inference model initialized **
**----------------------------------------------**
--------------------------------------------------
Would you like to proceed with these changes? (yes/no): yes
Would you like to organize another directory? (yes/no): no
Additionally, I tried running each model in the VLM class (for image inference) with no success. Only the Omni VLM class allowed the program to complete with no errors.
I'm experiencing the same error (macos silicon) so would be grateful for a solution, please.
"move past the error by editing the code to import OmniVLM as the inference model"
Would you be so kind as to share the steps to accomplish this? Ty.
@reconrad48 Just a disclaimer: These changes should be made on local machines only and are not intended for distribution unless licensed to do so. For those not familiar with editing software: modify and run at your own risk.
Navigate to the nexa/gguf directory inside your site-packages and open __init__.py in an editor of your choice.
Go to line 6 and insert: from .nexa_inference_vlm_omni import NexaOmniVlmInference
Then add a new line under line 13 (so the new entry sits between the old lines 13 and 15), indented exactly 4 spaces from the left, and insert "NexaOmniVlmInference".
Double-check that the closing square bracket is still at the far left, now on line 15.
Save and close the file and navigate to the directory that contains main.py for your local file organizer and open in the editor.
On line 27, replace NexaVLMInference with NexaOmniVlmInference
Go down to line 45 and replace "llava-v1.6-vicuna-7b:q4_0" with "omniVLM:q4_K_M"
Proceed to line 51 and replace NexaVLMInference with NexaOmniVlmInference
Save and close the file.
Double-check that your environment is active, then run python main.py.
Note: These edits do not account for the specifics of the Omni VLM model parameters, so keep that in mind (a rough sketch of the edited lines is below). Furthermore, I don't know how your environment is set up, so this might not necessarily work for you. In my case, this allowed main.py to run to completion when choosing to organize by content. However, I have not fully tested this configuration at length, so again, proceed with caution.
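To make the steps concrete, here is roughly what the changes amount to. This is illustrative only: the __init__.py change is shown as comments rather than that file's full contents, the main.py variable name is hypothetical, and I'm assuming the omni class accepts the same model_path keyword as NexaVLMInference; in the real files you only swap the class name and the model string as described above.

# Change 1: nexa/gguf/__init__.py
#   add:  from .nexa_inference_vlm_omni import NexaOmniVlmInference
#   add:  "NexaOmniVlmInference",   (indented 4 spaces, inside the list that closes with ])
#
# Change 2: main.py imports and constructs the omni class instead of NexaVLMInference:
from nexa.gguf import NexaOmniVlmInference   # line 27, was NexaVLMInference

model_path = "omniVLM:q4_K_M"                # line 45, was "llava-v1.6-vicuna-7b:q4_0"
image_inference = NexaOmniVlmInference(      # line 51, was NexaVLMInference(
    model_path=model_path,
)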
Testing with the modifications failed with the following error:
⠼ llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
**----------------------------------------------**
** Image inference model initialized **
** Text inference model initialized **
**----------------------------------------------**
Processing Free-Stock-Photos-01-2048x1366.jpg ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00
Traceback (most recent call last):
File "/home/radmin/Local-File-Organizer/main.py", line 339, in <module>
main()
File "/home/radmin/Local-File-Organizer/main.py", line 253, in main
data_images = process_image_files(image_files, image_inference, text_inference, silent=silent_mode, log_file=log_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/radmin/Local-File-Organizer/image_data_processing.py", line 59, in process_image_files
data = process_single_image(image_path, image_inference, text_inference, silent=silent, log_file=log_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/radmin/Local-File-Organizer/image_data_processing.py", line 36, in process_single_image
foldername, filename, description = generate_image_metadata(image_path, progress, task_id, image_inference, text_inference)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/radmin/Local-File-Organizer/image_data_processing.py", line 71, in generate_image_metadata
description_generator = image_inference._chat(description_prompt, image_path)
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NexaOmniVlmInference' object has no attribute '_chat'
So, Omni VLM will not work as a drop-in replacement, since it has no _chat attribute.
I just installed nexaai 0.0.9.9 and it now works
Thank you @mattsmith95 , this worked for me (macos silicon).
For anyone else struggling with this, the command I used was: CMAKE_ARGS="-DGGML_METAL=ON" pip install nexaai==0.0.9.9 --prefer-binary --index-url https://github.nexa.ai/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir
@mattsmith95 this helped me get beyond the initial error as well, thank you!
On Debian 12 with an RTX 3060, I installed nexaai with the following:
CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai==0.0.9.9 --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir