llama.cpp
[User] Embedding doesn't seem to work?
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I'm trying to use llama.cpp to generate sentence embeddings and then use a query to search for answers in a vector database, but my code doesn't work. Upon further inspection, it seems that the sentence embeddings generated by llama.cpp are not trustworthy. This can be reproduced with the embedding example:
./embedding -m models/7B/ggml-model-q4_0.bin -p "hello" -n 512
./embedding -m models/7B/ggml-model-q4_0.bin -p "hello " -n 512
Notice that the only difference between the above two commands is an extra trailing space in the second prompt, yet they produce completely different embeddings. Since the meaning of the prompts is the same, I would expect the extra space not to change the embedding much.
Is the embedding function working?
Current Behavior
The current embedding output seems to be random?
Environment and Context
Linux + A100
- Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD Ryzen Threadripper PRO 3975WX 32-Cores
Stepping: 0
CPU MHz: 2195.790
CPU max MHz: 4368.1641
CPU min MHz: 2200.0000
BogoMIPS: 6987.21
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
- Operating System, e.g. for Linux:
Linux artserver1 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
Python 3.10.9
GNU Make 4.1
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Failure Information (for bugs)
The embedding output can be altered by adding a space in the prompt.
Steps to Reproduce
./embedding -m models/7B/ggml-model-q4_0.bin -p "hello" -n 512
./embedding -m models/7B/ggml-model-q4_0.bin -p "hello " -n 512
Build the project and run the official embedding example as above, then compare the generated embeddings. embedding.cpp prints the resulting embeddings.
Strictly speaking, "hello" and "hello " are different phrases. However, these two phrases should still be closer to each other than to other phrases. I've made two scripts for testing the embedding behaviour, namely get_embeddings.sh:
#!/bin/bash
# /* +----------------------------------+ */
# /* | LLaMA Embeddings Tester | */
# /* | get_embeddings.sh | */
# /* | (c)copyright nitram147 2023 | */
# /* +----------------------------------+ */
usage="Usage: bash $0 path_to_model phrase"
if [[ $# -ne 2 ]]; then
    echo "Invalid number of parameters!" >&2
    echo "$usage"
    exit 1
fi

if [[ ! -f "$1" ]]; then
    echo "Invalid path to model!" >&2
    echo "$usage"
    exit 2
fi

# a better way would be to hash the model's weights, however that would take a while
model_path_hash=$(echo -n "$1" | sha256sum | head -c 64)
phrase_hash=$(echo -n "$2" | sha256sum | head -c 64)

mkdir -p results/"$model_path_hash"

if [[ -f results/"$model_path_hash"/"$phrase_hash" ]]; then
    echo "Embedding was already calculated by a previous run"
    exit 0
fi

echo "Calculating embedding for phrase: $2"
echo "Phrase: $2" >results/"$model_path_hash"/"$phrase_hash"
./embedding -m "$1" -p "$2" >>results/"$model_path_hash"/"$phrase_hash"
And compare_embeddings.py:
#!/usr/bin/python3
# /* +----------------------------------+ */
# /* | LLaMA Embeddings Tester | */
# /* | compare_embeddings.py | */
# /* | (c)copyright nitram147 2023 | */
# /* +----------------------------------+ */
import sys
import glob
import math
def print_help(script_name: str) -> None:
    print("Usage: python3 " + script_name + " path_to_results_folder")

def get_results_subfolders(path_to_results_folder: str) -> list:
    return [
        x + "/" for x in sorted(glob.glob(path_to_results_folder + "*"))
        if glob.os.path.isdir(x)
    ]

def get_results_filenames_from_folder(folder: str) -> list:
    return [
        x for x in sorted(glob.glob(folder + "*"))
        if glob.os.path.isfile(x) and len(glob.os.path.basename(x)) == 64
    ]

def load_embedding_from_file(file: str) -> dict:
    if not glob.os.path.isfile(file): raise ValueError("Invalid argument provided!!!")
    lines = [x.strip("\n") for x in open(file, "r").readlines()]
    if not lines[0].startswith("Phrase: "): raise ValueError("Invalid result file provided!!!")
    # remove the trailing space at the end of the returned embedding with [:-1]
    return { lines[0][len("Phrase: "):] : [float(x) for x in lines[1][:-1].split(" ")] }

def get_distance_between_embeddings(first: list, second: list) -> float:
    if (
        not isinstance(first, list) or
        not isinstance(second, list)
    ): raise ValueError("Invalid arguments provided!!!")
    return math.dist(first, second)

def get_table_index(i: int, j: int, length: int) -> int:
    # map the pair (i, j) to its index in the flattened upper-triangular
    # distance list built below (row i holds length - i entries)
    if j < i: i, j = j, i
    return sum([length - x for x in range(i)]) + (j - i)

if len(sys.argv) != 2:
    print("Invalid count of arguments! See help below:", file=sys.stderr)
    print_help(sys.argv[0])
    sys.exit(1)

path_to_results_folder = sys.argv[1] + "/" if sys.argv[1][-1] != "/" else sys.argv[1]
results_subfolders = get_results_subfolders(path_to_results_folder)

for folder in results_subfolders:
    print("Analyzing data in folder: " + folder)
    filenames = get_results_filenames_from_folder(folder)
    phrases_embeddings = sorted(
        [load_embedding_from_file(file) for file in filenames],
        key = lambda v: list(v.keys())[0]
    )
    phrases_count = len(phrases_embeddings)
    distances = []
    for i in range(phrases_count):
        for j in range(i, phrases_count):
            distances.append(
                get_distance_between_embeddings(
                    phrases_embeddings[i][list(phrases_embeddings[i].keys())[0]],
                    phrases_embeddings[j][list(phrases_embeddings[j].keys())[0]]
                )
            )
    for i in range(phrases_count):
        print("Distance from phrase \"" + list(phrases_embeddings[i].keys())[0] + "\" to:")
        for j in range(phrases_count):
            print(
                "\tPhrase: \"" + list(phrases_embeddings[j].keys())[0] + "\" is " +
                str(distances[get_table_index(i, j, phrases_count)])
            )
To my surprise, for short phrases the premise that "phrases with similar meaning should be closer to each other" does not hold.
See:
Extract embeddings for a few short phrases:
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "hello"
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "hello "
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "cat"
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "cat "
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "dog"
bash get_embeddings.sh ../LLaMA/7B/ggml-new-model-q4_0.bin "dog "
Obtain results:
python compare_embeddings.py results/
Results:
Analyzing data in folder: results/9a2dfca16ff679dc3442dad4ea8cbbeaf015ef08df76385c789965b97226eb99/
Distance from phrase "cat" to:
Phrase: "cat" is 0.0
Phrase: "cat " is 141.5266910129102
Phrase: "dog" is 79.05358607846175
Phrase: "dog " is 150.61770647155694
Phrase: "hello" is 104.7673500465483
Phrase: "hello " is 147.5524726057386
Distance from phrase "cat " to:
Phrase: "cat" is 141.5266910129102
Phrase: "cat " is 0.0
Phrase: "dog" is 134.02650674575497
Phrase: "dog " is 65.6564442420672
Phrase: "hello" is 152.28321946264828
Phrase: "hello " is 69.05842796227314
Distance from phrase "dog" to:
Phrase: "cat" is 79.05358607846175
Phrase: "cat " is 134.02650674575497
Phrase: "dog" is 0.0
Phrase: "dog " is 134.56114935952093
Phrase: "hello" is 110.13720887542694
Phrase: "hello " is 139.16754187161132
Distance from phrase "dog " to:
Phrase: "cat" is 150.61770647155694
Phrase: "cat " is 65.6564442420672
Phrase: "dog" is 134.56114935952093
Phrase: "dog " is 0.0
Phrase: "hello" is 155.05308475451446
Phrase: "hello " is 60.12117182785281
Distance from phrase "hello" to:
Phrase: "cat" is 104.7673500465483
Phrase: "cat " is 152.28321946264828
Phrase: "dog" is 110.13720887542694
Phrase: "dog " is 155.05308475451446
Phrase: "hello" is 0.0
Phrase: "hello " is 141.5638727632533
Distance from phrase "hello " to:
Phrase: "cat" is 147.5524726057386
Phrase: "cat " is 69.05842796227314
Phrase: "dog" is 139.16754187161132
Phrase: "dog " is 60.12117182785281
Phrase: "hello" is 141.5638727632533
Phrase: "hello " is 0.0
Unfortunately, I don't have any more time at the moment. But if you have, try to extract embeddings for more complicated phrases and post the results here :-)
I ran more tests using cosine similarity, so that it would be easier to compare with the initial tests.
Some results are as expected:
- "I like cats" is similar to "I love cats" and "cats are cute", and dissimilar to "Napoleonic France"
- "cat" is quite similar to "dog"
- "Napoleonic France" is somewhat similar to "Victorian England"
- "hello" is quite similar to "hi"
However some similarities are way off:
- appending a space character to one of the phrases dramatically reduces the similarity
- if both phrases end with a space character, the similarity comes back up
- "I like cats" is very dissimilar to "cat" and "I like dogs" is very dissimilar to "dog"
@StrikingLoo @ggerganov any intuition why the current embedding calculation logic could be behaving this way?
"I like cats" -- "I like cats "................ 0.20311777255799193
"I like cats" -- "I like dogs"................. 0.896390003690664
"I like cats" -- "I like dogs "................ 0.20045489096743105
"I like cats" -- "I love cats"................. 0.9571038771953083
"I like cats" -- "I love cats "................ 0.2156631142674983
"I like cats" -- "I love dogs"................. 0.8450703589509785
"I like cats" -- "I love dogs "................ 0.2169230548515942
"I like cats" -- "Napoleonic France"........... -0.21246371932212327
"I like cats" -- "Napoleonic France ".......... 0.04575540547715773
"I like cats" -- "Victorian England"........... -0.29933218462361305
"I like cats" -- "Victorian England ".......... -0.06149233717528417
"I like cats" -- "cat"......................... -0.22651239180178487
"I like cats" -- "cat "........................ 0.05906783956749464
"I like cats" -- "cats are cute"............... 0.3670225246784726
"I like cats" -- "cats are cute ".............. 0.11606769194395
"I like cats" -- "dog"......................... -0.14639967519051528
"I like cats" -- "dog "........................ 0.04783762210617664
"I like cats" -- "dogs are cute"............... 0.31819465704480615
"I like cats" -- "dogs are cute ".............. 0.11610797748796792
"I like cats" -- "hello"....................... -0.20630688086162569
"I like cats" -- "hello "...................... 0.05191533662217677
"I like cats" -- "hi".......................... -0.18188225673086578
"I like cats" -- "hi "......................... -0.0595385355447103
"I like cats " -- "I like dogs"................ 0.19392397721812782
"I like cats " -- "I like dogs "............... 0.9601616172820892
"I like cats " -- "I love cats"................ 0.20298700271041506
"I like cats " -- "I love cats "............... 0.9692328566598946
"I like cats " -- "I love dogs"................ 0.18069456493337113
"I like cats " -- "I love dogs "............... 0.9361746123408047
"I like cats " -- "Napoleonic France".......... 0.04077828080003284
"I like cats " -- "Napoleonic France "......... 0.7514104733324016
"I like cats " -- "Victorian England".......... 0.009752570450316756
"I like cats " -- "Victorian England "......... 0.7966698584728275
"I like cats " -- "cat"........................ -0.015622401712858672
"I like cats " -- "cat "....................... 0.7438255953321713
"I like cats " -- "cats are cute".............. 0.20019632673493853
"I like cats " -- "cats are cute "............. 0.870023708294639
"I like cats " -- "dog"........................ 0.0030972791571316615
"I like cats " -- "dog "....................... 0.8017966029865697
"I like cats " -- "dogs are cute".............. 0.18456252662747993
"I like cats " -- "dogs are cute "............. 0.8497227651725612
"I like cats " -- "hello"...................... -0.0005249279792397854
"I like cats " -- "hello "..................... 0.8324597099732179
"I like cats " -- "hi"......................... 0.0012268027593519127
"I like cats " -- "hi "........................ 0.7523755760379622
"I like dogs" -- "I like dogs "................ 0.22689238866131242
"I like dogs" -- "I love cats"................. 0.8745890129079315
"I like dogs" -- "I love cats "................ 0.20704656061606252
"I like dogs" -- "I love dogs"................. 0.9488098708025015
"I like dogs" -- "I love dogs "................ 0.24556722925131885
"I like dogs" -- "Napoleonic France"........... -0.26413464286093585
"I like dogs" -- "Napoleonic France ".......... 0.05801915836818936
"I like dogs" -- "Victorian England"........... -0.3562970344216997
"I like dogs" -- "Victorian England ".......... -0.06291220071485515
"I like dogs" -- "cat"......................... -0.3220193857299431
"I like dogs" -- "cat "........................ 0.01976040801733492
"I like dogs" -- "cats are cute"............... 0.30090905476542995
"I like dogs" -- "cats are cute ".............. 0.08185635301464264
"I like dogs" -- "dog"......................... -0.15754898020924868
"I like dogs" -- "dog "........................ 0.05649268019207619
"I like dogs" -- "dogs are cute"............... 0.25782603756454203
"I like dogs" -- "dogs are cute ".............. 0.0890702719335868
"I like dogs" -- "hello"....................... -0.2796894362421596
"I like dogs" -- "hello "...................... 0.035996981301803635
"I like dogs" -- "hi".......................... -0.25787672908495085
"I like dogs" -- "hi "......................... -0.08290316130522596
"I like dogs " -- "I love cats"................ 0.2045472826446419
"I like dogs " -- "I love cats "............... 0.9167028194335608
"I like dogs " -- "I love dogs"................ 0.2129259955894849
"I like dogs " -- "I love dogs "............... 0.9534364920909392
"I like dogs " -- "Napoleonic France".......... 0.030884468121599513
"I like dogs " -- "Napoleonic France "......... 0.7373470208338967
"I like dogs " -- "Victorian England".......... -0.02431210206116902
"I like dogs " -- "Victorian England "......... 0.7752905016610782
"I like dogs " -- "cat"........................ -0.08397765922811914
"I like dogs " -- "cat "....................... 0.71447935466483
"I like dogs " -- "cats are cute".............. 0.17071387667006183
"I like dogs " -- "cats are cute "............. 0.8151229555939554
"I like dogs " -- "dog"........................ -0.04537135780039387
"I like dogs " -- "dog "....................... 0.8167544600308861
"I like dogs " -- "dogs are cute".............. 0.15822200994259486
"I like dogs " -- "dogs are cute "............. 0.7938602405373409
"I like dogs " -- "hello"...................... -0.05666404826137203
"I like dogs " -- "hello "..................... 0.8289671743241819
"I like dogs " -- "hi"......................... -0.060960899056495974
"I like dogs " -- "hi "........................ 0.7187010548820195
"I love cats" -- "I love cats "................ 0.2448064338260396
"I love cats" -- "I love dogs"................. 0.899362557333871
"I love cats" -- "I love dogs "................ 0.2469770260439035
"I love cats" -- "Napoleonic France"........... -0.2619564319421419
"I love cats" -- "Napoleonic France ".......... 0.04512874304527943
"I love cats" -- "Victorian England"........... -0.3351779492606247
"I love cats" -- "Victorian England ".......... -0.05627744048023769
"I love cats" -- "cat"......................... -0.24381239195695179
"I love cats" -- "cat "........................ 0.05865530689702666
"I love cats" -- "cats are cute"............... 0.3642354902833239
"I love cats" -- "cats are cute ".............. 0.12915733809213054
"I love cats" -- "dog"......................... -0.181630562647824
"I love cats" -- "dog "........................ 0.04991525949175284
"I love cats" -- "dogs are cute"............... 0.31779280347738087
"I love cats" -- "dogs are cute ".............. 0.12914489705580579
"I love cats" -- "hello"....................... -0.20556096184328576
"I love cats" -- "hello "...................... 0.07391973600329921
"I love cats" -- "hi".......................... -0.18424632031868096
"I love cats" -- "hi "......................... -0.05032686070896378
"I love cats " -- "I love dogs"................ 0.2252081785243395
"I love cats " -- "I love dogs "............... 0.9536944077380259
"I love cats " -- "Napoleonic France".......... 0.022966387887623004
"I love cats " -- "Napoleonic France "......... 0.74409242120594
"I love cats " -- "Victorian England".......... 0.005962043345386044
"I love cats " -- "Victorian England "......... 0.781874206851949
"I love cats " -- "cat"........................ -0.0034494665529626427
"I love cats " -- "cat "....................... 0.7317299538195132
"I love cats " -- "cats are cute".............. 0.2262019494531532
"I love cats " -- "cats are cute "............. 0.8769976427038626
"I love cats " -- "dog"........................ 0.02328492758403161
"I love cats " -- "dog "....................... 0.7703433589994425
"I love cats " -- "dogs are cute".............. 0.2104158917188272
"I love cats " -- "dogs are cute "............. 0.8660908021592335
"I love cats " -- "hello"...................... 0.023115252661932466
"I love cats " -- "hello "..................... 0.8086529873575895
"I love cats " -- "hi"......................... 0.023717349902427878
"I love cats " -- "hi "........................ 0.7426429014054192
"I love dogs" -- "I love dogs "................ 0.2668744541065285
"I love dogs" -- "Napoleonic France"........... -0.29275150529306815
"I love dogs" -- "Napoleonic France ".......... 0.04357306838641106
"I love dogs" -- "Victorian England"........... -0.36638799068196853
"I love dogs" -- "Victorian England ".......... -0.06908215968686245
"I love dogs" -- "cat"......................... -0.3047423164022532
"I love dogs" -- "cat "........................ 0.0101104682762854
"I love dogs" -- "cats are cute"............... 0.3039941060157555
"I love dogs" -- "cats are cute ".............. 0.08910464218525402
"I love dogs" -- "dog"......................... -0.15135784328566665
"I love dogs" -- "dog "........................ 0.05290617609381392
"I love dogs" -- "dogs are cute"............... 0.26499805257358044
"I love dogs" -- "dogs are cute ".............. 0.09934014727476749
"I love dogs" -- "hello"....................... -0.24268201717121615
"I love dogs" -- "hello "...................... 0.045935074892588655
"I love dogs" -- "hi".......................... -0.22500111960052072
"I love dogs" -- "hi "......................... -0.07546074189006309
"I love dogs " -- "Napoleonic France".......... 0.01883090481368493
"I love dogs " -- "Napoleonic France "......... 0.7386010682104132
"I love dogs " -- "Victorian England".......... -0.02281812995309553
"I love dogs " -- "Victorian England "......... 0.7603767383707928
"I love dogs " -- "cat"........................ -0.06416890752500873
"I love dogs " -- "cat "....................... 0.7087321235353528
"I love dogs " -- "cats are cute".............. 0.20021670300208802
"I love dogs " -- "cats are cute "............. 0.8293343369105992
"I love dogs " -- "dog"........................ -0.007743482872031577
"I love dogs " -- "dog "....................... 0.791858638404352
"I love dogs " -- "dogs are cute".............. 0.18901810582114495
"I love dogs " -- "dogs are cute "............. 0.8217160711203176
"I love dogs " -- "hello"...................... -0.028063669282785846
"I love dogs " -- "hello "..................... 0.7975007795567103
"I love dogs " -- "hi"......................... -0.02801001638132258
"I love dogs " -- "hi "........................ 0.7116123302355635
"Napoleonic France" -- "Napoleonic France ".... 0.23522390837866922
"Napoleonic France" -- "Victorian England"..... 0.6859025998049194
"Napoleonic France" -- "Victorian England ".... 0.15648560509818651
"Napoleonic France" -- "cat"................... 0.35800033036759454
"Napoleonic France" -- "cat ".................. 0.10647011838283668
"Napoleonic France" -- "cats are cute"......... 0.07981987732132663
"Napoleonic France" -- "cats are cute "........ 0.078149911960321
"Napoleonic France" -- "dog"................... 0.3826710214412356
"Napoleonic France" -- "dog ".................. 0.11401018637067296
"Napoleonic France" -- "dogs are cute"......... 0.0773770554340013
"Napoleonic France" -- "dogs are cute "........ 0.09123545209030627
"Napoleonic France" -- "hello"................. 0.37213096418783836
"Napoleonic France" -- "hello "................ 0.057774352193263975
"Napoleonic France" -- "hi".................... 0.3507834273848848
"Napoleonic France" -- "hi "................... 0.17696122118133434
"Napoleonic France " -- "Victorian England".... 0.08466607680324116
"Napoleonic France " -- "Victorian England "... 0.8037786302246899
"Napoleonic France " -- "cat".................. -0.019977595529280315
"Napoleonic France " -- "cat "................. 0.7037017986232446
"Napoleonic France " -- "cats are cute"........ 0.07337913536494711
"Napoleonic France " -- "cats are cute "....... 0.6771872359838416
"Napoleonic France " -- "dog".................. 0.010643016302043572
"Napoleonic France " -- "dog "................. 0.739274480331095
"Napoleonic France " -- "dogs are cute"........ 0.04549129074053724
"Napoleonic France " -- "dogs are cute "....... 0.6471315374932367
"Napoleonic France " -- "hello"................ -0.04491316315316086
"Napoleonic France " -- "hello "............... 0.7016026239194642
"Napoleonic France " -- "hi"................... -0.04483742943994349
"Napoleonic France " -- "hi ".................. 0.6379276120297552
"Victorian England" -- "Victorian England ".... 0.1970397243022337
"Victorian England" -- "cat"................... 0.5315626991866473
"Victorian England" -- "cat ".................. 0.1132440438361098
"Victorian England" -- "cats are cute"......... 0.07564712547170802
"Victorian England" -- "cats are cute "........ 0.07047236143056597
"Victorian England" -- "dog"................... 0.5023841250096192
"Victorian England" -- "dog ".................. 0.09627092477400122
"Victorian England" -- "dogs are cute"......... 0.08558379851546237
"Victorian England" -- "dogs are cute "........ 0.0892397072219102
"Victorian England" -- "hello"................. 0.5153410825616703
"Victorian England" -- "hello "................ 0.04956935673613258
"Victorian England" -- "hi".................... 0.4727394855738129
"Victorian England" -- "hi "................... 0.21165691018559324
"Victorian England " -- "cat".................. 0.051434725296773336
"Victorian England " -- "cat "................. 0.7840190374817173
"Victorian England " -- "cats are cute"........ 0.0647773051440868
"Victorian England " -- "cats are cute "....... 0.7347889675972376
"Victorian England " -- "dog".................. 0.03781358609513832
"Victorian England " -- "dog "................. 0.8064848839781267
"Victorian England " -- "dogs are cute"........ 0.04396328283710094
"Victorian England " -- "dogs are cute "....... 0.6829677641565312
"Victorian England " -- "hello"................ 0.042481552565901706
"Victorian England " -- "hello "............... 0.808211277301756
"Victorian England " -- "hi"................... 0.028057424386802313
"Victorian England " -- "hi ".................. 0.745340058660383
"cat" -- "cat "................................ 0.2261562422600992
"cat" -- "cats are cute"....................... 0.055479073463416025
"cat" -- "cats are cute "...................... 0.042783194474326644
"cat" -- "dog"................................. 0.7428052216162652
"cat" -- "dog "................................ 0.07579947107525319
"cat" -- "dogs are cute"....................... 0.08587503622015415
"cat" -- "dogs are cute "...................... 0.06047225271094304
"cat" -- "hello"............................... 0.5867101415982408
"cat" -- "hello ".............................. 0.020849676027916392
"cat" -- "hi".................................. 0.5395565382469979
"cat" -- "hi "................................. 0.18922445718289724
"cat " -- "cats are cute"...................... 0.1377352530456209
"cat " -- "cats are cute "..................... 0.7045457312726324
"cat " -- "dog"................................ 0.151244663186442
"cat " -- "dog "............................... 0.8130943607206529
"cat " -- "dogs are cute"...................... 0.10501837198032893
"cat " -- "dogs are cute "..................... 0.6591719081389649
"cat " -- "hello".............................. 0.05720266958632396
"cat " -- "hello "............................. 0.7487726233664361
"cat " -- "hi"................................. 0.04989013326368915
"cat " -- "hi "................................ 0.7145388756690138
"cats are cute" -- "cats are cute "............ 0.3062561991124481
"cats are cute" -- "dog"....................... 0.12454416235558191
"cats are cute" -- "dog "...................... 0.13800195126360817
"cats are cute" -- "dogs are cute"............. 0.9635889317154503
"cats are cute" -- "dogs are cute "............ 0.3340841814158837
"cats are cute" -- "hello"..................... 0.19514219546396794
"cats are cute" -- "hello ".................... 0.15336479785550297
"cats are cute" -- "hi"........................ 0.19398964147538308
"cats are cute" -- "hi "....................... 0.15299873070429496
"cats are cute " -- "dog"...................... 0.047162606186251725
"cats are cute " -- "dog "..................... 0.7412502506668067
"cats are cute " -- "dogs are cute"............ 0.2922316428497054
"cats are cute " -- "dogs are cute "........... 0.9694282308713165
"cats are cute " -- "hello".................... 0.07172682701676675
"cats are cute " -- "hello "................... 0.7659055442905317
"cats are cute " -- "hi"....................... 0.07395981795315063
"cats are cute " -- "hi "...................... 0.7292468669861907
"dog" -- "dog "................................ 0.15684172716063982
"dog" -- "dogs are cute"....................... 0.15841792840219776
"dog" -- "dogs are cute "...................... 0.07408198913521334
"dog" -- "hello"............................... 0.5390216946824529
"dog" -- "hello ".............................. 0.004148038650315292
"dog" -- "hi".................................. 0.4729935607550871
"dog" -- "hi "................................. 0.1594872209549724
"dog " -- "dogs are cute"...................... 0.11997556990898295
"dog " -- "dogs are cute "..................... 0.700678277520648
"dog " -- "hello".............................. 0.0252694300933951
"dog " -- "hello "............................. 0.8277359075315665
"dog " -- "hi"................................. 0.005638887786058549
"dog " -- "hi "................................ 0.7370474015559675
"dogs are cute" -- "dogs are cute "............ 0.3421108849843522
"dogs are cute" -- "hello"..................... 0.2194993676080879
"dogs are cute" -- "hello ".................... 0.1326256045415808
"dogs are cute" -- "hi"........................ 0.21773509413279815
"dogs are cute" -- "hi "....................... 0.1500779129341232
"dogs are cute " -- "hello".................... 0.1031251333293599
"dogs are cute " -- "hello "................... 0.7279175778496194
"dogs are cute " -- "hi"....................... 0.11635693884531505
"dogs are cute " -- "hi "...................... 0.7269622577344995
"hello" -- "hello "............................ 0.15147937239963533
"hello" -- "hi"................................ 0.8043211555390358
"hello" -- "hi "............................... 0.2946474076607263
"hello " -- "hi"............................... 0.10360399054770437
"hello " -- "hi ".............................. 0.8464744965225194
"hi" -- "hi ".................................. 0.373960595217887
I don't see these results as particularly unexpected.
A sentence that ends in a ' ' is inherently incomplete (it is presumably missing a word), so it's not strange that the model encodes it very differently from a complete one, though this is just my interpretation. As a recommendation, I would advise that any real application using these embeddings strip trailing whitespace from input text, especially if it's user input.
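A minimal sketch of that normalization step (plain Python; `embed` is a hypothetical stand-in for however you invoke the embedding example):

def normalize_for_embedding(text: str) -> str:
    # strip leading/trailing whitespace so "hello" and "hello " map to
    # the same input before tokenization
    return text.strip()

# usage: vector = embed(normalize_for_embedding(user_input))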
As for the "I like cats" vs "cats" similarity, I also don't see it as particularly unexpected that they are not similar, as one is a sentence and the other a single word, and they only share part of the topic. I would be more surprised if two noun clauses (like "hairy feline" and "purring kitten") that have similar meanings were assigned very different scores.
Basically things that are syntactically dissimilar are understandably not very close in embedding space.
If you test sentences with very similar syntax and somewhat similar semantics and they are not aligned at all, that would worry me more.
I hope this clarifies things! Anyone who knows more please chime in too.
I'm not even sure what the embedding vector that llama.h gives you is supposed to represent; I think it may represent the next generated token more than anything, because it's extracted at the end.
Yes, I also tried myself. my similarity search based on this llama embedding doesn't work at all. It finds content that is far away from the query.
Switching to a different embedding system solved my issue.
Also, does the tokenizer tokenize spaces? I thought "hello" and "hello " should be the same if tokenized?
I think it's mostly because the space attaches to the beginning of the following word; that's also why main.cpp inserts a space at the beginning of the prompt.
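One way to check this directly is to run the SentencePiece tokenizer that ships with the LLaMA weights over both strings. A sketch, not verified output; the tokenizer.model path is an assumption about your local layout:

# pip install sentencepiece
import sentencepiece as spm

# tokenizer shipped alongside the LLaMA weights (assumed location)
sp = spm.SentencePieceProcessor(model_file="models/tokenizer.model")

for text in ["hello", "hello "]:
    ids = sp.encode(text)
    pieces = sp.encode(text, out_type=str)
    print(repr(text), ids, pieces)

# If the two strings produce different token sequences (e.g. an extra
# piece for the trailing space), the model sees genuinely different
# inputs, and the last-token embedding will differ accordingly.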
> Switching to a different embedding system solved my issue.
Are you using a non-llama model for generating embeddings and doing the search, or did you find a way to do it with Llama?
> my similarity search based on this llama embedding doesn't work at all.
Instead of 7B, have you tried a bigger LLaMA model? Or is the embedding value itself incorrect?
> I think it may represent the next generated token more than anything because it's extracted at the end.
I tried reading up on the basics of transformers at https://www.baeldung.com/cs/transformer-text-embeddings and near the end they say:
If we want a vector representing each token, we can just use the corresponding output vector produced by the encoding stack block (The “y” vectors in the diagram above)
If we need a vector representing the whole sequence, there are 3 strategies we can follow:
- Use the [CLS] token output vector (I believe this is what we are doing now?)
- Apply mean pooling between the token vectors
- Apply max-pooling between the token vectors
The default strategy is the first one, though some papers suggest the other two work best for other tasks.
The last link says that "The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering." The authors present a different network structure, that can actually generate sentence embeddings.
So, it would seem that the keyword to google is "sentence embedding with LLM".
Googling that, there is a SO question noticing that OpenAI embeddings don't seem to work much for short inputs either: https://datascience.stackexchange.com/questions/120422/text-embeddings-for-words-or-very-short-sentences-with-a-llm
@tkafka So the embedding using llama is correct, then? PrivateGPT generates embeddings on its own; does it use this method?
Exactly, we are taking the vector corresponding to the last token, which should have the information of the whole sentence. That is option 1 in @tkafka's comment. At least that's what we wanted to do.
I think more inputs could be tested, but in general it was working pretty well. I wonder which search terms vs. corpus are failing to match in your use case? It could be interesting to develop a search capability and add it to this project (or an open-source spin-off).
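As a baseline for such a search capability, the crudest version is cosine-similarity retrieval over precomputed embeddings. A sketch in NumPy, assuming the query and corpus vectors come from the embedding example above:

import numpy as np

def top_k(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3) -> list:
    # normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q
    # indices of the k most similar documents, best first
    return list(np.argsort(-sims)[:k])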
Also maybe a zero-shot prompt could work, though a lot slower. Something along the lines of:
You are a great librarian, expert in retrieving relevant documents given a topic or query. I will provide the keyword I am looking for, and a list of titles. You will provide the top 3 titles most likely to be related to my keyword, in descending order of relevancy.
<keyword>$KEYWORD</keyword>
<titles>
- Title 1
- Title 2...
</titles>
Top 3 results:
@x4080 That depends on the intended use - for example, for document comparison and similarity search, I would definitely prefer mean (or even more probably max) pooling.
Here is what GPT says about the methods:
Now, let's say we want to represent the entire input text (not just individual tokens) as a single vector. One way to do this is through "mean pooling". This involves calculating the mean (or average) of all the token vectors. In simpler terms, you're finding the "middle point" of all your token vectors. This pooled vector then represents the entire sentence, taking into account all the individual words.
...
Max pooling is another technique for reducing a set of vectors into a single vector, much like mean pooling. However, instead of taking the average of the vectors, max pooling takes the maximum value for each dimension across all vectors.
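For concreteness, here is a sketch of the strategies discussed above, given a (n_tokens, n_embd) matrix of per-token output vectors (NumPy; the "last" case reflects what this thread says the embedding example currently returns, not a verified implementation):

import numpy as np

def sentence_embedding(token_vecs: np.ndarray, strategy: str = "last") -> np.ndarray:
    # token_vecs: (n_tokens, n_embd) output vectors from the model
    if strategy == "last":   # what llama.cpp's embedding example reportedly returns
        return token_vecs[-1]
    if strategy == "mean":   # average each dimension across tokens
        return token_vecs.mean(axis=0)
    if strategy == "max":    # per-dimension maximum across tokens
        return token_vecs.max(axis=0)
    raise ValueError("unknown strategy: " + strategy)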
(as an aside, for indexing a large base of documents, I would definitely welcome a webserver-like mode that would load the model once and then accept requests with documents, returning the embeddings - currently each run loads the model again)
for document comparison and similarity search, I would definitely prefer mean (or even more probably max) pooling
In this case are the transformer-generated representations the ones we are pooling, or would it be the word-embeddings? I'm leaning towards the first option but wanted to make extra sure.
A different approach to this would be using some sort of attention measure. Something like, for each element in our corpus, taking the convex sum of its embeddings scaled by their cosine similarity with our keywords. I'm not advocating for this particularly crude attention, but a better thought-out approach.
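A sketch of that crude attention, under stated assumptions: per-token document vectors and a keyword vector are available, and a softmax is used here simply as one way to make the weights convex:

import numpy as np

def attention_pooled(doc_token_vecs: np.ndarray, keyword_vec: np.ndarray) -> np.ndarray:
    # cosine similarity of each document token against the keyword
    d = doc_token_vecs / np.linalg.norm(doc_token_vecs, axis=1, keepdims=True)
    k = keyword_vec / np.linalg.norm(keyword_vec)
    sims = d @ k
    # convex weights: exponentiate and normalize so they sum to 1
    w = np.exp(sims) / np.exp(sims).sum()
    # weighted sum of token embeddings, emphasizing keyword-relevant tokens
    return w @ doc_token_vecs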
Something I think would be good is taking an already established search dataset and using it as a benchmark, then we could iterate over the crudest possible search (cosine similarity of last token embedding) and find improvements.
I assume some good dataset exists in open source.
@StrikingLoo Not sure actually - I have been using LLMs like a 'magical black boxes' so far, and am reading up on the basics. The word embeddings are definitely problematic, as the google researchers replied (for the BERT embeddings):
I'm not sure what these vectors are, since BERT does not generate meaningful sentence vectors. It seems that this is doing average pooling over the word tokens to get a sentence vector, but we never suggested that this will generate meaningful sentence representations. And even if they are decent representations when fed into a DNN trained for a downstream task, it doesn't mean that they will be meaningful in terms of cosine distance. (Since cosine distance is a linear space where all dimensions are weighted equally.)
and also
Bert is a language model, was never really meant for sentence similarity tasks, you can try fine-tuned bert model for sentence similarity and use it as a sentence encoder if you have clean, decently long sentences.
https://github.com/google-research/bert/issues/164#issuecomment-441324222
I am beginning to lean toward the idea that what llama does now is actually the 'least bad' option out of the easily available ones, and there seems to be active research still going on about how to best semantically embed sentences or documents ...
More of the interesting discussion (from BERT):
The USE is a whole other approach and I do agree simply averaging may not be the best way, especially with contextualized embeddings. I am working on introducing other pooling strategies for BERT to average the last 4 layers instead of just having 1 layer at a time, and also extend the SentenceEmbeddings to do more, such as weighted-average, including TF-IDF as a weight factor, and SIF (Smooth Inverse Frequency).
https://github.com/JohnSnowLabs/spark-nlp/issues/684#issuecomment-557897665
Llama is unidirectional, not bidirectional like BERT, which I think may make the embeddings better but not sure. I agree that this is a 'least-bad' approach, not sure how we could improve it.
I leveraged the script by @nitram147, switched it to use cosine similarity, and made it output the results ranked by similarity instead of randomly ordered.
I see one-word queries are similar to each other in embedding space, even if the words are not that related. This will definitely be bad for search. Maybe for one-word search it would be better to use word-embedding similarity over the document (with max pooling, or highlighting of sections with high similarity), instead of the full language model.
Then for sentences we could switch to the full llama sentence embedding.
Again this is a least-bad approach, but it could work better than what we have now for search. If anyone has the time to do it.
Here are the results I got, plus the script (which is a modified version of Nitram's)
# /* +----------------------------------+ */
# /* | LLaMA Embeddings Tester          | */
# /* | compare_embeddings.py            | */
# /* | (c)copyright nitram147 2023      | */
# /* +----------------------------------+ */
import sys
import glob
import os

from numpy import dot
from numpy.linalg import norm

def cos_sim(a, b):
    # Cosine similarity between two vectors.
    return dot(a, b) / (norm(a) * norm(b))

def print_help(script_name: str) -> None:
    print("Usage: python3 " + script_name + " path_to_results_folder")

def get_results_subfolders(path_to_results_folder: str) -> list:
    return [
        x + "/" for x in sorted(glob.glob(path_to_results_folder + "*"))
        if os.path.isdir(x)
    ]

def get_results_filenames_from_folder(folder: str) -> list:
    # Result files are named with a 64-character hash of the phrase.
    return [
        x for x in sorted(glob.glob(folder + "*"))
        if os.path.isfile(x) and len(os.path.basename(x)) == 64
    ]

def load_embedding_from_file(file: str) -> dict:
    if not os.path.isfile(file):
        raise ValueError("Invalid argument provided!!!")
    lines = [x.strip("\n") for x in open(file, "r").readlines()]
    if not lines[0].startswith("Phrase: "):
        raise ValueError("Invalid result file provided!!!")
    # Remove the trailing space character at the end of the returned embedding with [:-1].
    return {lines[0][len("Phrase: "):]: [float(x) for x in lines[1][:-1].split(" ")]}

def get_distance_between_embeddings(first: list, second: list) -> float:
    if not isinstance(first, list) or not isinstance(second, list):
        raise ValueError("Invalid arguments provided!!!")
    return cos_sim(first, second)

def get_table_index(i: int, j: int, length: int) -> int:
    # Leftover helper from the original script; not used below.
    if j < i:
        i, j = j, i
    return sum([length - x for x in range(i)]) + (j - i)

if len(sys.argv) != 2:
    print("Invalid count of arguments! See help below:", file=sys.stderr)
    print_help(sys.argv[0])
    sys.exit(1)

path_to_results_folder = sys.argv[1] + "/" if sys.argv[1][-1] != "/" else sys.argv[1]

results_subfolders = get_results_subfolders(path_to_results_folder)

for folder in results_subfolders:
    print("Analyzing data in folder: " + folder)
    filenames = get_results_filenames_from_folder(folder)
    # One {phrase: embedding} dict per result file, sorted by phrase.
    phrases_embeddings = sorted(
        [load_embedding_from_file(file) for file in filenames],
        key=lambda v: list(v.keys())[0]
    )
    phrases_count = len(phrases_embeddings)
    distances = [{} for i in range(phrases_count)]
    for i in range(phrases_count):
        for j in range(i, phrases_count):
            distances[i][j] = get_distance_between_embeddings(
                phrases_embeddings[i][list(phrases_embeddings[i].keys())[0]],
                phrases_embeddings[j][list(phrases_embeddings[j].keys())[0]]
            )
            distances[j][i] = distances[i][j]
    for i in range(phrases_count):
        print("Distance from phrase \"" + list(phrases_embeddings[i].keys())[0] + "\" to:")
        # Rank by cosine similarity, most similar first.
        sorted_similarities = sorted(distances[i].items(), key=lambda x: x[1], reverse=True)
        for j, v in sorted_similarities:
            print(
                "\tPhrase: \"" + list(phrases_embeddings[j].keys())[0] + "\" is " +
                str(distances[i][j])
            )
And here are the results. I think especially sentence vs. sentence, they make sense. The biggest problem is one-word queries (which I guess are a big portion of all search queries). Maybe a good search would be grep-first, word-embedding second, sentence-embedding third? (A rough sketch of that cascade follows the results below.) This sounds like the kind of problem where someone smarter than me has already invented solutions, though.
Distance from phrase "A detailed history of the United Kingdom" to:
Phrase: "A detailed history of the United Kingdom" is 0.9999999999999998
Phrase: "The Roman Republic" is 0.5748738236620277
Phrase: "Victorian England" is 0.5417959966805156
Phrase: "Five serpents" is 0.4887654290282026
Phrase: "An essay about vipers" is 0.4877461656062162
Phrase: "The temple of the snakes" is 0.4713469048137312
Phrase: "A treaty on vipers" is 0.4325516723710721
Phrase: "birds" is 0.4323984354416718
Phrase: "Wales" is 0.40871608168029117
Phrase: "history" is 0.40729360565060807
Phrase: "snakes" is 0.37678648423072164
Phrase: "Five kittens" is 0.3504049324479399
Phrase: "Platypus are animals in the family of monotrema" is 0.3416645712014458
Phrase: "Five puppies" is 0.3399316180379291
Phrase: "Important facts about snakes" is 0.1901999384625233
Phrase: "Most birds can fly, but not all of them" is 0.14662966175951292
Phrase: "Ostriches lay eggs in summer" is 0.05645862730366624
Phrase: "Platypus are animals in the family of monotremes" is 0.029071853233189747
Phrase: "Platypus lay eggs even though they are mammals" is 0.001354915506041539
Phrase: "They are taking the hobbits to Isengard" is -0.027998082203763545
Distance from phrase "A treaty on vipers" to:
Phrase: "A treaty on vipers" is 0.9999999999999999
Phrase: "The temple of the snakes" is 0.7121266584730245
Phrase: "An essay about vipers" is 0.6756580098457795
Phrase: "Five serpents" is 0.6138246984223019
Phrase: "Important facts about snakes" is 0.6033703821537613
Phrase: "Platypus are animals in the family of monotrema" is 0.5213079576838386
Phrase: "Five kittens" is 0.5102517425776194
Phrase: "The Roman Republic" is 0.4952710353955269
Phrase: "Five puppies" is 0.4928782444742322
Phrase: "A detailed history of the United Kingdom" is 0.4325516723710721
Phrase: "Most birds can fly, but not all of them" is 0.41196624821292727
Phrase: "birds" is 0.40048799630259635
Phrase: "Ostriches lay eggs in summer" is 0.36774319666470007
Phrase: "Victorian England" is 0.34452522313252903
Phrase: "snakes" is 0.34030332986162143
Phrase: "Platypus are animals in the family of monotremes" is 0.33867262088481903
Phrase: "They are taking the hobbits to Isengard" is 0.3017421545504743
Phrase: "Platypus lay eggs even though they are mammals" is 0.28052927838170594
Phrase: "Wales" is 0.2589836768050556
Phrase: "history" is 0.16800477409088788
Distance from phrase "An essay about vipers" to:
Phrase: "An essay about vipers" is 1.0
Phrase: "A treaty on vipers" is 0.6756580098457795
Phrase: "The temple of the snakes" is 0.6613045304976577
Phrase: "snakes" is 0.6424852768571448
Phrase: "birds" is 0.6022809720677789
Phrase: "Five serpents" is 0.5706639924969026
Phrase: "Victorian England" is 0.5004794370047532
Phrase: "A detailed history of the United Kingdom" is 0.4877461656062162
Phrase: "The Roman Republic" is 0.48582124817882233
Phrase: "history" is 0.48050394501994287
Phrase: "Five kittens" is 0.3972282359943179
Phrase: "Five puppies" is 0.36455730502806755
Phrase: "Wales" is 0.28967521135603924
Phrase: "Platypus are animals in the family of monotrema" is 0.2885147834071458
Phrase: "Important facts about snakes" is 0.2686247993307239
Phrase: "Most birds can fly, but not all of them" is 0.1516970934076252
Phrase: "Ostriches lay eggs in summer" is -0.06161211798781555
Phrase: "Platypus are animals in the family of monotremes" is -0.076016728541744
Phrase: "They are taking the hobbits to Isengard" is -0.12011592297443835
Phrase: "Platypus lay eggs even though they are mammals" is -0.14184739358299675
Distance from phrase "Five kittens" to:
Phrase: "Five kittens" is 0.9999999999999999
Phrase: "Five puppies" is 0.9637579941524779
Phrase: "Five serpents" is 0.7181583032099871
Phrase: "The temple of the snakes" is 0.5154477753455162
Phrase: "A treaty on vipers" is 0.5102517425776194
Phrase: "Most birds can fly, but not all of them" is 0.46804329258518235
Phrase: "birds" is 0.4329793179326728
Phrase: "The Roman Republic" is 0.42026387242460195
Phrase: "An essay about vipers" is 0.3972282359943179
Phrase: "Victorian England" is 0.37867964473037746
Phrase: "Platypus are animals in the family of monotrema" is 0.35483120275980085
Phrase: "A detailed history of the United Kingdom" is 0.3504049324479399
Phrase: "snakes" is 0.3384167555590293
Phrase: "Important facts about snakes" is 0.3051616952272665
Phrase: "Wales" is 0.2869469938895282
Phrase: "Ostriches lay eggs in summer" is 0.19663443328163918
Phrase: "They are taking the hobbits to Isengard" is 0.18155184584899411
Phrase: "Platypus are animals in the family of monotremes" is 0.17553070413530059
Phrase: "Platypus lay eggs even though they are mammals" is 0.167219877876806
Phrase: "history" is 0.15289122649586898
Distance from phrase "Five puppies" to:
Phrase: "Five puppies" is 1.0
Phrase: "Five kittens" is 0.9637579941524779
Phrase: "Five serpents" is 0.6937265653941465
Phrase: "A treaty on vipers" is 0.4928782444742322
Phrase: "The temple of the snakes" is 0.47766281627400026
Phrase: "Most birds can fly, but not all of them" is 0.4670481719172508
Phrase: "birds" is 0.39218471142299705
Phrase: "The Roman Republic" is 0.3849102243088202
Phrase: "An essay about vipers" is 0.36455730502806755
Phrase: "Platypus are animals in the family of monotrema" is 0.3508676788029156
Phrase: "A detailed history of the United Kingdom" is 0.3399316180379291
Phrase: "Victorian England" is 0.3346593189159687
Phrase: "Important facts about snakes" is 0.324087568892877
Phrase: "snakes" is 0.29776784877896306
Phrase: "Wales" is 0.2937135879933727
Phrase: "Ostriches lay eggs in summer" is 0.20993457335944407
Phrase: "Platypus are animals in the family of monotremes" is 0.20618581263720262
Phrase: "Platypus lay eggs even though they are mammals" is 0.19523476831744402
Phrase: "They are taking the hobbits to Isengard" is 0.18998793822468293
Phrase: "history" is 0.11647268985383732
Distance from phrase "Five serpents" to:
Phrase: "Five serpents" is 0.9999999999999999
Phrase: "Five kittens" is 0.7181583032099871
Phrase: "The temple of the snakes" is 0.7126311879933475
Phrase: "Five puppies" is 0.6937265653941465
Phrase: "A treaty on vipers" is 0.6138246984223019
Phrase: "An essay about vipers" is 0.5706639924969026
Phrase: "birds" is 0.5586839567810735
Phrase: "snakes" is 0.5350123489286813
Phrase: "The Roman Republic" is 0.5316128408395847
Phrase: "A detailed history of the United Kingdom" is 0.4887654290282026
Phrase: "Victorian England" is 0.4880012983858247
Phrase: "Most birds can fly, but not all of them" is 0.42510155926378385
Phrase: "Platypus are animals in the family of monotrema" is 0.39039890813324973
Phrase: "history" is 0.31638335399535994
Phrase: "Wales" is 0.31536804519798795
Phrase: "Important facts about snakes" is 0.2730802410943006
Phrase: "Ostriches lay eggs in summer" is 0.1045134201465086
Phrase: "Platypus are animals in the family of monotremes" is 0.09111205097352691
Phrase: "Platypus lay eggs even though they are mammals" is 0.06015343973108685
Phrase: "They are taking the hobbits to Isengard" is 0.04756306022740093
Distance from phrase "Important facts about snakes" to:
Phrase: "Important facts about snakes" is 1.0
Phrase: "Platypus are animals in the family of monotremes" is 0.6395528238368102
Phrase: "Platypus lay eggs even though they are mammals" is 0.625774654703554
Phrase: "A treaty on vipers" is 0.6033703821537613
Phrase: "Ostriches lay eggs in summer" is 0.6015410729979876
Phrase: "They are taking the hobbits to Isengard" is 0.5416844489131567
Phrase: "Platypus are animals in the family of monotrema" is 0.4993521072165484
Phrase: "Most birds can fly, but not all of them" is 0.40654780779615896
Phrase: "The temple of the snakes" is 0.33485603875398423
Phrase: "Five puppies" is 0.324087568892877
Phrase: "Five kittens" is 0.3051616952272665
Phrase: "Five serpents" is 0.2730802410943006
Phrase: "An essay about vipers" is 0.2686247993307239
Phrase: "A detailed history of the United Kingdom" is 0.1901999384625233
Phrase: "Wales" is 0.136449263197007
Phrase: "The Roman Republic" is 0.1351638751322845
Phrase: "birds" is 0.023255450047180805
Phrase: "snakes" is -0.016928320998117207
Phrase: "Victorian England" is -0.06392578654090138
Phrase: "history" is -0.21211808043470504
Distance from phrase "Most birds can fly, but not all of them" to:
Phrase: "Most birds can fly, but not all of them" is 1.0
Phrase: "Five kittens" is 0.46804329258518235
Phrase: "Five puppies" is 0.4670481719172508
Phrase: "Ostriches lay eggs in summer" is 0.45467751192524924
Phrase: "Platypus lay eggs even though they are mammals" is 0.4390938473607746
Phrase: "Five serpents" is 0.42510155926378385
Phrase: "Platypus are animals in the family of monotremes" is 0.4145665249360098
Phrase: "A treaty on vipers" is 0.41196624821292727
Phrase: "Important facts about snakes" is 0.40654780779615896
Phrase: "They are taking the hobbits to Isengard" is 0.4050178579766894
Phrase: "Platypus are animals in the family of monotrema" is 0.3749648352266313
Phrase: "The temple of the snakes" is 0.2779855332150474
Phrase: "birds" is 0.18000835274267407
Phrase: "The Roman Republic" is 0.17215546204568155
Phrase: "An essay about vipers" is 0.1516970934076252
Phrase: "A detailed history of the United Kingdom" is 0.14662966175951292
Phrase: "Wales" is 0.14601303174046434
Phrase: "snakes" is 0.04878154241021203
Phrase: "Victorian England" is 0.04768871027452527
Phrase: "history" is -0.1289748521765876
Distance from phrase "Ostriches lay eggs in summer" to:
Phrase: "Ostriches lay eggs in summer" is 1.0
Phrase: "Platypus lay eggs even though they are mammals" is 0.7562461703550528
Phrase: "Platypus are animals in the family of monotremes" is 0.7252187218973202
Phrase: "They are taking the hobbits to Isengard" is 0.6813851613339252
Phrase: "Important facts about snakes" is 0.6015410729979876
Phrase: "Platypus are animals in the family of monotrema" is 0.45904739610934026
Phrase: "Most birds can fly, but not all of them" is 0.45467751192524924
Phrase: "A treaty on vipers" is 0.36774319666470007
Phrase: "Five puppies" is 0.20993457335944407
Phrase: "Five kittens" is 0.19663443328163918
Phrase: "Five serpents" is 0.1045134201465086
Phrase: "A detailed history of the United Kingdom" is 0.05645862730366624
Phrase: "The temple of the snakes" is 0.039955601536770254
Phrase: "Wales" is 0.025899447347126417
Phrase: "The Roman Republic" is -0.03612939508681027
Phrase: "An essay about vipers" is -0.06161211798781555
Phrase: "birds" is -0.19049183012392876
Phrase: "Victorian England" is -0.23449337096332104
Phrase: "snakes" is -0.31028059984901213
Phrase: "history" is -0.3809109227202052
Distance from phrase "Platypus are animals in the family of monotrema" to:
Phrase: "Platypus are animals in the family of monotrema" is 0.9999999999999999
Phrase: "Platypus are animals in the family of monotremes" is 0.620178529962473
Phrase: "A treaty on vipers" is 0.5213079576838386
Phrase: "Important facts about snakes" is 0.4993521072165484
Phrase: "Platypus lay eggs even though they are mammals" is 0.467344439291505
Phrase: "Ostriches lay eggs in summer" is 0.45904739610934026
Phrase: "The temple of the snakes" is 0.3980306499066072
Phrase: "Five serpents" is 0.39039890813324973
Phrase: "Most birds can fly, but not all of them" is 0.3749648352266313
Phrase: "Five kittens" is 0.35483120275980085
Phrase: "Five puppies" is 0.3508676788029156
Phrase: "A detailed history of the United Kingdom" is 0.3416645712014458
Phrase: "They are taking the hobbits to Isengard" is 0.34149434376099636
Phrase: "The Roman Republic" is 0.3213479902058374
Phrase: "An essay about vipers" is 0.2885147834071458
Phrase: "birds" is 0.20666068517374298
Phrase: "Wales" is 0.14583454573554305
Phrase: "snakes" is 0.1190190772294065
Phrase: "Victorian England" is 0.11566740137646295
Phrase: "history" is 0.010699338004517623
Distance from phrase "Platypus are animals in the family of monotremes" to:
Phrase: "Platypus are animals in the family of monotremes" is 1.0000000000000002
Phrase: "Platypus lay eggs even though they are mammals" is 0.8198717952715795
Phrase: "Ostriches lay eggs in summer" is 0.7252187218973202
Phrase: "Important facts about snakes" is 0.6395528238368102
Phrase: "They are taking the hobbits to Isengard" is 0.6230846072081213
Phrase: "Platypus are animals in the family of monotrema" is 0.620178529962473
Phrase: "Most birds can fly, but not all of them" is 0.4145665249360098
Phrase: "A treaty on vipers" is 0.33867262088481903
Phrase: "Five puppies" is 0.20618581263720262
Phrase: "Five kittens" is 0.17553070413530059
Phrase: "Five serpents" is 0.09111205097352691
Phrase: "The temple of the snakes" is 0.058959841122550406
Phrase: "A detailed history of the United Kingdom" is 0.029071853233189747
Phrase: "Wales" is -0.021033555063295344
Phrase: "The Roman Republic" is -0.029984235066861313
Phrase: "An essay about vipers" is -0.076016728541744
Phrase: "birds" is -0.23020738177073827
Phrase: "Victorian England" is -0.26167436042116127
Phrase: "snakes" is -0.3266629465844782
Phrase: "history" is -0.415516791839634
Distance from phrase "Platypus lay eggs even though they are mammals" to:
Phrase: "Platypus lay eggs even though they are mammals" is 1.0
Phrase: "Platypus are animals in the family of monotremes" is 0.8198717952715795
Phrase: "Ostriches lay eggs in summer" is 0.7562461703550528
Phrase: "They are taking the hobbits to Isengard" is 0.7204261321828527
Phrase: "Important facts about snakes" is 0.625774654703554
Phrase: "Platypus are animals in the family of monotrema" is 0.467344439291505
Phrase: "Most birds can fly, but not all of them" is 0.4390938473607746
Phrase: "A treaty on vipers" is 0.28052927838170594
Phrase: "Five puppies" is 0.19523476831744402
Phrase: "Five kittens" is 0.167219877876806
Phrase: "Five serpents" is 0.06015343973108685
Phrase: "Wales" is 0.011540422166169335
Phrase: "A detailed history of the United Kingdom" is 0.001354915506041539
Phrase: "The temple of the snakes" is -0.024654326682285847
Phrase: "The Roman Republic" is -0.08910031561314674
Phrase: "An essay about vipers" is -0.14184739358299675
Phrase: "birds" is -0.23253452301526378
Phrase: "Victorian England" is -0.29892930149207947
Phrase: "snakes" is -0.3368052783296547
Phrase: "history" is -0.4389313409760877
Distance from phrase "The Roman Republic" to:
Phrase: "The Roman Republic" is 1.0
Phrase: "Victorian England" is 0.7841146200572829
Phrase: "The temple of the snakes" is 0.6364758764502186
Phrase: "A detailed history of the United Kingdom" is 0.5748738236620277
Phrase: "Five serpents" is 0.5316128408395847
Phrase: "A treaty on vipers" is 0.4952710353955269
Phrase: "An essay about vipers" is 0.48582124817882233
Phrase: "history" is 0.48231789906021916
Phrase: "birds" is 0.4819111670905733
Phrase: "snakes" is 0.46379561792198076
Phrase: "Five kittens" is 0.42026387242460195
Phrase: "Five puppies" is 0.3849102243088202
Phrase: "Wales" is 0.382858313247597
Phrase: "Platypus are animals in the family of monotrema" is 0.3213479902058374
Phrase: "Most birds can fly, but not all of them" is 0.17215546204568155
Phrase: "Important facts about snakes" is 0.1351638751322845
Phrase: "Platypus are animals in the family of monotremes" is -0.029984235066861313
Phrase: "Ostriches lay eggs in summer" is -0.03612939508681027
Phrase: "They are taking the hobbits to Isengard" is -0.06534568617570666
Phrase: "Platypus lay eggs even though they are mammals" is -0.08910031561314674
Distance from phrase "The temple of the snakes" to:
Phrase: "The temple of the snakes" is 1.0
Phrase: "Five serpents" is 0.7126311879933475
Phrase: "A treaty on vipers" is 0.7121266584730245
Phrase: "An essay about vipers" is 0.6613045304976577
Phrase: "The Roman Republic" is 0.6364758764502186
Phrase: "Victorian England" is 0.5970962177231259
Phrase: "snakes" is 0.5808420416999638
Phrase: "birds" is 0.5573900343844128
Phrase: "Five kittens" is 0.5154477753455162
Phrase: "Five puppies" is 0.47766281627400026
Phrase: "A detailed history of the United Kingdom" is 0.4713469048137312
Phrase: "history" is 0.4141767926445685
Phrase: "Platypus are animals in the family of monotrema" is 0.3980306499066072
Phrase: "Important facts about snakes" is 0.33485603875398423
Phrase: "Wales" is 0.3120181652150862
Phrase: "Most birds can fly, but not all of them" is 0.2779855332150474
Phrase: "Platypus are animals in the family of monotremes" is 0.058959841122550406
Phrase: "Ostriches lay eggs in summer" is 0.039955601536770254
Phrase: "They are taking the hobbits to Isengard" is 0.0197709798888127
Phrase: "Platypus lay eggs even though they are mammals" is -0.024654326682285847
Distance from phrase "They are taking the hobbits to Isengard" to:
Phrase: "They are taking the hobbits to Isengard" is 1.0
Phrase: "Platypus lay eggs even though they are mammals" is 0.7204261321828527
Phrase: "Ostriches lay eggs in summer" is 0.6813851613339252
Phrase: "Platypus are animals in the family of monotremes" is 0.6230846072081213
Phrase: "Important facts about snakes" is 0.5416844489131567
Phrase: "Most birds can fly, but not all of them" is 0.4050178579766894
Phrase: "Platypus are animals in the family of monotrema" is 0.34149434376099636
Phrase: "A treaty on vipers" is 0.3017421545504743
Phrase: "Five puppies" is 0.18998793822468293
Phrase: "Five kittens" is 0.18155184584899411
Phrase: "Five serpents" is 0.04756306022740093
Phrase: "The temple of the snakes" is 0.0197709798888127
Phrase: "Wales" is -0.021034520811148583
Phrase: "A detailed history of the United Kingdom" is -0.027998082203763545
Phrase: "The Roman Republic" is -0.06534568617570666
Phrase: "An essay about vipers" is -0.12011592297443835
Phrase: "birds" is -0.2410456062453273
Phrase: "Victorian England" is -0.2419816604582041
Phrase: "snakes" is -0.336432211624184
Phrase: "history" is -0.42161126557001105
Distance from phrase "Victorian England" to:
Phrase: "Victorian England" is 1.0
Phrase: "The Roman Republic" is 0.7841146200572829
Phrase: "history" is 0.6171528620942585
Phrase: "The temple of the snakes" is 0.5970962177231259
Phrase: "snakes" is 0.5820718403103681
Phrase: "birds" is 0.5686519312918882
Phrase: "A detailed history of the United Kingdom" is 0.5417959966805156
Phrase: "An essay about vipers" is 0.5004794370047532
Phrase: "Five serpents" is 0.4880012983858247
Phrase: "Wales" is 0.4149343272780173
Phrase: "Five kittens" is 0.37867964473037746
Phrase: "A treaty on vipers" is 0.34452522313252903
Phrase: "Five puppies" is 0.3346593189159687
Phrase: "Platypus are animals in the family of monotrema" is 0.11566740137646295
Phrase: "Most birds can fly, but not all of them" is 0.04768871027452527
Phrase: "Important facts about snakes" is -0.06392578654090138
Phrase: "Ostriches lay eggs in summer" is -0.23449337096332104
Phrase: "They are taking the hobbits to Isengard" is -0.2419816604582041
Phrase: "Platypus are animals in the family of monotremes" is -0.26167436042116127
Phrase: "Platypus lay eggs even though they are mammals" is -0.29892930149207947
Distance from phrase "Wales" to:
Phrase: "Wales" is 1.0
Phrase: "Victorian England" is 0.4149343272780173
Phrase: "A detailed history of the United Kingdom" is 0.40871608168029117
Phrase: "birds" is 0.3843363535443526
Phrase: "The Roman Republic" is 0.382858313247597
Phrase: "snakes" is 0.35357851660519546
Phrase: "history" is 0.3337169177569299
Phrase: "Five serpents" is 0.31536804519798795
Phrase: "The temple of the snakes" is 0.3120181652150862
Phrase: "Five puppies" is 0.2937135879933727
Phrase: "An essay about vipers" is 0.28967521135603924
Phrase: "Five kittens" is 0.2869469938895282
Phrase: "A treaty on vipers" is 0.2589836768050556
Phrase: "Most birds can fly, but not all of them" is 0.14601303174046434
Phrase: "Platypus are animals in the family of monotrema" is 0.14583454573554305
Phrase: "Important facts about snakes" is 0.136449263197007
Phrase: "Ostriches lay eggs in summer" is 0.025899447347126417
Phrase: "Platypus lay eggs even though they are mammals" is 0.011540422166169335
Phrase: "Platypus are animals in the family of monotremes" is -0.021033555063295344
Phrase: "They are taking the hobbits to Isengard" is -0.021034520811148583
Distance from phrase "birds" to:
Phrase: "birds" is 1.0
Phrase: "snakes" is 0.8154332796941849
Phrase: "history" is 0.6878440441219348
Phrase: "An essay about vipers" is 0.6022809720677789
Phrase: "Victorian England" is 0.5686519312918882
Phrase: "Five serpents" is 0.5586839567810735
Phrase: "The temple of the snakes" is 0.5573900343844128
Phrase: "The Roman Republic" is 0.4819111670905733
Phrase: "Five kittens" is 0.4329793179326728
Phrase: "A detailed history of the United Kingdom" is 0.4323984354416718
Phrase: "A treaty on vipers" is 0.40048799630259635
Phrase: "Five puppies" is 0.39218471142299705
Phrase: "Wales" is 0.3843363535443526
Phrase: "Platypus are animals in the family of monotrema" is 0.20666068517374298
Phrase: "Most birds can fly, but not all of them" is 0.18000835274267407
Phrase: "Important facts about snakes" is 0.023255450047180805
Phrase: "Ostriches lay eggs in summer" is -0.19049183012392876
Phrase: "Platypus are animals in the family of monotremes" is -0.23020738177073827
Phrase: "Platypus lay eggs even though they are mammals" is -0.23253452301526378
Phrase: "They are taking the hobbits to Isengard" is -0.2410456062453273
Distance from phrase "history" to:
Phrase: "history" is 1.0
Phrase: "snakes" is 0.7294531838087251
Phrase: "birds" is 0.6878440441219348
Phrase: "Victorian England" is 0.6171528620942585
Phrase: "The Roman Republic" is 0.48231789906021916
Phrase: "An essay about vipers" is 0.48050394501994287
Phrase: "The temple of the snakes" is 0.4141767926445685
Phrase: "A detailed history of the United Kingdom" is 0.40729360565060807
Phrase: "Wales" is 0.3337169177569299
Phrase: "Five serpents" is 0.31638335399535994
Phrase: "A treaty on vipers" is 0.16800477409088788
Phrase: "Five kittens" is 0.15289122649586898
Phrase: "Five puppies" is 0.11647268985383732
Phrase: "Platypus are animals in the family of monotrema" is 0.010699338004517623
Phrase: "Most birds can fly, but not all of them" is -0.1289748521765876
Phrase: "Important facts about snakes" is -0.21211808043470504
Phrase: "Ostriches lay eggs in summer" is -0.3809109227202052
Phrase: "Platypus are animals in the family of monotremes" is -0.415516791839634
Phrase: "They are taking the hobbits to Isengard" is -0.42161126557001105
Phrase: "Platypus lay eggs even though they are mammals" is -0.4389313409760877
Distance from phrase "snakes" to:
Phrase: "snakes" is 1.0
Phrase: "birds" is 0.8154332796941849
Phrase: "history" is 0.7294531838087251
Phrase: "An essay about vipers" is 0.6424852768571448
Phrase: "Victorian England" is 0.5820718403103681
Phrase: "The temple of the snakes" is 0.5808420416999638
Phrase: "Five serpents" is 0.5350123489286813
Phrase: "The Roman Republic" is 0.46379561792198076
Phrase: "A detailed history of the United Kingdom" is 0.37678648423072164
Phrase: "Wales" is 0.35357851660519546
Phrase: "A treaty on vipers" is 0.34030332986162143
Phrase: "Five kittens" is 0.3384167555590293
Phrase: "Five puppies" is 0.29776784877896306
Phrase: "Platypus are animals in the family of monotrema" is 0.1190190772294065
Phrase: "Most birds can fly, but not all of them" is 0.04878154241021203
Phrase: "Important facts about snakes" is -0.016928320998117207
Phrase: "Ostriches lay eggs in summer" is -0.31028059984901213
Phrase: "Platypus are animals in the family of monotremes" is -0.3266629465844782
Phrase: "They are taking the hobbits to Isengard" is -0.336432211624184
Phrase: "Platypus lay eggs even though they are mammals" is -0.3368052783296547
Here is an example of search using only word embeddings. I think this may work better than sentence embeddings in most cases. We could implement this using the same word embeddings that LLaMA uses.
I think the output embedding is associated with the current prediction of the next token.
memcpy(embedding_out.data(), (float *) ggml_get_data(embeddings) + (n_embd*(N - 1)), sizeof(float)*n_embd);
https://github.com/ggerganov/llama.cpp/blob/fa84c4b3e80199a5683438f062009c031a06c4fa/llama.cpp#LL1655C6-L1655C6
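In numpy terms (just an analogy for the flat float buffer, not llama.cpp's actual API), that offset arithmetic selects the last evaluated token's row:

import numpy as np

# numpy analogy: after evaluating N tokens, ggml_get_data(embeddings)
# points at N consecutive rows of n_embd floats each.
N, n_embd = 7, 4096
activations = np.arange(N * n_embd, dtype=np.float32)

# Offsetting by n_embd * (N - 1) floats selects only the last token's
# vector, which is exactly what the memcpy copies out.
embedding_out = activations[n_embd * (N - 1):]
assert embedding_out.shape == (n_embd,)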
I did some experiments with this embedding the other day and tested averaging the vectors. How: change the embedding vector to be [n_embd * n_ctx] in size, and from the llama.h API return the average embedding of the contexts evaluated so far. It seemed to do a little better in some of the document retrieval tasks. There is still the issue that it is quite slow, even with GPU acceleration, to process a lot of text. Maybe not all the layers are necessary for this processing?
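For illustration, one way the average over the so-far-evaluated contexts could be maintained without materializing the full history is a running sum; this is a sketch of the idea, not the actual patch:

import numpy as np

n_embd = 4096
running_sum = np.zeros(n_embd, dtype=np.float32)
tokens_seen = 0

def average_embedding(batch_vectors):
    # batch_vectors: per-token vectors from one eval call, shape (n_batch, n_embd).
    # Keeping a running sum means the full per-token history never has to be stored.
    global running_sum, tokens_seen
    running_sum += batch_vectors.sum(axis=0)
    tokens_seen += batch_vectors.shape[0]
    return running_sum / tokens_seen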
I think maybe LLaMA is not the right model for this task; some kind of encoder-decoder model could be better.
I don't know whether it is relevant here, but llama.cpp's server endpoint '/embedding' doesn't seem to work at all.
./embedding works, though.
For a Llama 2 model, the response I get is a vector of 4096 zeros:
[0.0, 0.0, 0.0, 0.0, ..., 0.0]  (4096 values, all 0.0)
@akarshanbiswas, the server needs to be started with the --embedding option; since it adds some overhead to processing, it is disabled by default.
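For example (model path illustrative):

./server -m models/7B/ggml-model-q4_0.bin --embedding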
When I use LangChain libs to create a vector store (e.g. FAISS), it takes a long time (around 20 s) with the embedding server API. How can I speed up the API? The LLM is LLaMA 7B merged with Alpaca-LoRA in int8.
It is processing the whole content as if it were an input to generate something. So anything you can do to speed up llama.cpp prompt eval will help you.
- GPU will massively increase speed.
- Use a smaller model, like OpenLLaMA 3B, but note that the output vectors will be different.
- Use a smaller context window.
- Use another embedding model that is better suited for this task.
@SlyEcho Thanks
I get it. I switched to a smaller model for embedding and it really sped things up, thanks.
I'm not even sure what the embedding vector that llama.h gives you is supposed to represent; I think it may represent the next generated token more than anything, because it's extracted at the end.
I concur