Unprofessional Behavior and Slander
At this point, there is no way not to address your recent behavior, at least in some form.
In the following PR, you wrote the following:
Although I still find neither speed advantage nor quality advantage for this, based on my highest love to Forge users, I spend 20 minutes to add those code, and I hope this is the last time that people randomly invent new state dict key formats to waste time of other developers.
You have also decided to name the variable that stores the mapping "wierd_t5_format_from_city96".
This tells us two things:
- That you, for some reason, believe that mapping format was created specifically not to work in Forge, and that I have something against you and your software.
- That you have not bothered to look at the llama.cpp repository, which is what allows us to use the quantization formats found in both the ComfyUI-GGUF and stable-diffusion-webui-forge repositories. If you had, you would have noticed that it is the format specified here, and that the key mapping matches all other architectures supported by the upstream repository. This format was established well before either your fork of stable-diffusion-webui or the custom node for ComfyUI was created. That quantized T5 model was created with an unmodified copy of llama.cpp, since it supports T5Encoder models natively.
I will not search through commit messages to find more instances of this, though I'm sure there are others that have not been brought to my attention. I have seen at least one instance of you mocking one of my trivial code changes to the upstream llama.cpp repository's gguf-py package as "written by chatgpt".
It's sad to see someone completely ignore the core principle of open source, which is to allow people to collaborate and create the best possible software together. I created my repository under the Apache 2.0 license (the most permissive one I am aware of), yet you go out of your way to add improvements, proudly marked "Copyright Forge 2024", in order to make sure we can never use them upstream.
I do not care for a public apology or anything of the sort, nor do I care about "drama". I especially do NOT want credit for the hard work of others. All I want is for you to reconsider your behavior, and to consider allowing the wider open source community to actually use your changes by choosing a permissive license, so that all users can have the best experience possible no matter which frontend they choose.
"If I have seen further it is by standing on the shoulders of Giants"
Perhaps this is a start towards a more constructive dialog. Illyasviel (just like yourself) puts a lot of spare time and energy into developing this for others, and many of us enjoy the fruits of your hard work. There are a lot of bugs to be squashed and reviewed. If you have a day job and a life outside of coding, I can see where this can get frustrating. I'm not sure how attacking him here helps, but I understand your perspective as well.
Hi City96, your work is great, and I think you are being too sensitive and thinking too much about this.
I am not sure why you see mocking and slander here, but when I comment “written by chatgpt”, it means my code really was written by ChatGPT. The comment has nothing to do with your code. Screenshots:
Prompt: "Below is unfinished codes for dequantize GGUF tensors:
quick_split = lambda x, p: torch.split(x, p + [x.shape[1] - sum(p)], dim=1)
class Q5_K(__Quant, qtype=GGMLQuantizationType.Q5_K):
    @classmethod
    def dequantize_blocks(cls, blocks: np.ndarray) -> np.ndarray:
        n_blocks = blocks.shape[0]
        d, rest = np.hsplit(blocks, [2])
        dmin, rest = np.hsplit(rest, [2])
        scales, rest = np.hsplit(rest, [Q4_K.K_SCALE_SIZE])
        qh, qs = np.hsplit(rest, [QK_K // 8])
        d = d.view(np.float16).astype(np.float32)
        dmin = dmin.view(np.float16).astype(np.float32)
        sc, m = Q4_K.get_scale_min(scales)
        d = (d * sc.astype(np.float32)).reshape((n_blocks, -1, 1))
        dm = (dmin * m.astype(np.float32)).reshape((n_blocks, -1, 1))
        ql = qs.reshape((n_blocks, -1, 1, 32)) >> np.array([0, 4], dtype=np.uint8).reshape((1, 1, 2, 1))
        qh = qh.reshape((n_blocks, -1, 1, 32)) >> np.array([i for i in range(8)], dtype=np.uint8).reshape((1, 1, 8, 1))
        ql = (ql & np.uint8(0x0F)).reshape((n_blocks, -1, 32))
        qh = (qh & np.uint8(0x01)).reshape((n_blocks, -1, 32))
        q = (ql | (qh << np.uint8(4))).astype(np.float32)
        return (d * q - dm).reshape((n_blocks, QK_K))

    @classmethod
    def dequantize_blocks_pytorch(cls, blocks, block_size, type_size, parent) -> torch.Tensor:
        QK_K = 256
        K_SCALE_SIZE = 12
        n_blocks = blocks.shape[0]
        d, dmin, scales, qh, qs = quick_split(blocks, [2, 2, K_SCALE_SIZE, QK_K // 8])
        d = d.view(torch.float16).to(cls.computation_dtype)
        dmin = dmin.view(torch.float16).to(cls.computation_dtype)
        sc, m = Q4_K.get_scale_min_pytorch(scales)
        d = (d * sc).reshape((n_blocks, -1, 1))
        dm = (dmin * m).reshape((n_blocks, -1, 1))
        ql = qs.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([0, 4], device=d.device, dtype=torch.uint8).reshape((1, 1, 2, 1))
        qh = qh.reshape((n_blocks, -1, 1, 32)) >> torch.tensor([i for i in range(8)], device=d.device, dtype=torch.uint8).reshape((1, 1, 8, 1))
        ql = (ql & 0x0F).reshape((n_blocks, -1, 32))
        qh = (qh & 0x01).reshape((n_blocks, -1, 32))
        q = (ql | (qh << 4))
        return (d * q - dm).reshape((n_blocks, QK_K))


class Q6_K(__Quant, qtype=GGMLQuantizationType.Q6_K):
    @classmethod
    def dequantize_blocks(cls, blocks: np.ndarray) -> np.ndarray:
        n_blocks = blocks.shape[0]
        ql, rest = np.hsplit(blocks, [QK_K // 2])
        qh, rest = np.hsplit(rest, [QK_K // 4])
        scales, d = np.hsplit(rest, [QK_K // 16])
        scales = scales.view(np.int8).astype(np.float32)
        d = d.view(np.float16).astype(np.float32)
        d = (d * scales).reshape((n_blocks, QK_K // 16, 1))
        ql = ql.reshape((n_blocks, -1, 1, 64)) >> np.array([0, 4], dtype=np.uint8).reshape((1, 1, 2, 1))
        ql = (ql & np.uint8(0x0F)).reshape((n_blocks, -1, 32))
        qh = qh.reshape((n_blocks, -1, 1, 32)) >> np.array([0, 2, 4, 6], dtype=np.uint8).reshape((1, 1, 4, 1))
        qh = (qh & np.uint8(0x03)).reshape((n_blocks, -1, 32))
        q = (ql | (qh << np.uint8(4))).astype(np.int8) - np.int8(32)
        q = q.reshape((n_blocks, QK_K // 16, -1)).astype(np.float32)
        return (d * q).reshape((n_blocks, QK_K))

    @classmethod
    def dequantize_blocks_pytorch(cls, blocks, block_size, type_size, parent) -> torch.Tensor:
        # TODO
Finish TODO, and do not add any comments, make sure to use "quick_split""
Response:
And, okay, I am in fact somewhat familiar with the llama.cpp code. I even have an accelerated version of GGUF based on a torch C++ extension, using logic similar to huggingface/optimum-quanto (e.g. https://github.com/huggingface/optimum-quanto/blob/main/optimum/quanto/library/extensions/cuda/unpack.cu), which is about 35% faster than the current PyTorch workarounds. But installing it on Win10 is a pain and needs Visual Studio 2019 and a finicky Ninja setup, so I did not release it and put it on hold internally.
About the format: your format not only needs another key dict, but also requires partially dequantizing part of the state dict before load. That is special enough to add a mark so that future people/maintainers know why. “weird_” is a popular and very common prefix for special things that need attention in code.
Also, “Copyright Forge” and your “(c) City96” are the same thing. Your “c” here means Copyright. (Forge is AGPL because webui is AGPL.)
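For context, such an extension is wired up roughly like the sketch below - the source file name and the kernel entry point are placeholders for illustration, not the actual internal code:

import torch

def _fallback_dequantize(blocks, qtype):
    # Pure-PyTorch path (the existing dequantize_blocks_pytorch-style workarounds).
    raise NotImplementedError("use the pure-PyTorch dequantize path here")

try:
    from torch.utils.cpp_extension import load
    # "gguf_unpack.cu" is a placeholder source implementing the unpack kernel,
    # in the spirit of optimum-quanto's unpack.cu; it is JIT-compiled at import.
    _ext = load(name="gguf_unpack_ext", sources=["gguf_unpack.cu"], verbose=False)

    def dequantize_fast(blocks, qtype):
        return _ext.dequantize(blocks, qtype)  # hypothetical kernel entry point
except Exception:
    # No MSVC/Ninja/nvcc toolchain available (the Win10 pain point above).
    dequantize_fast = _fallback_dequantize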
Guy on the sidelines here.
It seems to me that much was misinterpreted by @city96.
Good reply from you; you likely educated a lot of people in the process.
This is definitely rude, though:
to waste time of other developers.
Everyone in this scene is working for the sake of the users, so in this case you should really just be upset that the users embraced what the developer did. I don't think devs are deliberately throwing monkey wrenches into each other's projects.
About the format: your format not only needs another key dict, but also requires partially dequantizing part of the state dict before load. That is special enough to add a mark so that future people/maintainers know why.
This is not true. You can implement torch.nn.functional.embedding in your ops class so that you don't have to dequantize the token_embed weight beforehand; I only did it to reduce code complexity with the way my code is laid out, and because I was too lazy to refactor ops.py at the time.
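Roughly, something along these lines is enough - a minimal sketch with a placeholder dequantize_tensor helper and marker attribute, not the actual ComfyUI-GGUF implementation:

import torch

def dequantize_tensor(tensor, dtype=torch.float16):
    # Placeholder for the frontend's real GGUF dequantization routine.
    return tensor.to(dtype)

class GGMLEmbedding(torch.nn.Embedding):
    # Embedding layer whose weight may still be a raw quantized GGUF tensor;
    # the weight is only dequantized at lookup time instead of at load time.
    def forward(self, input_ids):
        weight = self.weight
        if getattr(weight, "is_quantized_ggml", False):  # assumed marker attribute
            weight = dequantize_tensor(weight, dtype=torch.float16)
        return torch.nn.functional.embedding(
            input_ids, weight, self.padding_idx, self.max_norm,
            self.norm_type, self.scale_grad_by_freq, self.sparse,
        )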
“weird_” is a popular and very common prefix for special things that need attention in code.
Come on now, really? Again, this is the llama.cpp standardized state dict format; I have no idea why you decided to name it after me. The original mappings are here, though it looks like you've reused mine from here, with the "# for remapping llama.cpp -> original key names" comment removed and the name changed. Any "weird" decisions should be explained in the form of code comments, not strange variable names.
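To be concrete, the mapping in question just pairs llama.cpp-style encoder tensor names with the original model's names. A few illustrative entries (the exact keys live in the linked files; these are written from memory and may differ slightly):

# Illustrative only - llama.cpp-style T5 encoder names on the left,
# the original model's names on the right.
T5_KEY_MAP = {
    "token_embd.weight":          "shared.weight",
    "enc.output_norm.weight":     "encoder.final_layer_norm.weight",
    "enc.blk.0.attn_norm.weight": "encoder.block.0.layer.0.layer_norm.weight",
    "enc.blk.0.attn_q.weight":    "encoder.block.0.layer.0.SelfAttention.q.weight",
    "enc.blk.0.ffn_gate.weight":  "encoder.block.0.layer.1.DenseReluDense.wi_0.weight",
    "enc.blk.0.ffn_up.weight":    "encoder.block.0.layer.1.DenseReluDense.wi_1.weight",
    "enc.blk.0.ffn_down.weight":  "encoder.block.0.layer.1.DenseReluDense.wo.weight",
}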
Also, “Copyright Forge” and your “(c) City96” are the same thing. Your “c” here means Copyright. (Forge is AGPL because webui is AGPL.)
That much is clear, but changes such as these are clearly meant to stop us from using your improved code, since AGPL is not compatible with our Apache 2.0 license and we cannot include them. You're free to adjust the license when modifying large parts of the code, of course, but keeping it under the same license while adding your own name/project to the list would have been kinder, for example:
# (c) Forge/City96/llama.cpp || Apache-2.0 (apache.org/licenses/LICENSE-2.0)
For clarification, in addition to the screenshot above, one can just check this link to find the source of Forge's GGUF code structure:
https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/37d8f87ae9e45af25c9fa21a34e6e5ed35c72312/packages_3rdparty/gguf/quants.py#L793
which is not the structure of city96's code:
https://github.com/city96/ComfyUI-GGUF/blob/a9f6359a2311d9252f5502a037548689bfa70481/dequant.py#L164
And users who use ComfyUI-GGUF in the normal way are already under the GPL v3 of the base UI, rather than Apache 2.0.
You guys should definitely take the time to get to know one another and maybe start working on stuff together and make some really cool shit.
For clarification, in addition to the screenshot above, one can just check this link to find the source of Forge's GGUF code structure:
The code part you highlight is 1:1 the llama.cpp code from here, so in that case the appropriate thing would be to keep the MIT license for the entire file or folder - preferably without micro-managing additions.
And users who use ComfyUI-GGUF in the normal way are already under the GPL v3 of the base UI, rather than Apache 2.0.
AGPL is more restrictive than GPL v3 as well, locking people into a license they might not want and preventing code from being used upstream. I believe this was part of your original dispute with comfyanonymous?
I can't read the geniuses' conversations, but here's a couple cold beers for the damn hot weather, cheers
Anyway, closing this. As a parting reminder, the license for the code you have here can be found here in the original llama.cpp repository - one directory up from the folder you include in its entirety.
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Also, the token embed is no longer dequantized on load as of https://github.com/city96/ComfyUI-GGUF/commit/6618801f1c1ee6c4203e4a76b87d17f3a2a3d3eb