Parth Thakkar
@pai4451 while that'll work for the Python backend, it won't work for the FasterTransformer backend without significant changes to its code. The proposed approach above should work for both, I think. Although...
Now that I've been working with this a little more, I see why this is an issue. Ragged batches are going to be pretty common and we need to do...
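For context, a rough sketch of the padding bookkeeping that a ragged batch requires; the function and names here are illustrative, not actual backend code:

```python
import numpy as np

def pad_ragged_batch(token_id_lists, pad_id=0):
    """Pad a ragged batch of token-id sequences to a rectangular array.

    Returns the padded array plus the true lengths, which the backend
    needs so that padding tokens don't influence generation.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    batch = np.full((len(token_id_lists), max_len), pad_id, dtype=np.int32)
    lengths = np.empty(len(token_id_lists), dtype=np.int32)
    for i, ids in enumerate(token_id_lists):
        batch[i, : len(ids)] = ids
        lengths[i] = len(ids)
    return batch, lengths

# Example: three prompts of different lengths batched together.
padded, lens = pad_ragged_batch([[5, 9, 2], [7], [3, 3, 3, 3]])
# padded.shape == (3, 4); lens == [3, 1, 4]
```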
Pinging on this one. Would love to have more features related to virtual desktops: shortcuts for switching, being able to re-arrange them, moving windows from one to another, etc....
Hi @xunfeng1980, the Python model doesn't support logprobs yet. If you set the logprobs option to null, you won't hit this issue. I'll keep this open until logprobs support is in.
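For anyone hitting this in the meantime, a minimal sketch of the workaround, assuming a default FauxPilot setup (the host, port, and engine name below may differ for your deployment):

```python
import requests

# Assumed default FauxPilot endpoint; adjust host/port/engine for your setup.
resp = requests.post(
    "http://localhost:5000/v1/engines/codegen/completions",
    json={
        "prompt": "def hello():",
        "max_tokens": 16,
        "temperature": 0.1,
        "logprobs": None,  # keep this null until the Python model supports logprobs
    },
)
print(resp.json()["choices"][0]["text"])
```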
I think the best way to approach this would be to implement different classes for different model families. I can think of 3 families: 1. CausalLMs: models that can be...
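Roughly what I have in mind, as a hedged sketch: the class and method names are hypothetical, only the CausalLM family is shown, and an HF-style model/tokenizer pair is assumed.

```python
from abc import ABC, abstractmethod

class ModelFamily(ABC):
    """Base class: each model family implements its own generation logic."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> str: ...

class CausalLM(ModelFamily):
    """Family 1: decoder-only models that continue a prompt left to right."""

    def __init__(self, model, tokenizer):
        # Assumption: HuggingFace-style model and tokenizer objects.
        self.model = model
        self.tokenizer = tokenizer

    def generate(self, prompt: str, max_tokens: int) -> str:
        ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        out = self.model.generate(ids, max_new_tokens=max_tokens)
        # Decode only the newly generated tokens, not the prompt.
        return self.tokenizer.decode(out[0][ids.shape[1]:])
```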
Hey @ankit-db I think the config.pbtxt file just comes with the fauxpilot repository for the 1- and 2-GPU variants. For other variants, I think the `./converter/triton_config_gen.py` script should be invoked. I...
Hey @ankit-db I was able to generate the config.pbtxt by doing the following: 1. Modify the triton_config_gen.py file at line 59, changing `params['name'] = model_name` to `params['name'] = "codegen-350M-multi"` (or...
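Concretely, the edit in step 1 is just this one-line change inside triton_config_gen.py (the exact line number may differ across fauxpilot versions):

```python
# Before:
params['name'] = model_name
# After (hard-code the model you converted):
params['name'] = "codegen-350M-multi"
```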
Hey, yeah I was planning to use this for benchmarking the 4-bit performance of the codegen models. Most of the prompts I have are 1500 tokens or more, and these overflow...
Thanks! I just created a PR here to allow pretokenized inputs: https://github.com/ravenscroftj/ggml/pull/2. It seems to work fine for me.
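To sketch the producing side of this: tokenize the long prompt ahead of time and hand the ids over. The tokenizer choice and the dump format below are assumptions for illustration; the actual input format the patched example expects is defined in the PR.

```python
from transformers import AutoTokenizer

# Assumption: the CodeGen tokenizer; the id format expected by the patched
# ggml example may differ -- check the PR for the real format.
tok = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-multi")
ids = tok.encode(open("prompt.py").read())
print(len(ids), "tokens")  # prompts of 1500+ tokens are the motivation here

with open("prompt.tokens", "w") as f:
    f.write(" ".join(map(str, ids)))
```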
Thanks! I have performed a preliminary evaluation of the 6B 4-bit model on Python. I ran the model on ~2000 code-completion scenarios (from a custom dataset of mine) and...
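For reference, the scoring loop for a dataset like this can be as simple as the sketch below; the (prompt, expected) format and the exact-match criterion are hypothetical, not the actual dataset or metric used above.

```python
def evaluate(model_fn, scenarios):
    """Fraction of scenarios where the completion exactly matches the expected text.

    `scenarios` is an iterable of (prompt, expected) pairs; `model_fn`
    maps a prompt string to a completion string.
    """
    hits = 0
    total = 0
    for prompt, expected in scenarios:
        total += 1
        if model_fn(prompt).strip() == expected.strip():
            hits += 1
    return hits / total if total else 0.0
```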