joshpopelka20 comments

44 comments



How best to upgrade Lambda functions from Node 12 to Node 16?

How do you manually change these Lambda functions: "amplify-login-(verify/create/custom/define)-(ID)"? I don't see any CloudFormation templates for them.
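One way to change them outside of any template is to call the Lambda API directly. Below is a minimal sketch using boto3, assuming you are comfortable editing these Amplify-managed functions by hand. It assumes `lambda_client` is a boto3 Lambda client (e.g. `boto3.client("lambda")`). The `amplify-login-` prefix comes from the question, and `functions_to_upgrade`/`upgrade_runtimes` are hypothetical helper names. Only `get_paginator("list_functions")` and `update_function_configuration` are real boto3 calls.

```python
# Sketch: bump Node.js 12 Lambda runtimes to Node.js 16 via the Lambda API.
# Assumption: lambda_client is a boto3 Lambda client, e.g. boto3.client("lambda").
# Helper names are hypothetical; the name prefix comes from the question above.

OLD_RUNTIME = "nodejs12.x"
NEW_RUNTIME = "nodejs16.x"
PREFIX = "amplify-login-"


def functions_to_upgrade(function_configs, prefix=PREFIX, old=OLD_RUNTIME):
    """Return names of functions that start with `prefix` and still run on `old`."""
    return [
        cfg["FunctionName"]
        for cfg in function_configs
        # container-image functions have no "Runtime" key, hence .get()
        if cfg["FunctionName"].startswith(prefix) and cfg.get("Runtime") == old
    ]


def upgrade_runtimes(lambda_client, prefix=PREFIX, old=OLD_RUNTIME, new=NEW_RUNTIME):
    """List every function in the region, then update the runtime of each match."""
    configs = []
    for page in lambda_client.get_paginator("list_functions").paginate():
        configs.extend(page["Functions"])
    names = functions_to_upgrade(configs, prefix, old)
    for name in names:
        lambda_client.update_function_configuration(FunctionName=name, Runtime=new)
    return names
```

Usage would look like `upgrade_runtimes(boto3.client("lambda"))`. Splitting the filter into its own function lets you print the list of names and review it before calling the update.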

VPC Access for Amplify Static Applications

Any updates? I'm also looking for this feature.

Quantized models on multi-GPU

I have a similar use case: I need to shard a large model (gradient.ai llama3, 262K context) across multiple GPUs. It looks like PyTorch has "fully sharded data parallel" [https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/...
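As a rough illustration of what "fully sharded" means: with the default full-sharding strategy, FSDP flattens a wrapped module's parameters into one flat buffer. It pads that buffer so it divides evenly by the number of ranks and keeps one contiguous slice per rank. The full weights are re-assembled with an all-gather only while a layer runs. The pure-Python sketch below mimics just that bookkeeping, with no GPUs or torch. The function names are hypothetical; in real code you would wrap the model with `torch.distributed.fsdp.FullyShardedDataParallel`.

```python
def flatten(params):
    """Concatenate per-parameter value lists into one flat list (like FSDP's flat parameter)."""
    return [x for p in params for x in p]


def shard_flat(flat, world_size, pad_value=0.0):
    """Pad `flat` to a multiple of world_size and return one contiguous shard per rank."""
    shard_size = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [pad_value] * (shard_size * world_size - len(flat))
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


def all_gather(shards, numel):
    """Rebuild the full flat parameter from every rank's shard and drop the padding."""
    flat = [x for shard in shards for x in shard]
    return flat[:numel]


params = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]  # three toy "parameters"
flat = flatten(params)
shards = shard_flat(flat, world_size=4)  # each rank stores 2 values; the last holds padding
```

Each rank stores only its shard between layers, which is where the memory savings come from. Note that this shards weights (and, during training, gradients and optimizer state), not activations.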

Adding threshold to Transformers pipeline

Not sure if I'm doing this right, but this is the code I have so far:

```python
inputs = tokenizer(text, return_tensors="pt")  # tokenize into PyTorch tensors
outputs = model(**inputs)                       # forward pass
predictions = outputs.logits                    # raw, unnormalized scores
print(PostProcessPicker.get_threshold_max(predictions, 1.8982457699258832e-06))
```

...
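I don't know exactly what `PostProcessPicker.get_threshold_max` does, so here is a plain alternative for comparison. It converts the logits to probabilities with a softmax and keeps every label at or above a cutoff. The sketch assumes a single-label classifier and works on one row of logits as a Python list (e.g. `outputs.logits[0].tolist()`). The helper names are hypothetical. `id2label` mirrors the mapping a Transformers model exposes as `model.config.id2label`.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def labels_above_threshold(logits, threshold, id2label=None):
    """Return (label, probability) pairs with probability >= threshold, highest first."""
    probs = softmax(logits)
    hits = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    hits.sort(key=lambda t: t[1], reverse=True)
    if id2label is not None:
        return [(id2label[i], p) for i, p in hits]
    return hits
```

For a multi-label model you would apply a per-label sigmoid instead of a softmax before comparing against the threshold. In PyTorch the softmax step is `outputs.logits.softmax(dim=-1)`.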

Adding threshold to Transformers pipeline

I don't understand this piece of code:

```
# Case 2: Get the predictions - where we also pass a labels list(that can be used to ignore predictions at certain...
```
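I'm not sure which code this comment belongs to. A common convention in Transformers examples, though, is that a labels list marks positions to ignore with `-100`, which is PyTorch's default `ignore_index` for `CrossEntropyLoss`. Special tokens and sub-word continuations are typical examples. Assuming that convention applies here, the sketch below drops the ignored positions from per-sequence predictions. `filter_predictions` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def filter_predictions(predictions, labels, ignore_index=IGNORE_INDEX):
    """Keep only positions whose label is not ignore_index.

    predictions and labels are lists of per-sequence lists with the same shape.
    """
    kept_preds, kept_labels = [], []
    for pred_seq, label_seq in zip(predictions, labels):
        if len(pred_seq) != len(label_seq):
            raise ValueError("predictions and labels must have the same shape")
        kept_preds.append([p for p, l in zip(pred_seq, label_seq) if l != ignore_index])
        kept_labels.append([l for l in label_seq if l != ignore_index])
    return kept_preds, kept_labels
```

This is also why such a labels list has to match the predictions position for position: the mask is applied element by element.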

Adding threshold to Transformers pipeline

The `decode` method seems to require the labels list. I tried creating a labels list with the same shape as the predictions tensor, but now I'm getting a different error. Code: ...

Running model from a GGUF file, only

Did you add your Hugging Face token? I got the same error `RequestError(Status(401, Response[status: 401, status_text: Unauthorized, url: https://huggingface.co/api/models/revision/main]))` until I added the token. Here are the ways you can add...
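A 401 from `https://huggingface.co/api/...` means the request went out without a valid token. For reference, here is a simplified version of the lookup the Python `huggingface_hub` library performs. It checks the `HF_TOKEN` environment variable first, then the token file that `huggingface-cli login` writes under `HF_HOME` (default `~/.cache/huggingface`). The token is then sent as a Bearer header. This is an assumption-laden sketch: whether the tool in this issue reads the same sources is not confirmed here, and `resolve_hf_token`/`auth_headers` are hypothetical names.

```python
import os
from pathlib import Path


def resolve_hf_token(env=None, home=None):
    """Return a Hugging Face token or None.

    Simplified lookup order (modeled on huggingface_hub, ignoring XDG and
    other overrides):
    1. the HF_TOKEN environment variable
    2. the token file $HF_HOME/token, with HF_HOME defaulting to ~/.cache/huggingface
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    home = Path.home() if home is None else Path(home)
    hf_home = Path(env.get("HF_HOME", home / ".cache" / "huggingface"))
    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


def auth_headers(token):
    """Headers for an authenticated request to the Hugging Face API."""
    return {"Authorization": f"Bearer {token}"} if token else {}
```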

[Feature] Implementation of multi-gpu KV cache (RingAttention)

I've been researching the algorithm further, and I think I'm going to have trouble implementing it in Rust. To start, based on my understanding, I'd need to split the...
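In RingAttention the sequence is split along the sequence dimension into equal, contiguous blocks, one per device. Each device keeps its own query block and starts with the matching key/value block. The sketch shows only that initial split, written in Python for readability; the indexing carries over to Rust directly. It assumes the sequence length is divisible by the device count and uses plain lists as stand-ins for tensors. The function names are hypothetical.

```python
def split_into_blocks(seq, num_devices):
    """Split a sequence into num_devices contiguous blocks of equal length."""
    if len(seq) % num_devices != 0:
        raise ValueError("sequence length must be divisible by the number of devices")
    block = len(seq) // num_devices
    return [seq[d * block:(d + 1) * block] for d in range(num_devices)]


def shard_qkv(q, k, v, num_devices):
    """Give device d the d-th block of Q plus the d-th block of K and V (its starting KV block)."""
    q_blocks = split_into_blocks(q, num_devices)
    k_blocks = split_into_blocks(k, num_devices)
    v_blocks = split_into_blocks(v, num_devices)
    return [{"q": q_blocks[d], "k": k_blocks[d], "v": v_blocks[d]} for d in range(num_devices)]
```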

[Feature] Implementation of multi-gpu KV cache (RingAttention)

I've been trying to implement the algorithm from the paper [https://arxiv.org/pdf/2310.01889](https://arxiv.org/pdf/2310.01889), and it really isn't working. The KV cache isn't being split, which is a big problem, but I'm...
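What makes splitting the KV cache possible in that paper is blockwise attention. Each device computes attention for its queries against one KV block at a time and folds the partial result into a running max, a running softmax denominator, and a running weighted sum (an "online softmax"). The final output is exact. The sketch below shows that accumulation for a single query vector in plain Python. It assumes non-causal attention (no mask) and non-empty blocks, and `attend_blockwise` is a hypothetical name.

```python
import math


def attend_blockwise(q, kv_blocks):
    """Attention output for one query vector q, visiting (keys, values) blocks one at a time.

    Keeps a running max m, running denominator l and running weighted sum acc,
    so the result equals softmax(q . K^T / sqrt(d)) V over all blocks combined.
    """
    scale = 1.0 / math.sqrt(len(q))
    m = -math.inf  # running max of scores seen so far
    l = 0.0        # running sum of exp(score - m)
    acc = None     # running sum of exp(score - m) * value
    for keys, values in kv_blocks:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)  # rescales earlier partial sums to the new max
        weights = [math.exp(s - m_new) for s in scores]
        l = l * correction + sum(weights)
        if acc is None:
            acc = [0.0] * len(values[0])
        acc = [
            a * correction + sum(w * v[j] for w, v in zip(weights, values))
            for j, a in enumerate(acc)
        ]
        m = m_new
    return [a / l for a in acc]
```

Without a causal mask, the order in which KV blocks arrive does not change the result. That property is what lets each device start from a different block.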

[Feature] Implementation of multi-gpu KV cache (RingAttention)

Just adding a little more info. I think the biggest problem I'm facing is that the KV cache needs to cycle between GPUs. I'm trying to do this with a...
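The cycling itself follows a fixed schedule. Device d starts with KV block d. After each step, every device sends its current block to device (d + 1) mod n and receives one from device (d - 1) mod n, so after n steps every device has seen every block exactly once. The paper overlaps these transfers with the blockwise compute; on real GPUs they would be peer-to-peer send/recv calls (e.g. NCCL point-to-point operations). The pure-Python simulation below only tracks which block each device holds. The function names are hypothetical.

```python
def ring_pairs(num_devices):
    """(sender, receiver) pairs for one rotation step of the ring."""
    return [(d, (d + 1) % num_devices) for d in range(num_devices)]


def ring_schedule(num_devices):
    """Return schedule[step][device] = index of the KV block that device holds at that step."""
    n = num_devices
    holding = list(range(n))  # step 0: every device holds its own KV block
    schedule = [list(holding)]
    for _ in range(n - 1):  # n - 1 rotations are enough to visit every block
        # device d receives the block that device (d - 1) % n held in the previous step
        holding = [holding[(d - 1) % n] for d in range(n)]
        schedule.append(list(holding))
    return schedule
```

Closed form: at step s, device d holds block (d - s) mod n, which is handy for checking an implementation.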