Riley Hun issues

Results: 13 issues by Riley Hun

Performance Issues - Instrumentation is Resulting in Increased Latency

We are trying to decrease the latency of our BERT model prediction service that is deployed using FastAPI. The predictions are called through the `/predict` endpoint. We looked into the...

Dask Gateway Start Up Error on EMR Cluster

**What happened**: I am trying to set up Dask Gateway on an EMR cluster following the guidelines specified [here](https://gateway.dask.org/install-hadoop.html). The only difference is that I am not sure how to...

How to Use TLS in Dask-Gateway?

I understand that it is recommended to use TLS in a production environment as per the docs, so I'm trying to set that up. Here are the steps I followed...

How to Change Dashboard Address for Dask Gateway?

Hello, we are using an internal TCP load balancer to expose the Traefik proxy for security purposes. Our users are able to create a client connection to the cluster generated...

GridSearch Error: KeyError: 'data'

I am getting the following error when running a grid search on the Dask distributed backend. This error does not occur when running the sklearn grid search on a single-core local machine. I don't...

[Helm] Kibana Service Won't Start Up When Using CertManager

**Describe the bug** I really need OpenDistro running in a production environment for a project, but I'm having a lot of trouble getting my Kibana server to start up when...

bug

Support for JWT?

Is there a way to authenticate to HANA using JWT? We are currently using basic authentication, but want to transition to OAuth because we don't want to enter username/password...

My fluentd pods keep restarting

I am noticing that my Fluentd pods keep restarting. They are collecting the logs and sending them to Elasticsearch, so the workflow isn't broken per se, but in the last 13...

bug

OSError: [Errno 28] No space left on device

Hello all. Unfortunately, I ran out of space while training the 3B model. I'm using a `p3.16xlarge` instance and it ran out of space on epoch 0.3. Any advice on how...

CUDA out of memory. Can this be run on a p3.16xlarge?

I am training using the `EleutherAI/pythia-2.8b` model and I'm using a `p3.16xlarge`. I tried the instructions for training on smaller instances, but still got a `CUDA out of memory` error...