I need someone to help me set up an S3 bucket as the backend for an AIS bucket

Open Mahmoud-ghareeb opened this issue 7 months ago • 4 comments

I was able to set up AIS with a local storage bucket, but it does not work when I try to set it up with an S3 bucket.

Also, I have AIS on one server. Please explain how to access it from another, remote training server.

Mahmoud-ghareeb avatar May 14 '25 23:05 Mahmoud-ghareeb

Hi @Mahmoud-ghareeb, thanks for using AIStore 🙂

To help diagnose your S3-backend and remote access issues, could you share a few more details?

  • How did you deploy your AIStore cluster?
  • How did you configure remote backends (e.g. S3) for the cluster?

rkoo19 avatar May 15 '25 00:05 rkoo19

Hello @rkoo19, I spent the last 5 days researching AIStore to make sure I fully understand our conversation.

First point

  1. I cloned it from GitHub, then ran `make deploy` on a remote server
  2. I added the data to the bucket
  3. I trained the model on the same server and everything worked great

But when I use another server for training and try to access the server that has AIStore to get the data, it does not work.

Fetching the data from the remote server that has AIStore gives this error: OpError: dial tcp 127.0.0.1:8081 connection refused

I checked everything, and this is what I got:

  • ais ls -> works and lists all buckets on the remote server
  • ais cluster show -> doesn't work and gives the following error:
    Warning: empty version from t[HMCt8081] (in maintenance mode?)
    Warning: empty version from p[kmlp8080] (in maintenance mode?)
    OpError: dial tcp 127.0.0.1:8080: connect: connection refused
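A quick way to separate a network problem from an AIStore problem is to test plain TCP reachability from the training server. The following is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, not part of AIStore or its SDK, and ports 8080 and 8081 are simply the ones that appear in the errors above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

Run from the training server, `can_connect("<address of the AIStore server>", 8080)` returning False would suggest that the port is not reachable from outside the AIStore host, which is consistent with the "connection refused" errors.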

Can you please help me with this first? Then we can move on to the S3 problem.

Mahmoud-ghareeb avatar May 20 '25 04:05 Mahmoud-ghareeb

Hey @Mahmoud-ghareeb, thanks for the extra context!

The make deploy target in the root of the AIStore repository is intended only for local development and testing. It spins up a minimal, lightweight cluster bound to 127.0.0.1 (localhost), making it accessible only from the host. So when your training server tries to connect to 127.0.0.1 (localhost), it's referring to itself, not the host where AIStore is running, which results in connection errors, as you've seen.
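To illustrate the point about loopback addresses, here is a small sketch that flags an endpoint URL whose host is a loopback address before a client on another machine tries to use it. It relies only on the Python standard library; `is_loopback_endpoint` is a hypothetical helper name, it expects a URL with a scheme such as `http://`, and it does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(endpoint: str) -> bool:
    """Return True if the endpoint's host is a loopback address (127.0.0.1, ::1, localhost)."""
    host = urlparse(endpoint).hostname
    if host is None:
        raise ValueError(f"cannot parse a host from {endpoint!r}")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); this sketch does not resolve it.
        return False
```

For example, `is_loopback_endpoint("http://127.0.0.1:8080")` is True, while an endpoint that uses the AIStore host's routable address (say, a hypothetical `http://10.0.0.5:8080`) is not flagged. A flagged endpoint can only ever reach a service on the machine the client itself runs on.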

For any real or production-level usage, including remote access from training servers, we recommend using the full deployment via ais-k8s (https://github.com/NVIDIA/ais-k8s). This Kubernetes-based deployment provides a properly networked, externally accessible AIStore cluster with support for multi-node configurations, remote backends (e.g. S3), and much more. You can follow these instructions to enable external access as part of the deployment process: https://github.com/NVIDIA/ais-k8s/blob/main/operator/README.md#enabling-external-access

rkoo19 avatar May 20 '25 15:05 rkoo19

Hey @rkoo19 thank you for this information!!

I will try to deploy it via ais-k8s and come back to you if I have any questions.

Mahmoud-ghareeb avatar May 20 '25 22:05 Mahmoud-ghareeb

not an issue

alex-aizman avatar Jul 14 '25 14:07 alex-aizman