clearml-server

Error when using custom domain subpath

Open Betagmr opened this issue 1 year ago • 8 comments

I'm hosting ClearML using the following path configuration.

api {
	web_server: http://<domain>/clearml/
	api_server: http://<domain>/api/
	file_server: http://<domain>/fileserver/
	
	credentials {
		"access_key"="JMJGM7JIU44XJ4D8SHVMZ86DEY3ZE9"
		"secret_key"="fwkuNBt4krfz-4Qugd4Afd3PNnuk4v0Z0qR8NxoXteC-ramUJGnSzo6KffHf4ZtAmlQ"
	}
}

After adding this configuration and running clearml-init, I noticed that the file_server path is automatically modified to use port 8081, even though a different path was specified. Here is the output I receive after running clearml-init:

Detected credentials key="JMJGM7JIU44XJ4D8SHVMZ86DEY3ZE9" secret="fwku***"
Web app hosted on standard port using http protocol.
Assuming files and api ports are unchanged and use the same (http) protocol

ClearML Hosts configuration:
Web App: http://<domain>/clearml/
API: http://<domain>/api/
File Store: http://<domain>:8081/clearml/
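
The File Store line above is consistent with the wizard reusing the web app URL and swapping in only the default file server port (8081). The following is a minimal sketch of that kind of guess, not clearml-init's actual code; `guess_files_server` and `example.com` are hypothetical names used for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def guess_files_server(web_server: str, files_port: int = 8081) -> str:
    """Hypothetical helper: keep scheme, host and path, force the file server port."""
    parts = urlsplit(web_server)
    host = parts.hostname or ""
    # Only the netloc changes; the web app's sub-path is carried over as-is,
    # which is why a "/clearml/" path would show up in the guessed URL.
    return urlunsplit(parts._replace(netloc=f"{host}:{files_port}"))


print(guess_files_server("http://example.com/clearml/"))
# http://example.com:8081/clearml/
```

A guess like this cannot know about a sub-path such as /fileserver/, so the file server URL has to be given explicitly in the configuration.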

Additionally, if I manually change the configuration back to http://<domain>/fileserver/ after clearml-init, certain ClearML CLI commands stop working. Specifically, when I run the clearml-data command to download datasets, I get a connection error:

clearml-data get --id 946c88456e39474992fcefaf36608ae1
clearml-data - Dataset Management & Versioning CLI
Download dataset id 946c88456e39474992fcefaf36608ae1
Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7b9c4d533c50>: Failed to establish a new connection: [Errno 111] Connection refused')': /MyExample/.datasets/MyExampleDataset/LaminadoraDataset.946c88456e39474992fcefaf36608ae1/artifacts/data/dataset.946c88456e39474992fcefaf36608ae1.hywr6569.zip

Additional Details: It appears that ClearML is truncating the file_server URL, which leads to issues in download requests. Instead of making the full request to http://<domain>/fileserver/MyExample/..., ClearML attempts to access /MyExample/... directly, ignoring the configured base path for file_server. This URL truncation causes the requests to fail, as they do not correctly point to the file server's location.

Betagmr avatar Nov 05 '24 08:11 Betagmr

Thanks for reporting @Betagmr - We'll take a look

ainoam avatar Nov 05 '24 15:11 ainoam

Hi @ainoam! Any updates on the problem?

Betagmr avatar Nov 11 '24 08:11 Betagmr

Hi @Betagmr! Can you share your ClearML SDK version, as well as a concrete example of the configuration string you are passing to clearml-init that reproduces the problem?

I fixed both problems!

The first error related to clearml-init occurred because I accidentally used file_server instead of files_server. When I initially created the credentials, this parameter wasn’t included, so I added it manually. Due to the incorrect parameter name, a default one was applied.
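
A misspelled key like this is easy to miss because the configuration still parses. As a minimal sketch (not part of ClearML), the check below scans the text of a config file for `*_server` keys that are not among the names used in this thread (`web_server`, `api_server`, `files_server`); `find_unknown_server_keys` and the sample text are hypothetical.

```python
import re
from typing import List

# Server key names used in the `api` section, as discussed in this thread.
KNOWN_SERVER_KEYS = {"web_server", "api_server", "files_server"}


def find_unknown_server_keys(conf_text: str) -> List[str]:
    """Return keys ending in '_server' that are not in KNOWN_SERVER_KEYS."""
    # Capture bare identifiers at the start of a line, followed by ':', '=' or '{'.
    keys = re.findall(r"^\s*([A-Za-z_]+)\s*[:={]", conf_text, flags=re.MULTILINE)
    return [k for k in keys if k.endswith("_server") and k not in KNOWN_SERVER_KEYS]


sample = """
api {
    web_server: http://example.com/clearml/
    api_server: http://example.com/api/
    file_server: http://example.com/fileserver/
}
"""
print(find_unknown_server_keys(sample))  # ['file_server']
```

The regular expression is deliberately simple and only looks at unquoted keys at the start of a line; it is a quick sanity check, not a parser for the config format.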

The second issue, with the URL truncation, happened when we stored a dataset on an internal domain. Upon migrating to a public domain, the dataset URL artifact still pointed to the internal domain, causing a mismatch that triggered the error. We resolved it by deleting the dataset and re-uploading it.

Now, my question is: is there a way to edit the output shown by “create new credentials” in the web app? Also, can we edit the metadata of storage artifacts?

Betagmr avatar Nov 13 '24 08:11 Betagmr

@Betagmr see https://github.com/allegroai/clearml-web/issues/67 for controlling credential settings. For storage migration - Which metadata are you referring to?

ainoam avatar Nov 13 '24 15:11 ainoam

Ty @ainoam

When a dataset is uploaded to files_server, the artifact data is saved at http://<domain-1>:8081/URL/TO/DATASET

I recently changed my domain to http://<domain-2>:8081/, but the stored artifact did not update its file path to http://<domain-2>:8081/URL/TO/DATASET, and I get errors because Dataset.get() uses that stored URL to download the data rather than my FILES_SERVER_PATH/PATH/TO/DATASET.

Screenshot from 2024-11-14 09-10-45

Betagmr avatar Nov 14 '24 08:11 Betagmr

I ran into this exact same issue. The file server http links broke when the machine hostname changed from data-01 to data-00. This is all it took to break the file server images as there is no automatic check and renaming of the machine hostname from say http://data-01:8081/picture123.jpg to http://data-00:8081/picture123.jpg

I know this can be solved by recreating the entire dataset, but that's terribly inefficient especially when dealing with tens of gigabytes. Much simpler to update the URL. @ainoam and ClearML team, is there any way to do this without resorting to a lot of hacking or dataset recreation?

Thanks!

finickyDrone avatar Feb 07 '25 18:02 finickyDrone

@Betagmr @finickyDrone These links break because task artifacts store an explicit reference to the target file. In this context, changing the server domain is equivalent to changing the storage service where you store your files. To remedy this, please see the ClearML FAQ (applies to server v1.17 and above).
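
To illustrate what an explicit reference implies, the sketch below shows the string transformation a stored URL would need after a host change, using the data-01 and data-00 hostnames mentioned earlier in the thread. It only rewrites a URL string and does not touch anything stored on a ClearML server; `rewrite_host` is a hypothetical helper, and the FAQ remains the reference for the actual procedure.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url: str, old_netloc: str, new_netloc: str) -> str:
    """Swap host:port in url when it equals old_netloc; otherwise return url unchanged."""
    parts = urlsplit(url)
    if parts.netloc != old_netloc:
        return url  # reference points elsewhere (e.g. other storage), leave it alone
    return urlunsplit(parts._replace(netloc=new_netloc))


print(rewrite_host("http://data-01:8081/picture123.jpg", "data-01:8081", "data-00:8081"))
# http://data-00:8081/picture123.jpg
```

Because each artifact carries its full URL, every stored reference needs this kind of update; changing files_server in the client configuration alone does not alter references that were already saved.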

ainoam avatar Feb 11 '25 12:02 ainoam