
Segmentation Fault after 2 line integration

Open PaulZhangIsing opened this issue 3 years ago • 4 comments

Thank you for helping us make ClearML better!

Describe the bug

After adding the two-line clearml / Task integration, I run the code as per normal and it crashes with a segmentation fault (see the attached screenshot).

I am using a local ClearML server. The problem did not happen when I first installed and ran it; it only started after I re-installed the server. The first time around I tried to add a huge dataset to my local server, ran out of disk space, and had to remove everything there.

To reproduce

My ClearML server runs locally. After setting up the server, add the two-line integration to the training script (see the sketch below) and run it.
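For clarity, the "2 line integration" is just the standard ClearML snippet at the top of the training script (the project and task names below match the code shared later in this thread):

```python
from clearml import Task

# The only ClearML-specific change made to the otherwise unchanged training script
task = Task.init(project_name="PAS_ARADO", task_name="trial1")
```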

Expected behaviour

The script should train and evaluate the model to completion, as it did before adding ClearML, instead of crashing with a segmentation fault.

Environment

  • Server type: self hosted
  • ClearML SDK Version: 1.6.4
  • ClearML Server Version (self hosted): WebApp: 1.6.0-213 • Server: 1.6.0-213 • API: 2.20
  • Python Version: 3.8
  • OS: Linux


PaulZhangIsing avatar Aug 25 '22 03:08 PaulZhangIsing

Hi @PaulZhangIsing ,

I assume this issue involves some of the other libraries/code you're using (perhaps an issue with patching the libraries) - can you add an example of the actual code you're running?

jkhenning avatar Aug 25 '22 06:08 jkhenning

> Hi @PaulZhangIsing ,
>
> I assume this issue involves some of the other libraries/code you're using (perhaps an issue with patching the libraries) - can you add an example of the actual code you're running?

Sure, here is the code:

```python
import json
import os
import shutil

import numpy as np
import tensorflow as tf
from tensorflow import keras as keras
import tensorflow.keras.initializers as tf_initializers

from clearml import Task

task = Task.init(project_name="PAS_ARADO", task_name="trial1")

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def load_data():
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    return train_images, train_labels, test_images, test_labels


def make_model(args):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(tf.keras.optimizers.Adam(learning_rate=args['learning_rate']),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model


def train(args, model, train_x, train_y):
    model.fit(train_x, train_y, epochs=args['epochs'], batch_size=args['batch_size'])
    return model


def evaluate(args, model, test_x, test_y):
    score = model.evaluate(x=test_x, y=test_y, batch_size=args['test_batch_size'])
    return score


def run_and_log(json_filename='json_data.json'):
    # Hyperparameters come from a JSON file generated by an earlier script
    with open(json_filename) as json_file:
        args = json.load(json_file)
    print(args)

    tf.compat.v1.random.set_random_seed(args['seed'])

    train_x, train_y, test_x, test_y = load_data()
    model = make_model(args)

    trained_model = train(args, model, train_x, train_y)
    tf.saved_model.save(trained_model, "./tmp/saved_model.h5")
    results = evaluate(args, trained_model, test_x, test_y)

    print("\nTest set accuracy: %.3f\n" % results[1])

    # Convert the trained Keras model to TFLite and inspect it with the interpreter
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    TFLITE_FILE_PATH = './tmp/tflite_model.tflite'
    with open(TFLITE_FILE_PATH, 'wb') as f:
        f.write(tflite_model)

    interpreter = tf.lite.Interpreter(TFLITE_FILE_PATH)
    interpreter.allocate_tensors()

    print("== Input details ==")
    print("name:", interpreter.get_input_details()[0]['name'])
    print("shape:", interpreter.get_input_details()[0]['shape'])
    print("type:", interpreter.get_input_details()[0]['dtype'])

    print("\n== Output details ==")
    print("name:", interpreter.get_output_details()[0]['name'])
    print("shape:", interpreter.get_output_details()[0]['shape'])
    print("type:", interpreter.get_output_details()[0]['dtype'])

    print("\nDUMP INPUT")
    print(interpreter.get_input_details()[0])
    print("\nDUMP OUTPUT")
    print(interpreter.get_output_details()[0])

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Evaluate the TFLite model sample by sample
    score = 0
    input_shape = input_details[0]['shape']
    for index, input_x in enumerate(test_x):
        input_data = np.array([input_x], dtype=np.float32)
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])[0]
        if np.argmax(output_data) == test_y[index]:
            score += 1

    # Accuracy is computed once, after the loop
    score = score / len(test_y)
    print("\nTFlite set accuracy: %.3f\n" % score)


if __name__ == '__main__':
    # Entry point (not shown in the original paste, assumed from the description)
    run_and_log()
```
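For context, the script expects a json_data.json file containing the hyperparameters referenced above. A minimal sketch of how such a file could be produced (only the keys are taken from the code; the values here are illustrative placeholders, not the original configuration):

```python
import json

# Placeholder hyperparameters -- the keys are the ones run_and_log() reads,
# the values are examples only.
args = {
    "learning_rate": 0.001,
    "epochs": 5,
    "batch_size": 64,
    "test_batch_size": 64,
    "seed": 42,
}

with open("json_data.json", "w") as f:
    json.dump(args, f, indent=2)
```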

PaulZhangIsing avatar Aug 25 '22 06:08 PaulZhangIsing

Hi @PaulZhangIsing,

I tried running your code but it reads a json_data.json and later uses it. I removed the dependency on json_data file and I could train the model without any problem.

Does the issue also happen when you remove ClearML from the code?
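A quick way to check is to gate the two ClearML lines behind a plain environment variable and rerun; the variable name below is just an example for this test, not something ClearML defines:

```python
import os

# Toggle the ClearML integration on/off for debugging.
# USE_CLEARML is an arbitrary example name, not a ClearML-defined variable.
if os.environ.get("USE_CLEARML", "1") == "1":
    from clearml import Task
    task = Task.init(project_name="PAS_ARADO", task_name="trial1")
```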

erezalg avatar Aug 25 '22 07:08 erezalg

> Hi @PaulZhangIsing,
>
> I tried running your code but it reads a json_data.json and later uses it. I removed the dependency on json_data file and I could train the model without any problem.
>
> Does the issue also happen when you remove ClearML from the code?

The code was originally implemented with MLflow, and this problem did not happen back then. The actual workflow is a shell script that calls three Python files: the first generates the JSON file, and the second sets up the configuration and then calls this script to run. Thanks @erezalg for the suggestion - I will try removing the JSON file dependency and see whether the problem persists.

PaulZhangIsing avatar Aug 26 '22 02:08 PaulZhangIsing

Hi @PaulZhangIsing,

Was this issue solved?

erezalg avatar Nov 27 '22 11:11 erezalg