clusterdata

cluster data collected from production clusters in Alibaba for cluster management research

Results 125 clusterdata issues

According to trace_201708.md, both tasks and instances have a "Waiting" status, and it is described as: task -> Waiting: A task in not...

Hello, in MR and Spark, we assume that each mapper or reducer handles a portion of the data. The data size for each map or reduce instance is at most equal to hdfs...

Glad to see the new trace includes memory bandwidth usage information. I've checked several machine_usage entries and found non-empty values. I'm somewhat confused by its description "Normalized to maximum memory...

Currently, the time unit used in the traces is seconds. Could you please provide the traces in a finer time unit, e.g., milliseconds? It will help a lot when using...

After analyzing machine_usage.csv, I found that about 50% of machine memory is used by neither instances nor containers, for example: machine_id = 'm_2824', time_stamp = 461830, all instance...

Do "Machines" and "Server" mean physical machines here?

I am wondering if machine attributes such as resource capacity and information (num of cores, num of disks, num of CPUs, kernel version, clock speed, eth_speed, architecture, etc.) can...

I don't quite understand "A job contains multiple tasks". Can anyone give an example of what a job is and what a task is? Thanks in advance.

There are many tasks in the dataset that utilize more resources than what was requested. For instance, job_id:10771 task_id:66551 has plan_cpu:0.75 [1] from the following entry in _batch_task.csv_: > 6301,6352,10771,66551,137,Terminated,**75**,0.01600704061294748...

Would it be possible to add a field broadly classifying the root cause of the different failures, for both task failures and instance failures?