nativelink
nativelink copied to clipboard
Quantity fields should be human-readable
Instead of this:
"eviction_policy": {
// 50mb.
"max_bytes": 50000000,
}
We could have this which seems more readable and is standardized in the Kubernetes Quantity format:
"eviction_policy": {
"max_bytes": "50Mi",
}
Hi @aaronmondal
I was thinking of taking on this issue if still available. Before proceeding, I was wondering: should it support both versions or only the human readable version?
Hi @matdexir feel free to take the issue :heart:
Personally, I'd be in favor of just removing the int support altogether to keep the implementation simple, (otherwise the field would have to support both strings and integers).
I believe this can be done by adding something like a Quantity config object so that we can turn something like this:
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct SizePartitioningStore {
/// Size to partition the data on.
#[serde(deserialize_with = "convert_numeric_with_shellexpand")]
pub size: u64,
// ...
}
into this:
#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct SizePartitioningStore {
/// Size to partition the data on.
#[serde(deserialize_with = "convert_numeric_with_shellexpand")]
pub size: Quantity,
// ...
}
cc @MarcusSorealheis @allada and @adam-singer @blakehatch wdyt? I believe the part that needs to be thought out here is how we can best propagate this information to monitoring services. Under the hood a Quantity would remain a uint64, but what do we want to do front-end (i.e. config-file) wise?
To be honest, I'd prefer to always use primitives (non-complex structs) in our config. Some day in the future there's a reasonable chance we'll need our structs to be FFI compliant, and using raw primitives is going to make things easier.
As for how to do this, I'd keep them as u64 but make a custom deserializer in serde that can convert strings that have suffixes to an integer. For example, if we say:
{
"size1": "5kB"
"size2": "5KiB",
"size3": "5GiB",
"duration1": "5s",
"duration2": "5m",
"duration3": "5h"
}
Would result in a struct like (respectively):
Data {
size1: 1000,
size2: 1024,
size3: 1073741824,
duration1: 5,
duration2: 300,
duration3: 18000
}
(obviously we have some places that take milliseconds, but lets ignore those for now, we can figure out what to do with them later).
It shouldn't be too difficult to make one that takes shell_expand and a suffix qualifier. I'd even argue we could probably convert what is already there and we wouldn't even need to create a new deserialization function for it.
Hi, Sorry for the late response! Didn't have enough time to dive into the issue.
Here's a sample of how I think it would go: preview gist Note: I haven't written rust in months so it may not be very idiomatic.
For this use case I would only implement deserialize since I am unsure if serialize would be needed.
My train of thought is:
enum Unit {
KiloBytes, // KB
Kibibit, // KiB
....
}
struct Quantity {
value: u64,
unit: Unit,
}
Then implement deserialization accordingly, although, I do not think that keeping the unit would be of much value :thinking:. What do you think? cc: @aaronmondal, @allada
This is pretty close to what I was thinking. There are a few minor differences, the struct's must use only FFI compatible primitives (String, Enum, u64, usize, float64, exc...).
I suggest using the following libraries that will do the conversion though. I'm pretty sure just wrapping these in a function should do much of the work: https://docs.rs/humantime/latest/humantime/fn.parse_duration.html https://docs.rs/byte-unit/latest/byte_unit/enum.Unit.html#method.parse_str
Thanks :-)
I would like to contribute on this issue. @allada @MarcusSorealheis @aaronmondal
@matdexir, @aleksdmladenovic and us had a small conversation on slack. I assume you wanted to take this one on?
Yes, now I am working on the issue for reducing the verbosity of ping message. https://github.com/TraceMachina/nativelink/issues/764 I would like to work on this issue after accomplishing.
Hi @allada Yes I wanted to take this one on. I live in a GMT+8 timezone so there may be delay in my response.
Thanks to both. @matdexir has already put work into it, so lets give some time for them to accomplish it.