nativelink icon indicating copy to clipboard operation
nativelink copied to clipboard

Quantity fields should be human-readable

Open aaronmondal opened this issue 1 year ago • 10 comments

Instead of this:

"eviction_policy": {
  // 50mb.
  "max_bytes": 50000000,
}

We could have this which seems more readable and is standardized in the Kubernetes Quantity format:

"eviction_policy": {
  "max_bytes": "50Mi",
}

aaronmondal avatar Apr 18 '24 00:04 aaronmondal

Hi @aaronmondal

I was thinking of taking on this issue if still available. Before proceeding, I was wondering: should it support both versions or only the human readable version?

matdexir avatar Apr 22 '24 12:04 matdexir

Hi @matdexir feel free to take the issue :heart:

Personally, I'd be in favor of just removing the int support altogether to keep the implementation simple, (otherwise the field would have to support both strings and integers).

I believe this can be done by adding something like a Quantity config object so that we can turn something like this:

#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct SizePartitioningStore {
    /// Size to partition the data on.
    #[serde(deserialize_with = "convert_numeric_with_shellexpand")]
    pub size: u64,

    // ...
}

into this:

#[derive(Serialize, Deserialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct SizePartitioningStore {
    /// Size to partition the data on.
    #[serde(deserialize_with = "convert_numeric_with_shellexpand")]
    pub size: Quantity,

    // ...
}

cc @MarcusSorealheis @allada and @adam-singer @blakehatch wdyt? I believe the part that needs to be thought out here is how we can best propagate this information to monitoring services. Under the hood a Quantity would remain a uint64, but what do we want to do front-end (i.e. config-file) wise?

aaronmondal avatar Apr 22 '24 20:04 aaronmondal

To be honest, I'd prefer to always use primitives (non-complex structs) in our config. Some day in the future there's a reasonable chance we'll need our structs to be FFI compliant, and using raw primitives is going to make things easier.

As for how to do this, I'd keep them as u64 but make a custom deserializer in serde that can convert strings that have suffixes to an integer. For example, if we say:

{
  "size1": "5kB"
  "size2": "5KiB",
  "size3": "5GiB",
  "duration1": "5s",
  "duration2": "5m",
  "duration3": "5h"
}

Would result in a struct like (respectively):

Data {
  size1: 1000,
  size2: 1024,
  size3: 1073741824,
  duration1: 5,
  duration2: 300,
  duration3: 18000
}

(obviously we have some places that take milliseconds, but lets ignore those for now, we can figure out what to do with them later).

It shouldn't be too difficult to make one that takes shell_expand and a suffix qualifier. I'd even argue we could probably convert what is already there and we wouldn't even need to create a new deserialization function for it.

allada avatar Apr 23 '24 01:04 allada

Hi, Sorry for the late response! Didn't have enough time to dive into the issue.

Here's a sample of how I think it would go: preview gist Note: I haven't written rust in months so it may not be very idiomatic.

For this use case I would only implement deserialize since I am unsure if serialize would be needed. My train of thought is:

enum Unit {
 KiloBytes, // KB
 Kibibit, // KiB
 ....
}
struct Quantity {
  value: u64,
  unit: Unit,
}

Then implement deserialization accordingly, although, I do not think that keeping the unit would be of much value :thinking:. What do you think? cc: @aaronmondal, @allada

matdexir avatar Apr 25 '24 15:04 matdexir

This is pretty close to what I was thinking. There are a few minor differences, the struct's must use only FFI compatible primitives (String, Enum, u64, usize, float64, exc...).

I suggest using the following libraries that will do the conversion though. I'm pretty sure just wrapping these in a function should do much of the work: https://docs.rs/humantime/latest/humantime/fn.parse_duration.html https://docs.rs/byte-unit/latest/byte_unit/enum.Unit.html#method.parse_str

Thanks :-)

allada avatar Apr 26 '24 15:04 allada

I would like to contribute on this issue. @allada @MarcusSorealheis @aaronmondal

boldpulse avatar Apr 26 '24 16:04 boldpulse

@matdexir, @aleksdmladenovic and us had a small conversation on slack. I assume you wanted to take this one on?

allada avatar Apr 27 '24 00:04 allada

Yes, now I am working on the issue for reducing the verbosity of ping message. https://github.com/TraceMachina/nativelink/issues/764 I would like to work on this issue after accomplishing.

boldpulse avatar Apr 27 '24 01:04 boldpulse

Hi @allada Yes I wanted to take this one on. I live in a GMT+8 timezone so there may be delay in my response.

matdexir avatar Apr 27 '24 01:04 matdexir

Thanks to both. @matdexir has already put work into it, so lets give some time for them to accomplish it.

allada avatar Apr 27 '24 02:04 allada