environmental-footprint-data icon indicating copy to clipboard operation
environmental-footprint-data copied to clipboard

Check memory unit

Open ggael opened this issue 2 years ago • 3 comments

The memory attribute is expected to be a float number in GB. This means that 1) the GB must be removed, and 2) that the parsers have to guaranty that they parsed a number in the right unit, that is not the case yet.

ggael avatar Sep 20 '22 12:09 ggael

I was looking at this issue and it could be maybe closed, because for:

  1. all the memory values containing "GB" were fixed/removed on this commit
  2. test_read_format() is currently validating type conversion when reading values

Otherwise, maybe if the idea is to enforce strict floating point number format (e.g 4.0 instead of 4), in that case probably checking int conversion before float could allow to check that:

>>> int('4.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '4.0'
>>> float('4.0')
4.0

and that change will impact all the float type fields of DeviceCarbonFootprintData.

Do you have any advice on this? if this issue is still valid I could add a test for that, fix all parsers that generate float values and updated CSV files.

pabluk avatar Oct 25 '22 23:10 pabluk

This issue was more about checking all parsers to make sure that they do verify that the number they read is in GB and make the conversion if it's not the case (AFAIR there are a few cases for which the reported unit is not in GB)

ggael avatar Oct 26 '22 08:10 ggael

Oh I see, thanks for the explanation.

Looking at the current data on boavizta-data-us.csv indeed, there are some suspicious memory values:

>>> import csv
>>> 
>>> with open('boavizta-data-us.csv', 'rt', encoding='utf-8') as data_file:
...     reader = csv.DictReader(data_file)
...     memory_values = [row['memory'] for row in reader if row['memory']]
... 
>>> unique_values = list(set(memory_values))
>>> unique_values.sort(key=float)
>>> unique_values
['3.0', '4.0', '6.0', '8.0', '8', '12.0', '16.0', '16', '32.0', '32', '48.0', '64.0', '64', '128', '256']

for example, 256GB of RAM on a server could be normal but not for a smartphone:

>>> import csv
>>> 
>>> with open('boavizta-data-us.csv', 'rt', encoding='utf-8') as data_file:
...     reader = csv.DictReader(data_file)
...     for row in reader:
...         if row['memory'] and float(row['memory']) >= 128:
...             print(row['manufacturer'], row['name'], row['subcategory'], row['memory'])
... 
Apple iPhone 11 128GB Smartphone 128
Apple iPhone 11 256GB Smartphone 256
Apple iPhone 12 128GB Smartphone 128
Apple iPhone 12 256GB Smartphone 256
Apple iPhone 8 256GB Smartphone 256
Apple iPhone SE - Gen 2 128GB Smartphone 128
Apple iPhone SE - Gen 2 256GB Smartphone 256
Dell PowerEdge R830 Server 256
Dell PowerEdge R840 Server 128
Dell PowerEdge R930 Server 256

it seems that Apple's parser is incorrectly filling the memory field with storage capacity from iPhones. Even if it isn't a unit/scale issue, the parser need to be reviewed.

pabluk avatar Oct 26 '22 22:10 pabluk