pint-pandas
Parsing CSV with units in the header
This is not really a bug report, but rather a suggestion for anyone who runs into the same issue as I did (although I would not mind if a similar solution were integrated into pint-pandas).
Suppose you have a CSV file like this, with the units given in square brackets in the header:
molecule,reduced_field [1e-21 * V * m^2],magboltz_drift_velocity [m / s],magboltz_drift_velocity_precision,bolsig_reduced_mobility [1 / V / m / s],bolsig_drift_velocity [m / s],bolsig_delta,bolsig_validation,betaboltz_drift_velocity [m / s],betaboltz_drift_velocity_stdev [m / s],betaboltz_delta,betaboltz_validation
Ar,0.1,1702.9999999999998,0.45999999999999996,1.647e+25,1647.0,-3.2883147386964144,OK,1806.69,623.853,6.0886670581327165,OK
Ar,0.12589254117941673,1811.9999999999998,0.42,1.394e+25,1755.046,-3.143156732891819,OK,1891.57,504.651,4.3912803532008935,OK
Ar,0.15848931924611134,1932.9999999999998,0.62,1.179e+25,1868.715,-3.32565959648215,OK,2209.5899999999997,619.583,14.30884635281946,NOK
Ar,0.19952623149688797,2047.9999999999998,0.43,9.957e+24,1986.4215000000002,-3.006762695312487,OK,2263.21,573.505,10.50830078125002,NO
If you want to read this data with pandas and have the units applied to the columns, you can use this code:
import pandas as pd
import pint
import pint_pandas  # importing this registers the 'pint[unit]' dtype with pandas
import re


def fix_pint_pandas_units(df):
    # Match headers of the form "name [unit]"
    p = re.compile(r'^(.*)\s*\[(.*)\]\s*$')
    for column in df.columns:
        m = p.match(column)
        if m:
            name = m.group(1).strip()
            unit = m.group(2).strip()
            # Drop the unit from the column name, then attach it as a pint dtype
            df.rename(columns={column: name}, inplace=True)
            df[name] = pd.Series(df[name], dtype='pint[' + unit + ']')


if __name__ == '__main__':
    df = pd.read_csv('results.csv')
    fix_pint_pandas_units(df)
    print(df.dtypes)
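To check which columns the pattern will pick up before converting anything, the header-splitting step can be tried on its own. The following is only a small sketch using the standard library; `split_header` and `HEADER_PATTERN` are names made up for this illustration and are not part of pint-pandas.

```python
import re

# Same regular expression as in fix_pint_pandas_units: "name [unit]".
# HEADER_PATTERN and split_header are hypothetical helper names.
HEADER_PATTERN = re.compile(r'^(.*)\s*\[(.*)\]\s*$')


def split_header(column):
    """Return (name, unit) for 'name [unit]', or (column, None) if no unit is given."""
    m = HEADER_PATTERN.match(column)
    if m is None:
        return column, None
    return m.group(1).strip(), m.group(2).strip()


# A few of the column names from the CSV header above
header = ("molecule,reduced_field [1e-21 * V * m^2],"
          "magboltz_drift_velocity [m / s],bolsig_validation")
for column in header.split(","):
    print(split_header(column))
# ('molecule', None)
# ('reduced_field', '1e-21 * V * m^2')
# ('magboltz_drift_velocity', 'm / s')
# ('bolsig_validation', None)
```

Columns without a bracketed unit (such as molecule or bolsig_validation) come back with None and are left untouched by the conversion function.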
Maybe this could be integrated directly into pint-pandas, possibly enabled by a flag.