pvlib.iotools.read_panond for PAN files with Borland Pascal format?
Is your feature request related to a problem? Please describe.
- PAN files come in either a utf-8 or Borland Pascal format
- Currently read_panond works for utf-8 format but not Borland Pascal
- Is there any interest in a Real48 extension to the read_panond function that is in iotools?
Describe the solution you'd like
- A function which would read the Real48 format PAN file into a python dictionary
Describe alternatives you've considered
- We could not support the Borland Pascal format
Additional context
- I don't think that there are any papers that we could reference, but there is this CASSYS implementation: https://github.com/CanadianSolar/CASSYS/blob/b5487bb4e9e77174c805d64e3c960c46d357b7e2/CASSYS%20Interface/DatabaseImportModule.vba#L4
It's Borland Pascal. Also, doesn't Frederic Rivollier's repo (https://github.com/frivollier/pvsyst_tools) also convert these? Though it might be similar to or the same as CASSYS, since he was also at Canadian Solar / Recurrent.
I also took a look at that library and couldn't find any mention of the Borland Pascal format, editing the title/description! Thanks Mark
Are the Pascal files text or binary?
They are binary
Is the binary format still relevant? I had the impression that it was phased out in favor of the text format a while ago, but maybe that's wrong.
From https://www.pvsyst.com/help-pvsyst7/format_of_pvsyst_files.htm:
Up to PVsyst version 6.39, all PVsyst files (project, variant, meteo, etc.) were saved in binary format and it was not possible to read or edit those files. Starting with PVsyst version 6.40, the format of all PVsyst files has changed to text. This new format will simplify PVsyst upgrades. Starting with PVsyst version 6.60, the format of PVsyst project version (VCi) has been improved. Starting with PVsyst version 6.80, the encoding of all PVsyst files (*.PRJ, *.VCi, *.PAN, *.OND, etc.) has changed to UTF-8 to support international characters. For each of these evolutions, the new formats are incompatible with older versions. This is the reason why the corresponding workspaces have different names (PVsyst6_Data, PVsyst640_Data, PVsyst660_Data, PVsyst680_Data, PVsyst7.0_Data).
For context, PVsyst 6.40 was released Jan 2016.
Ugh. I'm guessing you would have to know the data structure and types in advance, to be able to read the file with python.
@kandersolar They are definitely less relevant than the newer text format. My use case is for modeling operating plants that were installed a long time ago.
I'm attempting to write a translation for Proximal now and would be happy to contribute it to pvlib if you all think it would be interesting/useful to have for the community.
Was about to state just that. There surely is the possibility of reverse engineering them. Does PVsyst provide a translator for them? That trims the logic behind the file format.
My concern is about maintaining it
Fair enough. I guess it depends on how many versions of PAN files there are. And even if it turns out that it's overcomplicated, a public gist may be of help for other people in the same situation as Kurt. So I wouldn't stop that until I see a PR ;)
May be of interest, back in 2014 they were open to release the format! https://forum.pvsyst.com/topic/430-pan-file-format/ (4th comment)
@echedey-ls I'm happy to post a gist of it on my personal github instead if that reduces the maintenance burden on pvlib!
@kurt-rhee I'd go ahead with a PR, and if it seems too unmaintainable, then switch to the Gist solution. Whatever the case, I find it valuable that this gets some visibility in pvlib. There may be more people in your situation.
@echedey-ls Here is how it looks in Proximal's codebase. If it is still of interest, I'll remove the type annotations and open a PR.
import struct
from typing import Any

# --- Constants ---
SEMICOLON_MARKER = 0x3B
DOT_MARKER = 0x09
DOUBLE_DOT_MARKER = 0x0A
FORWARD_SLASH_MARKER = 0x2F
CR_MARKER = 0x0D # Carriage Return
VERTICAL_BAR_MARKER = 0xA6
# --- Supporting Functions ---
def _read48_to_float(*, real48: bytes) -> float:
    """
    Converts a 6-byte Delphi Real48 encoded value to a standard Python float.

    The format consists of:
    - 1 byte: Exponent (offset by 129)
    - 5 bytes: Mantissa, with the most significant bit of the last byte
      as the sign bit.
    """
    if not real48 or len(real48) != 6 or real48[0] == 0:
        return 0.0
    # The exponent is the first byte, with an offset of 129
    exponent = float(real48[0] - 129)
    mantissa = 0.0
    # Process the first 4 bytes of the mantissa, least significant first.
    # Bytes 1..4 are stored little-endian, so byte 1 must be divided the most.
    # Each division by 256 (multiplication by 0.00390625) shifts one byte.
    for i in range(1, 5):
        mantissa += real48[i]
        mantissa /= 256.0
    # Process the 5th byte of the mantissa
    mantissa += real48[5] & 0x7F  # Use only the lower 7 bits
    mantissa /= 128.0  # equivalent to * 0.0078125
    mantissa += 1.0
    # Check the sign bit (the most significant bit of the last byte)
    if (real48[5] & 0x80) == 0x80:
        mantissa = -mantissa
    # Final calculation using the exponent
    return mantissa * (2.0**exponent)
def _find_marker_index(*, marker: int, start_index: int, byte_array: bytes) -> int:
    """
    Finds the index of the first occurrence of a hex marker after a start index.
    Returns the index right after the marker.
    """
    # bytes.find is more efficient than a manual loop
    found_index = byte_array.find(bytes([marker]), start_index)
    # bytes.find returns -1 (never None) when the marker is absent
    if found_index == -1:
        raise ValueError(f"Marker {marker} not found in byte array")
    return found_index + 1
def _get_param_index(*, start_index: int, offset_num: int) -> int:
    """Calculates the start index of a Real48 parameter."""
    return start_index + 6 * offset_num
def _extract_byte_parameters(
    *, byte_array: bytes, start_index: int, num_bytes: int
) -> bytes:
    """
    This function extracts bytes that form a single parameter from the original byte array
    (contains the bytes from the whole file) into a smaller byte array that it returns.
    """
    # Check bounds to avoid index errors
    if start_index + num_bytes > len(byte_array):
        raise IndexError(
            f"Not enough bytes: need {num_bytes} bytes starting at {start_index}"
        )
    # Extract the specified number of bytes starting at start_index
    param_byte_sequence = byte_array[start_index : start_index + num_bytes]
    return param_byte_sequence
def read_pan_binary(*, file_content: bytes) -> dict:
    """
    Parses a binary .PAN file and returns its contents as a dictionary.

    Args:
        file_content: The raw bytes of the .PAN file.

    Returns:
        A dictionary containing the parsed data from the PAN file.
    """
    data: dict[str, Any] = {}
    byte_array = file_content
    if not byte_array:
        raise ValueError("File is empty")
    # --- Find start indices for string parameters ---
    try:
        manu_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER, start_index=0, byte_array=byte_array
        )
        panel_start_index = _find_marker_index(
            marker=DOT_MARKER, start_index=0, byte_array=byte_array
        )
        source_start_index = _find_marker_index(
            marker=DOT_MARKER, start_index=panel_start_index, byte_array=byte_array
        )
        version_start_index = _find_marker_index(
            marker=DOUBLE_DOT_MARKER,
            start_index=source_start_index,
            byte_array=byte_array,
        )
        version_end_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=version_start_index,
            byte_array=byte_array,
        )
        year_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=version_end_index,
            byte_array=byte_array,
        )
        technology_start_index = _find_marker_index(
            marker=DOUBLE_DOT_MARKER,
            start_index=year_start_index,
            byte_array=byte_array,
        )
        cells_in_series_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=technology_start_index,
            byte_array=byte_array,
        )
        cells_in_parallel_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=cells_in_series_start_index,
            byte_array=byte_array,
        )
        bypass_diodes_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=cells_in_parallel_start_index,
            byte_array=byte_array,
        )
        # --- Find start of Real48 encoded data ---
        cr_counter = 0
        real48_start_index = 0
        for i, byte in enumerate(byte_array):
            if byte == CR_MARKER:
                cr_counter += 1
                if cr_counter == 3:
                    real48_start_index = i + 2  # Skip <CR><LF>
                    break
        if real48_start_index == 0:
            return {"error": "Could not find start of Real48 data block."}
        # --- Extract string parameters ---
        # Note: latin-1 is used as it can decode any byte value without error
        data["Manufacturer"] = (
            byte_array[manu_start_index : panel_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Model"] = (
            byte_array[panel_start_index : source_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Source"] = (
            byte_array[source_start_index : version_start_index - 4]
            .decode("latin-1")
            .strip()
        )
        data["Version"] = (
            byte_array[version_start_index : version_end_index - 2]
            .decode("latin-1")
            .replace("Version", "PVsyst")
            .strip()
        )
        data["Year"] = (
            byte_array[year_start_index : year_start_index + 4]
            .decode("latin-1")
            .strip()
        )
        data["Technology"] = (
            byte_array[technology_start_index : cells_in_series_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Cells_In_Series"] = (
            byte_array[cells_in_series_start_index : cells_in_parallel_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Cells_In_Parallel"] = (
            byte_array[cells_in_parallel_start_index : bypass_diodes_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        # --- Parse Real48 encoded parameters ---
        # Each value is the position of the parameter in the Real48 block,
        # counted in 6-byte slots from real48_start_index
        param_map = {
            "PNom": 0,
            "VMax": 1,
            "Tolerance": 2,
            "AreaM": 3,
            "CellArea": 4,
            "GRef": 5,
            "TRef": 6,
            "Isc": 8,
            "muISC": 9,
            "Voc": 10,
            "muVocSpec": 11,
            "Imp": 12,
            "Vmp": 13,
            "BypassDiodeVoltage": 14,
            "RShunt": 17,
            "RSerie": 18,
            "RShunt_0": 23,
            "RShunt_exp": 24,
            "muPmp": 25,
        }
        for name, offset in param_map.items():
            start = _get_param_index(start_index=real48_start_index, offset_num=offset)
            end = start + 6
            param_bytes = byte_array[start:end]
            value = _read48_to_float(real48=param_bytes)
            if name == "Tolerance":
                value *= 100  # Convert to percentage
                if value > 100:
                    value = 0.0
            data[name] = value
        # --- Check for and Parse IAM Profile ---
        dot_counter = 0
        iam_start_index = 0
        dot_position = data["Version"].find(".")
        major_version = int(data["Version"][dot_position - 1 : dot_position])
        if major_version < 6:
            for i in range(real48_start_index + 170, len(byte_array)):
                if byte_array[i] == DOT_MARKER:
                    dot_counter += 1
                    if dot_counter == 2:
                        iam_start_index = i + 4
                        break
        if iam_start_index > 0:
            # NOTE: _extract_iam_profile is not included in this snippet
            data["IAMProfile"] = _extract_iam_profile(
                start_index=iam_start_index, byte_array=byte_array
            )
    except (IndexError, TypeError, struct.error) as e:
        return {"error": f"Failed to parse binary PAN file: {e}"}
    return data
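For reference, the Real48 layout described in the docstring of `_read48_to_float` can also be decoded with integer arithmetic. This is only a minimal sketch, not the Proximal code: the name `real48_to_float` and the hand-built byte strings are illustrative assumptions. The layout assumed is the commonly documented one: byte 0 is the exponent biased by 129, bytes 1 to 5 hold a little-endian mantissa with an implicit leading 1, and the top bit of byte 5 is the sign.

```python
import math


def real48_to_float(real48: bytes) -> float:
    """Decode a 6-byte Real48 value (hypothetical helper for illustration)."""
    if len(real48) != 6:
        raise ValueError("Real48 values are exactly 6 bytes long")
    exponent = real48[0]
    if exponent == 0:
        # A zero exponent byte encodes 0.0 regardless of the other bytes
        return 0.0
    # Bytes 1..5 as a little-endian integer: bit 39 is the sign,
    # bits 0..38 are the fractional part of the mantissa
    raw = int.from_bytes(real48[1:6], "little")
    sign = -1.0 if raw >> 39 else 1.0
    fraction = raw & ((1 << 39) - 1)
    # Restore the implicit leading 1, then scale by 2**(exponent - 129)
    significand = (1 << 39) | fraction
    return sign * math.ldexp(significand, exponent - 129 - 39)


# Hand-built examples: 100.0 = 1.5625 * 2**6, so the exponent byte is 129 + 6
print(real48_to_float(bytes.fromhex("870000000048")))  # 100.0
print(real48_to_float(bytes.fromhex("810000000000")))  # 1.0
print(real48_to_float(bytes.fromhex("800000000080")))  # -0.5
```

Because bytes 1 to 4 are least significant first, reading them in the opposite order perturbs a value at roughly the 2**-15 relative level (for example, 1.7 would come out as about 1.70003).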
@kurt-rhee do you have a sample binary file that will serve for testing?
Hey @cwhanse unfortunately we are under NDA and cannot share this file, if I do come across one in the future which I can share, I'll post it here. Happy to close this until that is the case if that helps everyone.
if I do come across one in the future which I can share, I'll post it here
That would be super. I think we will go nowhere without a sample file to test on. Maybe by making up the data just a little bit... Thanks for sharing that all @kurt-rhee .
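In the absence of a shareable file, one way of "making up the data" is to encode known values into Real48 bytes and feed them to the decoder under test. The following is a minimal sketch under the same layout assumption as the docstring of `_read48_to_float` (exponent byte biased by 129, 39-bit mantissa, sign in the top bit of the last byte). `float_to_real48` is a hypothetical helper, not part of pvlib or PVsyst, and it only covers the numeric block, not the full PAN file structure.

```python
import math


def float_to_real48(value: float) -> bytes:
    """Encode a float as 6 Real48 bytes (hypothetical test-fixture helper)."""
    if value == 0.0:
        return bytes(6)
    # frexp gives |value| = m * 2**e with 0.5 <= m < 1, so 1.f = 2 * m
    m, e = math.frexp(abs(value))
    fraction = round((2.0 * m - 1.0) * (1 << 39))
    if fraction == 1 << 39:
        # Rounding overflowed the 39-bit field: carry into the exponent
        fraction = 0
        e += 1
    exponent_byte = e + 128  # (e - 1) + 129
    if not 1 <= exponent_byte <= 255:
        raise OverflowError("value is outside the Real48 exponent range")
    sign_bit = 1 if value < 0 else 0
    packed = (sign_bit << 39) | fraction
    return bytes([exponent_byte]) + packed.to_bytes(5, "little")


print(float_to_real48(100.0).hex())  # 870000000048
print(float_to_real48(-0.5).hex())   # 800000000080
```

Fixtures built this way only test the decoder against the assumed layout; they do not replace a real file exported from PVsyst.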
The second post here https://forum.pvsyst.com/topic/430-pan-file-format/ makes it sound like someone with a copy of an old version of PVsyst could make up their own module and export it to a binary PAN file. That could make for a good sharable sample file.
But maybe I'm misunderstanding what PVsyst could do.
Would this file work?
I've tested and it seems no... I wonder why.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File c:\Users\Yo\Documents\18_Work\PVLIB\pvlib-python\docs\examples\pan_file_reader.py:282
    280 with open("sample.PAN", "rb") as f:
    281 file_content = f.read()
--> 282 pan_data = read_pan_binary(file_content=file_content)
    283 print(pan_data)
File c:\Users\Yo\Documents\18_Work\PVLIB\pvlib-python\docs\examples\pan_file_reader.py:258
    256 iam_start_index = 0
    257 dot_position = data["Version"].find(".")
--> 258 major_version = int(data["Version"][dot_position - 1 : dot_position])
    259 if major_version < 6:
    260 for i in range(real48_start_index + 170, len(byte_array)):
ValueError: invalid literal for int() with base 10: ''
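The `ValueError` is raised because the one-character slice taken before the first "." in `data["Version"]` came out empty. That happens, for example, when the version string is empty: `str.find` returns -1 if there is no ".", and the resulting slice has nothing in it. A minimal sketch of a more defensive parse, assuming version strings of the form "PVsyst 6.3"; `parse_major_version` is a hypothetical helper, not part of the posted script:

```python
import re
from typing import Optional


def parse_major_version(version: str) -> Optional[int]:
    """Return the integer before the first '.', or None if there is none."""
    match = re.search(r"(\d+)\.", version)
    return int(match.group(1)) if match else None


print(parse_major_version("PVsyst 6.3"))  # 6
print(parse_major_version(""))            # None
```

Unlike the single-character slice, this also accepts multi-digit major versions, and the caller can decide what to do when the result is `None`.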
Whole script, with the imports, to save marginal time
from __future__ import annotations
import struct
from typing import Any
# --- Constants ---
SEMICOLON_MARKER = 0x3B
DOT_MARKER = 0x09
DOUBLE_DOT_MARKER = 0x0A
FORWARD_SLASH_MARKER = 0x2F
CR_MARKER = 0x0D # Carriage Return
VERTICAL_BAR_MARKER = 0xA6
# --- Supporting Functions ---
def _read48_to_float(*, real48: bytes) -> float:
    """
    Converts a 6-byte Delphi Real48 encoded value to a standard Python float.

    The format consists of:
    - 1 byte: Exponent (offset by 129)
    - 5 bytes: Mantissa, with the most significant bit of the last byte
      as the sign bit.
    """
    if not real48 or len(real48) != 6 or real48[0] == 0:
        return 0.0
    # The exponent is the first byte, with an offset of 129
    exponent = float(real48[0] - 129)
    mantissa = 0.0
    # Process the first 4 bytes of the mantissa, least significant first.
    # Bytes 1..4 are stored little-endian, so byte 1 must be divided the most.
    # Each division by 256 (multiplication by 0.00390625) shifts one byte.
    for i in range(1, 5):
        mantissa += real48[i]
        mantissa /= 256.0
    # Process the 5th byte of the mantissa
    mantissa += real48[5] & 0x7F  # Use only the lower 7 bits
    mantissa /= 128.0  # equivalent to * 0.0078125
    mantissa += 1.0
    # Check the sign bit (the most significant bit of the last byte)
    if (real48[5] & 0x80) == 0x80:
        mantissa = -mantissa
    # Final calculation using the exponent
    return mantissa * (2.0**exponent)
def _find_marker_index(
    *, marker: int, start_index: int, byte_array: bytes
) -> int:
    """
    Finds the index of the first occurrence of a hex marker after a start index.
    Returns the index right after the marker.
    """
    # bytes.find is more efficient than a manual loop
    found_index = byte_array.find(bytes([marker]), start_index)
    # bytes.find returns -1 (never None) when the marker is absent
    if found_index == -1:
        raise ValueError(f"Marker {marker} not found in byte array")
    return found_index + 1
def _get_param_index(*, start_index: int, offset_num: int) -> int:
    """Calculates the start index of a Real48 parameter."""
    return start_index + 6 * offset_num
def _extract_byte_parameters(
    *, byte_array: bytes, start_index: int, num_bytes: int
) -> bytes:
    """
    This function extracts bytes that form a single parameter from the original byte array
    (contains the bytes from the whole file) into a smaller byte array that it returns.
    """
    # Check bounds to avoid index errors
    if start_index + num_bytes > len(byte_array):
        raise IndexError(
            f"Not enough bytes: need {num_bytes} bytes starting at {start_index}"
        )
    # Extract the specified number of bytes starting at start_index
    param_byte_sequence = byte_array[start_index : start_index + num_bytes]
    return param_byte_sequence
def read_pan_binary(*, file_content: bytes) -> dict:
    """
    Parses a binary .PAN file and returns its contents as a dictionary.

    Args:
        file_content: The raw bytes of the .PAN file.

    Returns:
        A dictionary containing the parsed data from the PAN file.
    """
    data: dict[str, Any] = {}
    byte_array = file_content
    if not byte_array:
        raise ValueError("File is empty")
    # --- Find start indices for string parameters ---
    try:
        manu_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER, start_index=0, byte_array=byte_array
        )
        panel_start_index = _find_marker_index(
            marker=DOT_MARKER, start_index=0, byte_array=byte_array
        )
        source_start_index = _find_marker_index(
            marker=DOT_MARKER,
            start_index=panel_start_index,
            byte_array=byte_array,
        )
        version_start_index = _find_marker_index(
            marker=DOUBLE_DOT_MARKER,
            start_index=source_start_index,
            byte_array=byte_array,
        )
        version_end_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=version_start_index,
            byte_array=byte_array,
        )
        year_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=version_end_index,
            byte_array=byte_array,
        )
        technology_start_index = _find_marker_index(
            marker=DOUBLE_DOT_MARKER,
            start_index=year_start_index,
            byte_array=byte_array,
        )
        cells_in_series_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=technology_start_index,
            byte_array=byte_array,
        )
        cells_in_parallel_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=cells_in_series_start_index,
            byte_array=byte_array,
        )
        bypass_diodes_start_index = _find_marker_index(
            marker=SEMICOLON_MARKER,
            start_index=cells_in_parallel_start_index,
            byte_array=byte_array,
        )
        # --- Find start of Real48 encoded data ---
        cr_counter = 0
        real48_start_index = 0
        for i, byte in enumerate(byte_array):
            if byte == CR_MARKER:
                cr_counter += 1
                if cr_counter == 3:
                    real48_start_index = i + 2  # Skip <CR><LF>
                    break
        if real48_start_index == 0:
            return {"error": "Could not find start of Real48 data block."}
        # --- Extract string parameters ---
        # Note: latin-1 is used as it can decode any byte value without error
        data["Manufacturer"] = (
            byte_array[manu_start_index : panel_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Model"] = (
            byte_array[panel_start_index : source_start_index - 1]
            .decode("latin-1")
            .strip()
        )
        data["Source"] = (
            byte_array[source_start_index : version_start_index - 4]
            .decode("latin-1")
            .strip()
        )
        data["Version"] = (
            byte_array[version_start_index : version_end_index - 2]
            .decode("latin-1")
            .replace("Version", "PVsyst")
            .strip()
        )
        data["Year"] = (
            byte_array[year_start_index : year_start_index + 4]
            .decode("latin-1")
            .strip()
        )
        data["Technology"] = (
            byte_array[
                technology_start_index : cells_in_series_start_index - 1
            ]
            .decode("latin-1")
            .strip()
        )
        data["Cells_In_Series"] = (
            byte_array[
                cells_in_series_start_index : cells_in_parallel_start_index - 1
            ]
            .decode("latin-1")
            .strip()
        )
        data["Cells_In_Parallel"] = (
            byte_array[
                cells_in_parallel_start_index : bypass_diodes_start_index - 1
            ]
            .decode("latin-1")
            .strip()
        )
        # --- Parse Real48 encoded parameters ---
        # Each value is the position of the parameter in the Real48 block,
        # counted in 6-byte slots from real48_start_index
        param_map = {
            "PNom": 0,
            "VMax": 1,
            "Tolerance": 2,
            "AreaM": 3,
            "CellArea": 4,
            "GRef": 5,
            "TRef": 6,
            "Isc": 8,
            "muISC": 9,
            "Voc": 10,
            "muVocSpec": 11,
            "Imp": 12,
            "Vmp": 13,
            "BypassDiodeVoltage": 14,
            "RShunt": 17,
            "RSerie": 18,
            "RShunt_0": 23,
            "RShunt_exp": 24,
            "muPmp": 25,
        }
        for name, offset in param_map.items():
            start = _get_param_index(
                start_index=real48_start_index, offset_num=offset
            )
            end = start + 6
            param_bytes = byte_array[start:end]
            value = _read48_to_float(real48=param_bytes)
            if name == "Tolerance":
                value *= 100  # Convert to percentage
                if value > 100:
                    value = 0.0
            data[name] = value
        # --- Check for and Parse IAM Profile ---
        dot_counter = 0
        iam_start_index = 0
        dot_position = data["Version"].find(".")
        major_version = int(data["Version"][dot_position - 1 : dot_position])
        if major_version < 6:
            for i in range(real48_start_index + 170, len(byte_array)):
                if byte_array[i] == DOT_MARKER:
                    dot_counter += 1
                    if dot_counter == 2:
                        iam_start_index = i + 4
                        break
        if iam_start_index > 0:
            # NOTE: _extract_byte_parameters also requires num_bytes, which is
            # not passed here, so this call raises TypeError if it is reached
            # (caught below and reported as an error dict)
            data["IAMProfile"] = _extract_byte_parameters(
                start_index=iam_start_index, byte_array=byte_array
            )
    except (IndexError, TypeError, struct.error) as e:
        return {"error": f"Failed to parse binary PAN file: {e}"}
    return data
if __name__ == "__main__":
    # Example usage
    with open("sample.PAN", "rb") as f:
        file_content = f.read()
    pan_data = read_pan_binary(file_content=file_content)
    print(pan_data)
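One detail that matters for the marker search in a script like this: `bytes.find` signals "not found" with -1 rather than `None`, so the not-found check has to compare against -1. A minimal, self-contained sketch of such a search follows; `find_after_marker` and the sample bytes are illustrative assumptions, not taken from a real PAN file:

```python
def find_after_marker(data: bytes, marker: int, start: int = 0) -> int:
    """Return the index just past the first `marker` byte at or after `start`."""
    found = data.find(bytes([marker]), start)
    if found == -1:
        raise ValueError(f"Marker 0x{marker:02X} not found from index {start}")
    return found + 1


sample = b"ab;cd\tef"
print(find_after_marker(sample, 0x3B))  # 3, the index right after ";"
print(find_after_marker(sample, 0x09))  # 6, the index right after the tab
```

Raising on a missing marker stops the parse early, instead of passing -1 on as the start index of the next search.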
Here's what should be the same, but with compatibility for PVsyst < V6.40. Maybe it will work?
Yup, it does work!
{'Manufacturer': 'Hobbs Solar', 'Model': 'WH-100', 'Source': 'W Hobb', 'Version': 'PVsyst 6.3', 'Year': 'Si', 'Technology': 'Si-mono', 'Cells_In_Series': '60', 'Cells_In_Parallel': '1', 'PNom': 100.0, 'VMax': 600.0, 'Tolerance': 0.0, 'AreaM': 1.7000305175770336, 'CellArea': -9984.00000089407, 'GRef': 1000.0, 'TRef': 25.0, 'Isc': 8.642793156497646, 'muISC': 5.029006116412347, 'Voc': 37.400976562465075, 'muVocSpec': -139.21693813055754, 'Imp': 8.14999999999418, 'Vmp': 30.700488281232538, 'BypassDiodeVoltage': -0.6999999999998181, 'RShunt': 300.0, 'RSerie': 0.00997572861830065, 'RShunt_0': 2000.0, 'RShunt_exp': 5.5, 'muPmp': -0.3000076293942584}
Amazing collaboration guys! We now have all the required info to decide whether this deserves a place in pvlib. I think both yes and no positions are reasonable; I'm slightly in favour of doing the PV community this favour. From a quick search, it does not look illegal to do this: https://law.stackexchange.com/questions/8277/is-it-legal-to-write-software-to-convert-data-from-a-proprietary-format
Feel free to leave your votes @pvlib/pvlib-maintainer, @pvlib/pvlib-triage
@williamhobbs to the rescue once again!
@echedey-ls thank you for testing!
I used PVsyst V6.79 to create the PAN file, and saved it with the "File compatible with old version < V6.40" option selected (see screenshot below). Does this mean the conversion script will not work with PAN files from PVsyst V6.40 through V6.80 (I think that's when they switched to UTF-8)?
I'm not too concerned, since it's better than nothing, but thought it was worth bringing up.
My vote is to include it. My primary concerns were 1) testing and 2) fixing it if it stops working. #1 is addressed, and for #2 we can add a disclaimer in the docstring that we (pvlib) probably can't fix this if it stops working. It's a niche use so I'm not too worried about it.
it does not look illegal to do this
If the consensus is to add this, I will contact PVsyst and ask if they are OK with this. I'd rather be certain.
Thanks all. Kurt sorry for misleading you. I must have been confusing Frederic’s repo for CASSYS. I agree his PVsyst tools repo only works on newer text files. I did find this repo for converting pascal real48 to Python: https://github.com/eighty6-analytics/real48/blob/master/real48.py but seems like your code is already working. Congrats!
I am neutral on including this.
Could this be made trustworthy? You would certainly need more than one sample file to test on. From my experience with reading and writing various PVsyst format files in python I note that there are often little unexpected quirks. Converting an old-style PAN file for an old PV system would be a one-time effort, so if one wanted to be sure it was done right, it would probably be safest to use PVsyst itself anyway.
A nice companion feature would be some functions to read and write model parameters to a more generic readable format, perhaps using yaml. That way the converted parameters can then be documented easily, and, if necessary, manually corrected.
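As a rough illustration of that companion idea, the parsed dictionary could be written out in a readable text form, edited by hand if needed, and loaded back. This minimal sketch uses JSON from the Python standard library in place of YAML, purely to avoid a third-party dependency. The function names are hypothetical, and the example values are taken from the output posted earlier in this thread.

```python
import json


def params_to_text(params: dict) -> str:
    """Serialize parsed parameters to a human-readable JSON string."""
    return json.dumps(params, indent=2, sort_keys=True)


def params_from_text(text: str) -> dict:
    """Load parameters back, e.g. after a manual correction."""
    return json.loads(text)


params = {"Manufacturer": "Hobbs Solar", "Model": "WH-100", "PNom": 100.0}
text = params_to_text(params)
print(text)
assert params_from_text(text) == params  # the round trip preserves the values
```

The same two functions could be swapped for a YAML library if that format is preferred.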
I think it wouldn't be too hard to test on a variety of sample files - the challenge could be testing on publicly available sample files. I think there are probably several pvlib contributors (maybe maintainers, too) that have access to archives of proprietary binary PAN files. Is "offline" testing worth considering? I think it's probably unnecessary, but worth considering.
Would this file work? sample.zip
I've tested and it seems no... I wonder why.
I used PVsyst V6.79 to create the PAN file, and saved it with the "File compatible with old version < V6.40" option selected (see screenshot below). Does this mean the conversion script will not work with PAN files from PVsyst V6.40 through V6.80 (I think that's when they switched to UTF-8)?
FYI: sample.zip contains a text-based PAN file that can be read with the existing pvlib.iotools.read_pan_ond function. The history seems to be switching to text in 6.40, and then updating the text encoding to UTF-8 in 6.80 (https://github.com/pvlib/pvlib-python/issues/2504#issuecomment-3109334190).
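A cheap check before calling a binary parser can avoid this kind of confusion. The sketch below is only a heuristic. It assumes that text-based PAN files begin with `PVObject_=` (optionally after a UTF-8 byte order mark), which is an assumption rather than something confirmed in this thread. `looks_like_text_pan` is a hypothetical helper, not a pvlib function.

```python
import codecs


def looks_like_text_pan(file_content: bytes) -> bool:
    """Heuristic: True if the bytes look like a text-based PAN file."""
    # Assumption: text-based PAN files start with "PVObject_=",
    # optionally preceded by a UTF-8 byte order mark
    head = file_content[:64]
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return head.lstrip().startswith(b"PVObject_=")


print(looks_like_text_pan(b"PVObject_=pvModule\r\n"))        # True
print(looks_like_text_pan(bytes.fromhex("870000000048")))    # False
```

A file that passes this check can be handed to the existing text reader instead of the binary parser.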