core icon indicating copy to clipboard operation
core copied to clipboard

OpenADAS: Expand ADF15 headers

Open Mateasek opened this issue 4 years ago • 7 comments

Recently I had to add an ADF15 file sent to me from ADAS which has a header the current parser doesn't recognise. I changed the adf15 parsing slightly to make it cover more versions without needing to touch the currently used regexp. PR #297 shows what I did. I tested it by populating a new repository, but I'm not sure the code should look like this. If we have to add more versions in future the code will become quite long. Would that be a problem? We could also define the headers in a separate file. Any opinions?

Mateasek avatar May 31 '21 17:05 Mateasek

Hi @Mateasek, I think the way you defined the headers in #297 is fine. There are not so many PEC versions in ADAS.

It is likely that the same problem may occur with the files for hydrogen and hydrogen-like ions, so I think that pec_index_header should be updated in _scrape_metadata_hydrogen() and _scrape_metadata_hydrogen_like() as well.

Unfortunately, the headers are just the tip of the iceberg. It will be more difficult to make a parser that supports different formats of electronic configuration strings, #256.

vsnever avatar Jun 01 '21 08:06 vsnever

It is likely that the same problem may occur with the files for hydrogen and hydrogen-like ions, so I think that pec_index_header should be updated in _scrape_metadata_hydrogen() and _scrape_metadata_hydrogen_like() as well.

I thought I would leave it to moment when we "discover" a new version of the header with the other methods. The problem is really in the comment section of the data file. I see that this problem is deeper than I expected. I will wait a bit more for somebody else to express their opinions and then I will make a regular PR.

Mateasek avatar Jun 01 '21 08:06 Mateasek

Hi @Mateasek and @vsnever

I think as an interim solution this is quite useful - one runs into this issue when parsing, for example, "adf15/pec96#n/pec96#n_vsu#n3.dat".

I tested PR #297 (now closed?) and it worked well for me. :)

roni-maenpaa avatar Feb 12 '24 16:02 roni-maenpaa

Hi @roni-maenpaa

thanks for the information. I'm happy that it helps, but I think adding more and more regex definitions and cases is not the proper way forward. We will probably have to go some other way, since the problematic part of the data files is assumed by ADAS to be loosely defined comment section.

One idea was to avoid parsing the problematic file part let users define provide the necessary info themselves along providing a set of examples how to parse the most common versions of the headers. Taking care of the parsing inside cherab-openadas is quite tedious and never ending job...

Mateasek avatar Feb 12 '24 18:02 Mateasek

Thanks a lot for the response, @Mateasek! I see that this issue is indeed quite tricky.

I had another question regarding the handling of OpenADAS-data in Cherab which may be related, so I will link the issue I raised here just in case: #426

roni-maenpaa avatar Feb 13 '24 13:02 roni-maenpaa

How about letting the user pass in a custom metadata_parser function object:

def parse_adf15(element, change, adf_file_path, metadata_parser=None):
...

When the value is None make a best-effort guess at a suitable comment block parser based on the ones we have available at the moment (hydrogen, hydrogen-like, full): I would also vote for adding a parser for PEC40 etc. to the default set as it covers quite a lot of useful transitions (virtually all the Argon PECs, for example). These parsers would become public functions rather than private adf15.py functions.

End users then have the option of providing their own custom parser for a particular file format, which removes the burden on Cherab to cover all possible metadata formats and instead restrict ourselves to only common/popular formats. We could then welcome pull requests to add additional parser functions if the community ends up finding them generally useful.

An additional demo showing how to write a parser for an example unsupported format (this could be PEC40 for example) should also be provided: at that point we would have provided all the tools required to use any ADF15 file.

jacklovell avatar May 14 '24 10:05 jacklovell

I would go even further. Lets specify the format of the data the users have to provide to the parsing functions to get the problematic information in (e.g. wavelengths). Then lets have a pool of ready parsers providing the right data format as an example of what the users have to do themselves.

What I mean is lets specify data format on the input and provide some examples of how to get the data.

Mateasek avatar May 20 '24 08:05 Mateasek