
Make creating chemiscope input easier

Open Luthaf opened this issue 4 years ago • 2 comments

While #96 is a first step toward this, there are still multiple places where the process could be smoother.

I think ideally we should keep the dual workflow, with a function that is mirrored by a command-line utility. People from "my generation" still have an instinct to go full bash on postprocessing. So I think we want to be able to easily combine structures (here, having something that can be read by ASE or an Atoms list seems to cover quite a lot of ground) and arrays of values (which map easily onto column files) or dicts. One thing that often bugs me is that I want to drop info from the ASE file, so there could also be a switch that allows you to drop those fields.

Originally posted by @ceriottm in https://github.com/cosmo-epfl/chemiscope/issues/96#issuecomment-729646368


There are three main parts to a chemiscope input file: metadata, properties and structures.

The story for importing structures into chemiscope is already pretty good, as long as you work with ASE =). Adding support for alternative file formats should be relatively easy and can be done on a case-by-case basis.

Properties are the harder part right now. We take the properties defined by ASE in Atoms.info and Atoms.arrays, but the user may not want all of them (e.g. the number property), and may want additional ones. For now, the only way to add other properties is to manually create the right dictionary and pass it to the function. Removing properties is also possible within Python with del frame.info["whatever"] or del frame.arrays["whatever"], but not with the command line script.
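
The in-Python removal mentioned above could be wrapped in a small helper. This is only a minimal sketch: `drop_properties` is a hypothetical name, not something chemiscope provides, and the frames are stand-in objects that expose `info` and `arrays` dictionaries the way `ase.Atoms` does, so the snippet runs without ASE installed.

```python
from types import SimpleNamespace


def drop_properties(frames, names):
    """Remove the given keys from frame.info and frame.arrays, in place.

    Hypothetical helper, not part of chemiscope. Keys that a frame does
    not have are silently ignored.
    """
    for frame in frames:
        for name in names:
            frame.info.pop(name, None)
            frame.arrays.pop(name, None)
    return frames


# Stand-in for ase.Atoms objects, which expose the same two dictionaries
frames = [
    SimpleNamespace(
        info={"energy": -1.0, "tag": "a"},
        arrays={"numbers": [1, 1], "charges": [0.1, -0.1]},
    ),
]
drop_properties(frames, ["tag", "charges"])
```

Using `pop(name, None)` instead of `del` means the same list of names can be applied to frames that do not all carry the same fields.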

Finally, the script supports basic metadata input, but again it is much easier to do this with the Python function.


One thing we can do is add support for properties stored in CSV/text/npy files. For CSV files the property name would be the CSV header; for the other formats we could just name properties 1, 2, 3, etc. We could easily guess the target (atom/structure) by counting the number of values in the property.

This will obviously not support any property metadata (description/units), but it could help for quick & dirty command line scripting, or for separating the analysis from the chemiscope generation.
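
A rough sketch of what this could look like, using only the standard library. The function names (`guess_target`, `properties_from_csv`) are made up for illustration and are not existing chemiscope functions; the output dictionaries use the "target"/"values" layout from the properties dictionary discussed in this issue.

```python
import csv
import io


def guess_target(n_values, n_structures, n_atoms):
    """Guess whether a property is per-structure or per-atom from its length."""
    if n_values == n_structures:
        return "structure"
    if n_values == n_atoms:
        return "atom"
    raise ValueError(
        f"got {n_values} values, expected {n_structures} (structures) "
        f"or {n_atoms} (atoms)"
    )


def properties_from_csv(stream, n_structures, n_atoms):
    """Read one property per CSV column, named after the header row."""
    reader = csv.reader(stream)
    header = next(reader)
    columns = [[] for _ in header]
    for row in reader:
        for column, value in zip(columns, row):
            column.append(float(value))
    return {
        name: {
            "target": guess_target(len(values), n_structures, n_atoms),
            "values": values,
        }
        for name, values in zip(header, columns)
    }


# Three structures with five atoms in total, so three rows means "structure"
text = "energy,volume\n-1.0,10.0\n-2.0,11.0\n-3.0,12.0\n"
properties = properties_from_csv(io.StringIO(text), n_structures=3, n_atoms=5)
```

Note that the guess is ambiguous when every structure contains a single atom; this sketch picks "structure" in that case, which is an arbitrary choice.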

Luthaf, Nov 18 '20 16:11

One idea that I had while typing this wall of text would be to add an online, graphical input editor to chemiscope, either as part of the default visualizer, or as a separate page/piece of code. The editor could start by simply allowing the user to change the dataset or properties metadata as a way to finalize a JSON created using chemiscope-input.
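
Until something like this exists, the same "finalize the metadata" step can be scripted. The sketch below is only an illustration: `update_metadata` is a hypothetical helper, and the key names ("meta", "properties", "units", "description") are assumptions based on the three parts of the input file and the property fields mentioned in this issue, so check them against the actual file before relying on this.

```python
import json


def update_metadata(dataset, meta=None, property_metadata=None):
    """Return a copy of `dataset` with updated dataset/property metadata."""
    updated = json.loads(json.dumps(dataset))  # deep copy of JSON-like data
    updated.setdefault("meta", {}).update(meta or {})
    for name, fields in (property_metadata or {}).items():
        updated["properties"][name].update(fields)
    return updated


# Minimal stand-in for a file written by chemiscope-input (assumed layout)
dataset = {
    "meta": {"name": "untitled"},
    "properties": {"energy": {"target": "structure", "values": [-1.0, -2.0]}},
    "structures": [],
}
final = update_metadata(
    dataset,
    meta={"name": "My dataset", "description": "Two toy structures"},
    property_metadata={"energy": {"units": "eV", "description": "DFT energy"}},
)
print(json.dumps(final["meta"], indent=2))
```

For a real file, `dataset` would come from `json.load` and `final` would be written back with `json.dump`.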

In the longer term, I could see something where you can drag & drop your structure file, read it directly in the browser with https://github.com/chemfiles/chemfiles.js/, drop a few analysis results in CSV/TXT/NPY/etc. format for the properties, manually add metadata and pick a default representation. This code would still not do any kind of analysis, but could solve a lot of pain points with getting people to start using chemiscope.

Luthaf, Nov 18 '20 16:11

Potential API improvement for chemiscope.create_input:

import numpy as np
from chemiscope import create_input

# Proposed call, not the current API; `frames` and XXX are placeholders
data = create_input(
    frames=frames,
    properties={
        "PCA": np.array(...),  # placeholder values; automatically guess target
        "something else": {"target": "atom", "values": XXX, "units": "bar/foo"},  # or allow to specify it
    },
    atom_properties_from_frames=["name", "other name"],  # new parameter
    structure_properties_from_frames=["something else"],  # new parameter
    meta={},  # as now
    cutoff=4.5,  # as now
)

Properties from the frame are only loaded if they are part of atom_properties_from_frames or structure_properties_from_frames. These new parameters can also be specified on the command line.
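
To make the opt-in behaviour concrete, here is a minimal sketch of the selection step. `properties_from_frames` is a hypothetical helper written for this comment, not the actual implementation, and the frames are stand-ins exposing `info` and `arrays` dictionaries like `ase.Atoms`.

```python
from types import SimpleNamespace


def properties_from_frames(frames, atom_names=(), structure_names=()):
    """Collect only the explicitly requested properties from the frames.

    Hypothetical helper: per-atom values come from frame.arrays and are
    concatenated over frames, per-structure values come from frame.info.
    """
    properties = {}
    for name in atom_names:
        values = [v for frame in frames for v in frame.arrays[name]]
        properties[name] = {"target": "atom", "values": values}
    for name in structure_names:
        values = [frame.info[name] for frame in frames]
        properties[name] = {"target": "structure", "values": values}
    return properties


frames = [
    SimpleNamespace(info={"energy": -1.0, "id": 0}, arrays={"charge": [0.1, -0.1]}),
    SimpleNamespace(info={"energy": -2.0, "id": 1}, arrays={"charge": [0.0]}),
]
selected = properties_from_frames(
    frames, atom_names=["charge"], structure_names=["energy"]
)
```

Here "id" is present in every frame but is not requested, so it never reaches the output.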

Luthaf, Nov 20 '20 14:11

I dare say that, with the Jupyter integration, creating chemiscope inputs is now fairly easy. One can also use the widgets to tweak the graphical appearance and then run widget.save(filename) to dump the data file with the chosen settings. Editing the properties is not yet supported, but I think it is now easy enough to set things up in a Python script or Jupyter notebook, and I see little use for the very time-consuming task of making a full-fledged GUI. I will close this, but if someone feels that a standalone GUI (as opposed to the widget) is needed, I suggest they open a new issue explaining what is badly needed on top of the many features that have been implemented already.

ceriottm, Dec 01 '23 00:12