mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Allow LammpsDumpParser Topology reader more flexibility in coordinate column data.

Open hmacdope opened this issue 4 years ago • 21 comments

Is your feature request related to a problem?

The LAMMPS DumpParser topology creator is somewhat inflexible in the column format it expects. Now that we support other coordinate conventions #3360 and velocities #3448 (WIP), we should relax the requirement for an

id, type, x, y, z 

format and make it somewhat more flexible.

Describe the solution you'd like

Allow the parser more flexibility in available column data.

Describe alternatives you've considered

Additional context

hmacdope avatar Oct 31 '21 00:10 hmacdope

@hmacdope I un-assigned you from this issue, otherwise GSoC Contributors will wonder if they can pick the issue to work on.

orbeckst avatar Mar 18 '22 01:03 orbeckst

Okay all good 👍

hmacdope avatar Mar 18 '22 01:03 hmacdope

Hello, I'm RIya Elizabeth John, an Outreachy applicant. Could you provide some more context about the LAMMPS DumpParser topology creator (perhaps it's location in the codebase), and the feature request?

Riyabelle25 avatar Mar 28 '22 08:03 Riyabelle25

The LAMMPs parser is in this file. Currently it expects dump files produced by LAMMPs to have a very specific format, when the file format is actually very flexible and can have many different columns in different orders. The job would be to add a flexible parser to fix this issue.

hmacdope avatar Mar 28 '22 10:03 hmacdope

Apologies for the late response, I had been travelling on our class trip. Adding a parser would mean adding a separate class for the same in a separate file right? Or should I look into adding a specific method to handle this usecase?

Riyabelle25 avatar Apr 03 '22 16:04 Riyabelle25

@Riyabelle25 sorry should have been clearer, you just need to change the logic in the existing class :)

hmacdope avatar Apr 03 '22 21:04 hmacdope

I want to contribute.

sayantan-bhattacharyya avatar Apr 07 '22 15:04 sayantan-bhattacharyya

Hey! I'm interested in contributing to this issue, I would have one question, when talking about: "we should relax the requirement for an id, type, x, y, z format and make it somewhat more flexible." You mean deleting totally/some fields of these lines?: https://github.com/MDAnalysis/mdanalysis/blob/02a644392eb4a8cc3e1da83df9ba5b8bd97d449f/package/MDAnalysis/topology/LAMMPSParser.py#L249 https://github.com/MDAnalysis/mdanalysis/blob/02a644392eb4a8cc3e1da83df9ba5b8bd97d449f/package/MDAnalysis/topology/LAMMPSParser.py#L250 or what exactly is meant by: "more flexible". Thank you so much in advance!

Davichet-e avatar Apr 14 '22 17:04 Davichet-e

Hi! I am an MDAnalysis user and have just started using lammps. I would absolutely use this functionality, especially if it were general (for example, I have been needing to hack around to read forces using MDAnalysis). I don't think I could contribute to changing the parser, but let me know if I can help out in any other way (providing example dump files with different properties, testing things out).

aliehlen avatar Apr 14 '22 18:04 aliehlen

@aliehlen any kind of help is appreciated. If you are a user of LAMMPS then please say how MDAnalysis could work better for you. Issues should focus on one specific thing. You can open new issues or starting a discussion on the developer mailing list. Examples of input files (maybe a snippet in a comment for discussion) and links to format specifications that you know to be relevant are useful. For example, I know very little about LAMMPS (so I also can't say much about https://github.com/MDAnalysis/mdanalysis/issues/3449#issuecomment-1099442709 ) but I can judge an approach if someone clearly explains

  1. what the current situation is (what doesn't work — example)
  2. what they want to achieve (eg an example of what it should look like)
  3. how they want to solve the problem

The last point is not necessary to get a discussion started. Initially, it's more important to understand what the problem is and what a solution should provide.

orbeckst avatar Apr 14 '22 19:04 orbeckst

Hey! I'm interested in contributing to this issue, I would have one question, when talking about: "we should relax the requirement for an id, type, x, y, z format and make it somewhat more flexible." You mean deleting totally/some fields of these lines?: https://github.com/MDAnalysis/mdanalysis/blob/02a644392eb4a8cc3e1da83df9ba5b8bd97d449f/package/MDAnalysis/topology/LAMMPSParser.py#L249 https://github.com/MDAnalysis/mdanalysis/blob/02a644392eb4a8cc3e1da83df9ba5b8bd97d449f/package/MDAnalysis/topology/LAMMPSParser.py#L250

or what exactly is meant by: "more flexible". Thank you so much in advance!

It means being able to parse data under more columns than "Id type ..." and also have them possibly out of order. 😀

hmacdope avatar Apr 14 '22 22:04 hmacdope

Hey! I wanna work on this issue, but I got confused, so I decided to ask a few questions before moving forward:

  • The DATAParser class is limit in reading different styles Atoms section; it only read full and atomic styles. My idea is to modify this class so it can read all different styles as outlined in a table in the Atoms section in read_data documentation. Is this improvement relevant to this issue?
  • Is the goal to improve the LammpsDumpParser class so it can work with a variety of columns? If yes, I am thinking of defining a subset of valid attributes as defined in dump documentation? My rationale to select a subset not the whole set of attributes is to compatible with MDAnalysis; for instance, I am not sure whether attributes like proc (the number of processors) or mux (the orientation of dipole moment of an atom along x axis) can be used in MDAnalysis or not. What are your thoughts?

amirhs1 avatar Apr 16 '22 23:04 amirhs1

@amirhs1 for the DataParser, you can provide the atom_style keyword to match the variety of columns and the LAMMPs output style you chose.

The aim of this issue is to do something similar for the DumpParser. Have a read of the source code in https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/LAMMPSParser.py which already has a lot of the functionality you will need.

hmacdope avatar Apr 17 '22 04:04 hmacdope

I created a branch called "issue3449-LammpsDumpParser" from "develop" branch.

amirhs1 avatar Apr 17 '22 11:04 amirhs1

It will be on your own fork of MDAnalysis. :)

On Sun, 17 Apr 2022 at 9:45 pm, amirhsi @.***> wrote:

I create a branch called "issue3449-LammpsDumpParser" from "develop" branch. I can see it on my side on, but I cannot find it on MDAnalysis github.

— Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/mdanalysis/issues/3449#issuecomment-1100859723, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3RHC74DR4B6JVS7CGUFA3VFP2W3ANCNFSM5HBUGMCA . You are receiving this because you were mentioned.Message ID: @.***>

-- Hugo MacDermott-Opeskin Post Doctoral Fellow, RSC ANU

hmacdope avatar Apr 18 '22 07:04 hmacdope

Hello @hmacdope, I'm Fortune from the Outreachy program and am interested in contributing to this project. I think I understand the issue but I have just one question about the intended atom_style attribute for the LammpsDumpParser class.

According to the LAMMPS docs for the dump command, these are the possible attributes for atom_style

possible attributes = id, mol, proc, procp1, type, element, mass,
                      x, y, z, xs, ys, zs, xu, yu, zu,
                      xsu, ysu, zsu, ix, iy, iz,
                      vx, vy, vz, fx, fy, fz,
                      q, mux, muy, muz, mu,
                      radius, diameter, omegax, omegay, omegaz,
                      angmomx, angmomy, angmomz, tqx, tqy, tqz,
                      c_ID, c_ID[I], f_ID, f_ID[I], v_name,
                      i_name, d_name, i2_name[I], d2_name[I]

Should all be parsed and passed to their respective TopologyAttrs subclass? The DataParser class seem to clip theirs to id type resid charge x y z

fortune-max avatar Apr 18 '22 18:04 fortune-max

@aliehlen any kind of help is appreciated. If you are a user of LAMMPS then please say how MDAnalysis could work better for you. Issues should focus on one specific thing. You can open new issues or starting a discussion on the developer mailing list. Examples of input files (maybe a snippet in a comment for discussion) and links to format specifications that you know to be relevant are useful. For example, I know very little about LAMMPS (so I also can't say much about #3449 (comment) ) but I can judge an approach if someone clearly explains

1. what the current situation is (what doesn't work — example)

2. what they want to achieve (eg an example of what it _should_ look like)

3. how they want to solve the problem

The last point is not necessary to get a discussion started. Initially, it's more important to understand what the problem is and what a solution should provide.

Sorry for the delay (I was away). I would open an issue about this but it seems that there are a few open already (this one, issue #3504 ) and a few PRs to deal with them (PR #3448 , #3608 , #3647 ). So I'd just like to voice support for this one and #3504 :).

As is discussed here, lammps dump files are very flexible and customizable and it would be great to be able to read in other properties. Even though I'm not an extremely experienced user of lammps or MDAnalysis, I've already run into situations in which I want to read in properties from a dump file that can't currently be read in (I've attached an example of a dump file that has a simple custom style).

Reading through these comments, I think PR #3647 has allowed for the topology parser to accept different atom_styles. I want to point out that it also (seemly) is common to have custom dump formats, so it would be great if those can be handled by the topology parser, as well (or, at least, they shouldn't break the parser). PR #3608 deals with arbitrary columns for reading in coordinate data in a really general way---I'm not sure any of that could apply to the topology parsing, though.

Anyway, I don't know if there's a good way to help with these two specific PRs, but I can test out different files and configurations here if needed!

dump.zip

aliehlen avatar May 03 '22 17:05 aliehlen

Hello, I want to ask a question concerning what you meant on making the lammpsDumpParser more flexible. Does it mean changing the parser to accept different kind of atom styles such as body, bond and charge and not just the atomic style currently used .

yickysan avatar Mar 26 '23 18:03 yickysan

Hi @hmacdope, I received the invitation to submit a PR for the purpose of applying to gsoc'24. I'm interested in contributing to this issue.

Having gone through the discussions above, I understand that I need to allow more flexibility to user in the passing o their dup file. Please is this definition ok? If so I'll:

  1. Extend the atom_style argument: Instead of just a string, allow it to be a dictionary.
  2. Dictionary Structure:
  • Keys: Integers representing column indices (starting from 0).
  • Values: Strings representing the desired attribute names (e.g., "id", "type", "x", "y", "z").
  1. Update _interpret_atom_style function:
  • Modify the function to accept the dictionary as input.
  • Use the dictionary to create a mapping between column indices and attribute names.
  • This mapping can then be used during parsing to interpret data correctly.

If this approach is fine with you, please assign it to me

vdjeudjam avatar Mar 27 '24 04:03 vdjeudjam

Hi @hmacdope, I received the invitation to submit a PR for the purpose of applying to gsoc'24. I'm interested in contributing to this issue.

Having gone through the discussions above, I understand that I need to allow more flexibility to user in the passing o their dup file. Please is this definition ok? If so I'll:

  1. Extend the atom_style argument: Instead of just a string, allow it to be a dictionary.
  2. Dictionary Structure:
  • Keys: Integers representing column indices (starting from 0).
  • Values: Strings representing the desired attribute names (e.g., "id", "type", "x", "y", "z").
  1. Update _interpret_atom_style function:
  • Modify the function to accept the dictionary as input.
  • Use the dictionary to create a mapping between column indices and attribute names.
  • This mapping can then be used during parsing to interpret data correctly.

If this approach is fine with you, please assign it to me

Hi @hmacdope @orbeckst any feedback on this suggestion?

vdjeudjam avatar Mar 29 '24 01:03 vdjeudjam

Hello @vdjeudjam we normally don't assign issues for GSOC or other programs. Just submit a PR that references the issue. We will review the first one.

orbeckst avatar Mar 30 '24 18:03 orbeckst