aiida-core
`CalcJob`: add support to `retrieve_list` for globbing that maintains the source file hierarchy
The current syntax for the `retrieve_list` supports globbing in the tuple variant; however, it forces the user to specify a `depth`, which indicates the levels of nesting to keep. Often though, the user simply wants to maintain the file hierarchy of the remote and does not want to remove any level of nesting. This is not possible with the current syntax.
Imagine the following file hierarchy in the remote working directory:
├─ sub_a
│ ├─ vasprun.xml
│ └─ CHGCAR
├─ sub_b
│ ├─ some_file.xml
│ └─ CHGCAR
.
.
There should be a syntax that allows retrieving all XML files while maintaining the folder hierarchy. So we want to end up with:
├─ sub_a
│ └─ vasprun.xml
├─ sub_b
│ └─ some_file.xml
.
.
I propose we allow the `depth` to be set to `None`, which would accomplish the above use case, i.e.:
retrieve_list = [('*/*.xml', '.', None)]
Unfortunately, we probably would have to keep the second and third elements. At best we could reduce it to a two-element tuple
retrieve_list = [('*/*.xml', '.')]
and the `None` would be implied.
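To make the intended outcome concrete, here is a minimal sketch of the selection the proposal asks for, applied to the example hierarchy above. It is an illustration only: `select_with_hierarchy` is a hypothetical helper, not part of aiida-core, and it uses `pathlib` pattern matching on a hard-coded listing rather than globbing over a transport.

```python
from pathlib import PurePosixPath

# Hypothetical listing of the remote working directory from the example above.
remote_files = [
    'sub_a/vasprun.xml',
    'sub_a/CHGCAR',
    'sub_b/some_file.xml',
    'sub_b/CHGCAR',
]


def select_with_hierarchy(paths, pattern):
    """Return the paths that match ``pattern``, leaving each relative path untouched.

    Illustrative helper only (an assumption for this sketch, not an aiida-core function).
    Keeping the relative path as-is corresponds to the proposed ``depth=None`` behaviour.
    """
    return [path for path in paths if PurePosixPath(path).match(pattern)]


retrieved = select_with_hierarchy(remote_files, '*/*.xml')
print(retrieved)  # ['sub_a/vasprun.xml', 'sub_b/some_file.xml']
```

Only the XML files are selected and each one keeps its `sub_a/` or `sub_b/` prefix, which is the target layout shown above.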
@astamminger @ltalirz @giovannipizzi
I am in favor.
Perhaps others can comment on whether `None` is self-explanatory here, or whether a more expressive label like `MIRROR` or `FULL` would make the list specification more intuitive to read.
I personally interpret `depth` like a feature. Coming from that perspective, `FULL` complements this feature while `None` would mean "hey, please disable this feature!" So, basically both of them make sense to me.
However, if I had to decide, I would opt for `None` because, you know, KISS :)
Sounds good to me then
Can you please comment on what the behaviour would be if one sets depth to zero instead? I admit it's not clear to me (I didn't check the code, sorry), but anyway this is useful to clarify as we need to write it in the docs.
If you have the hierarchy
├─ sub_a
│ ├─ vasprun1.xml
│ ├─ vasprun2.xml
│ └─ vasprun3.xml
.
and you specify
retrieve_list = [('sub_a/vasprun*.xml', '.', 0)]
it would retrieve all `*xml` files but without any leading folders:
├─ vasprun1.xml
├─ vasprun2.xml
└─ vasprun3.xml
So `0` will strip all leading nested folders.
Ah ok, so if I understand correctly, it's the number of levels "to be kept", so if I put 1, I get the same as `None` in the example above? (And the issue is that we don't know what the correct depth should be?)
Or equivalently, if I have `sub_a/sub_b/file.xml`, I get only `file.xml` with 0, and `sub_b/file.xml` with 1?
If this is correct, then I see the need for `None`. I was thinking the depth was instead the number of folders to remove from the front, so 0 was equivalent to `None`, 1 to strip the parent folder, 2 to strip the two parent folders, etc. (but I think this is not what is happening?)
Indeed, the `depth` specifies the number of levels to keep as you say, and so the problem is that currently you need to know which depth is required to keep everything, whereas that is not always known a priori.