[v6r7] DFC: rm does not remove files from storage
Using the DFC command line (dirac-dms-filecatalog-cli), removing a file with 'rm' deletes the record from the catalogue, but does not remove replicas from storage.
The documentation points out that 'this method is not fully implemented', so I guess this is more of a feature request than a bug report. It would be very useful, though.
Yes, this is a long-standing feature request. The main problem is that this operation (file removal) should be asynchronous. A proposal for how to implement it is here: https://github.com/DIRACGrid/DIRAC/wiki/Asynch-FC-operations-rfc . You are welcome to comment on it (as well as on the other Requests For Comments, RFCs) on the DIRAC project page: https://github.com/DIRACGrid/DIRAC/wiki/DIRAC-Requests-For-Comments-%28RFC%29
What's the status of that?
Why should dirac-dms-filecatalog-cli "rm" remove files from storage? This is a pure catalog script which, by the way, is very dangerous, as are CLI commands like dirac-dms-remove-catalog-replicas!
Yes, I agree with Philippe's comment. The FC CLI should rather be considered an administrator-level tool for working with the catalog proper, including removing entries from the catalog without checking the physical replicas, which a "normal" user should never do. To be consistent, the CLI should then also be cleaned of the other commands involving physical files: add, get, replicate. For operations on physical files, users should rather use the command-line scripts, Core or COMDIRAC style.
"rm" should be removed because of the faulty documentation string. I find "get" and "replicate" to be very useful to have inside the CLI, probably even for non-administrators.
Let me just bump this one more time, since it is still valid and I just ran into this problem.
The CLI rm command removes the file from the file catalogue, but does not delete the actual copies on the storage elements. Neither does rmreplica, despite saying so in the help text. I do consider this a bug. The tutorial at https://www.gridpp.ac.uk/userguide/data-on-the-grid/dirac-dfc-cli.html also makes it seem like the two commands do remove the physical copies. Since you cannot actually see what is happening with the physical copies after running the commands, I assume this has led to quite a few "ghost" files scattered around the storage elements.
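To illustrate how such ghost files arise and how they could be spotted, here is a minimal sketch. It assumes you can obtain a list of the LFNs registered in the catalogue and a listing of what is physically present on one storage element. The function name, the example paths and both lists are hypothetical, and the code does not call any DIRAC API.

```python
def find_dark_data(catalogue_lfns, storage_files):
    """Return files that exist on storage but have no catalogue entry.

    Both arguments are plain iterables of path strings (hypothetical inputs,
    e.g. taken from a catalogue dump and a storage element listing).
    """
    return sorted(set(storage_files) - set(catalogue_lfns))


# Hypothetical starting state: catalogue and storage agree.
catalogue = ["/vo/user/a/alice/run1.dat", "/vo/user/a/alice/run2.dat"]
storage = ["/vo/user/a/alice/run1.dat", "/vo/user/a/alice/run2.dat"]

# Model of what the CLI "rm" does according to this thread:
# the catalogue entry goes away, the physical copy stays.
catalogue.remove("/vo/user/a/alice/run2.dat")

dark = find_dark_data(catalogue, storage)
print(dark)
```

The set difference is one-directional on purpose: it only reports files the catalogue no longer knows about, which is the case described here.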
The command line tools dirac-dms-remove-files and dirac-dms-remove-replicas work as intended, removing the files from the storage elements as well as from the file catalogue.
As a dirty fix, when you have accidentally removed a file from the catalogue with the CLI and want to get rid of the physical copies, you can re-upload the file and replicate it to the same storage elements again. Afterwards you can remove the copies with dirac-dms-remove-files.
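The workaround can be written down as a sequence of commands. The sketch below only builds and prints the command lines, it does not run anything. It assumes you still have a local copy of the file and know which storage elements held the replicas. The helper name, the LFN and the storage element names are hypothetical, and the argument order of dirac-dms-add-file and dirac-dms-replicate-lfn is an assumption that should be checked against each script's --help.

```python
import shlex


def cleanup_commands(lfn, local_copy, storage_elements):
    """Build the command lines for the re-upload / replicate / remove workaround.

    Hypothetical helper: returns a list of argument lists, nothing is executed.
    """
    if not storage_elements:
        raise ValueError("need at least one storage element")
    first, *others = storage_elements
    # Re-register the file by uploading it to the first storage element.
    commands = [["dirac-dms-add-file", lfn, local_copy, first]]
    # Replicate to every other storage element that held a copy.
    commands += [["dirac-dms-replicate-lfn", lfn, se] for se in others]
    # Now that the catalogue knows all replicas again, remove them properly.
    commands.append(["dirac-dms-remove-files", lfn])
    return commands


plan = cleanup_commands(
    "/vo/user/a/alice/run2.dat",          # hypothetical LFN
    "run2.dat",                            # hypothetical local copy
    ["EXAMPLE-SE-1-disk", "EXAMPLE-SE-2-disk"],  # hypothetical SE names
)
for cmd in plan:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the plan first makes it possible to review the commands before running them by hand.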
Hi, which version of DIRAC are you running? Are you using the GridPP installation?
Is it only a matter of updating the documentation inside the CLI?
@atsareg @andresailer @chaen who is the maintainer of the FC CLI?
No idea, never looked at it.
I'm using the version on /cvmfs/ganga.cern.ch/dirac_ui.
I think it is not just a matter of updating the CLI help texts (though that would probably help). The way the commands work now, they produce "dark data", which could get a bit messy in the long run. Even if the help text explicitly warned the user about this (which it does not at the moment), people would try these commands within the CLI, erroneously (but understandably) assuming they work the same way as the stand-alone script versions. I would suggest either fixing the commands or removing them from the CLI completely.
@atsareg is taking care of the FC CLI I believe.
@atsareg ping
pong