aldbr

Results 19 issues of aldbr

We've encountered an issue where some access tokens are occasionally rejected by the Computing Elements (CEs). The primary error observed on the CE side is shown below: `[2024-01-29 10:43:53]...`

Last item of https://github.com/DIRACGrid/DIRAC/issues/7459
BEGINRELEASENOTES
*WorkloadManagement
CHANGE: new job management mechanism in the PushJobAgent
ENDRELEASENOTES

The objective is to enhance the exploitation of HPCs with no external connectivity in DIRAC. The current workflow is limited:
- The `PushJobAgent` only works if you use the `dirac-jobexec`...

Currently, there is a distinct separation in our workflow between marking a job as `RESCHEDULED` and performing the rescheduling action itself. This means if one process encounters an error, the...

The diversity and complexity of Computing Elements (CEs) in DIRAC have significantly increased, leading to a somewhat disorganized system. We have observed that the current structure, which broadly categorizes CEs...

As explained in https://github.com/DIRACGrid/DIRAC/pull/7025, the `_ssh_call()` method does not seem to work perfectly. There are a few popular Python libraries for performing SSH operations, such as `Fabric` (based on Paramiko; https://docs.fabfile.org/en/stable/). We...

The Site Director is responsible for generating and submitting pilot wrappers to various Computing Element communication interfaces, and deleting them afterwards. CE interfaces may modify pilot wrappers - e.g. to...

Currently, the [Watchdog](https://github.com/DIRACGrid/DIRAC/blob/integration/src/DIRAC/WorkloadManagementSystem/JobWrapper/Watchdog.py) seems to compute the "[time left](https://github.com/DIRACGrid/DIRAC/blob/integration/src/DIRAC/WorkloadManagementSystem/JobWrapper/Watchdog.py#L792)" based on the CPU work, which is the product of the CPUtime that we get from the underlying batch system, which...

Here is a potential issue I discovered while running pilots on a SLURM batch system, which could affect batch systems based on wallclock time (when CPU time left depends on real time...

Replace the DIRAC-specific `SSH` class with `fabric`.
BEGINRELEASENOTES
*Resources
CHANGE: Replace SSH by fabric in SSHComputingElement
ENDRELEASENOTES