
qos fails with `Attribute is not defined: QOS_POLICY`

Open kofemann opened this issue 1 year ago • 7 comments

Found in log files of prod system.

 java.lang.IllegalStateException: Attribute is not defined: QOS_POLICY
         at org.dcache.vehicles.FileAttributes.guard(FileAttributes.java:335)
         at org.dcache.vehicles.FileAttributes.getQosPolicy(FileAttributes.java:777)
         at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:136)
         at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:129)
         at org.dcache.qos.local.clients.LocalQoSRequirementsClient.fileQoSRequirementsRequested(LocalQoSRequirementsClient.java:81)
         at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.fileQoSStatusChanged(FileQoSStatusHandler.java:470)
         at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.lambda$handleAddCacheLocation$0(FileQoSStatusHandler.java:195)
         at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
         at java.base/java.lang.Thread.run(Thread.java:829)

Cause:


        FileAttributes attributes = descriptor.getAttributes();
        if (attributes.isDefined(FileAttribute.QOS_POLICY) && attributes.getQosPolicy() == null) {
            /*
             * This is a lazily discovered change, so
             * as a matter of consistency it calls for removal
             * of the pnfsid from the engine's tracking tables.
             */
            engineDao.delete(update.getPnfsId());
            return super.fetchRequirements(update, descriptor);
        }

        return fetchRequirements(update, descriptor);
    }

    @Override
    public FileQoSRequirements fetchRequirements(FileQoSUpdate update, FileQoSRequirements descriptor)
          throws QoSException {
        FileAttributes attributes = descriptor.getAttributes();
        String name = attributes.getQosPolicy();

When the policy is not defined, fetchRequirements is called, which invokes attributes.getQosPolicy(). The `String name = attributes.getQosPolicy();` called when
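The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not dCache code: `Attrs` is a hypothetical stand-in for `FileAttributes`, assumed to throw from its getter when the attribute was never set, as the stack trace suggests. It shows that the `isDefined(...) && getQosPolicy() == null` check only handles the "defined and null" case, so an undefined attribute falls through to the unguarded getter.

```java
import java.util.EnumSet;
import java.util.Set;

class GuardDemo {
    enum Attr { QOS_POLICY }

    // Hypothetical stand-in for FileAttributes (assumption, not the real class).
    static class Attrs {
        private final Set<Attr> defined = EnumSet.noneOf(Attr.class);
        private String qosPolicy;

        boolean isDefined(Attr a) {
            return defined.contains(a);
        }

        void setQosPolicy(String policy) {
            defined.add(Attr.QOS_POLICY);
            qosPolicy = policy;
        }

        // Guarded getter: refuses to answer if the attribute was never set.
        String getQosPolicy() {
            if (!isDefined(Attr.QOS_POLICY)) {
                throw new IllegalStateException("Attribute is not defined: QOS_POLICY");
            }
            return qosPolicy;
        }
    }

    // Same shape as the snippet above: only "defined and null" is special-cased;
    // "not defined" reaches the second getQosPolicy() call and throws.
    static String describe(Attrs attrs) {
        if (attrs.isDefined(Attr.QOS_POLICY) && attrs.getQosPolicy() == null) {
            return "no policy";
        }
        try {
            return "policy=" + attrs.getQosPolicy();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }
}
```

In this sketch, an `Attrs` instance on which `setQosPolicy` was never called takes the "failed" branch, while one set to `null` takes the "no policy" branch.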

kofemann avatar Oct 11 '24 09:10 kofemann

reviewing

khys95 avatar Nov 19 '24 11:11 khys95

@kofemann did you mean to finish that sentence? Otherwise, I don't think it makes sense.

khys95 avatar Nov 19 '24 14:11 khys95

Closed by 8b9dfb399767db4c4483ff107a24eed47f633a61

khys95 avatar Dec 06 '24 11:12 khys95

Though the original error is not there, the issue still pops up:

29 Jan 2025 14:46:00 (qos-engine) [] Thread Thread[pool-357-thread-1,5,qos-engine-threads] died
java.lang.NullPointerException: null
	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
	at java.base/java.util.Optional.of(Optional.java:113)
	at org.dcache.vehicles.FileAttributes.toOptional(FileAttributes.java:834)
	at org.dcache.vehicles.FileAttributes.getQosPolicyIfPresent(FileAttributes.java:773)
	at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:124)
	at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:116)
	at org.dcache.qos.local.clients.LocalQoSRequirementsClient.fileQoSRequirementsRequested(LocalQoSRequirementsClient.java:81)
	at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.fileQoSStatusChanged(FileQoSStatusHandler.java:470)
	at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.lambda$handleAddCacheLocation$0(FileQoSStatusHandler.java:195)
	at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

kofemann avatar Jan 29 '25 13:01 kofemann

Hello @kofemann, as reported in today's Tier 1 meeting, I am observing a similar error. The dCache version is dcache-9.2.32-1.noarch, and the Java version is Red_Hat-17.0.13.0.11-1.

In the qos-engine:
    29 Jan 2025 09:05:59 [pool-7-thread-294] [] Uncaught exception in thread pool-7-thread-294
    java.lang.NullPointerException: null
    	at java.base/java.util.Objects.requireNonNull(Objects.java:209)
    	at java.base/java.util.Optional.of(Optional.java:113)
    	at org.dcache.vehicles.FileAttributes.toOptional(FileAttributes.java:834)
    	at org.dcache.vehicles.FileAttributes.getQosPolicyIfPresent(FileAttributes.java:773)
    	at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:127)
    	at org.dcache.qos.services.engine.provider.PolicyBasedQoSProvider.fetchRequirements(PolicyBasedQoSProvider.java:119)
    	at org.dcache.qos.local.clients.LocalQoSRequirementsClient.fileQoSRequirementsRequested(LocalQoSRequirementsClient.java:81)
    	at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.fileQoSStatusChanged(FileQoSStatusHandler.java:470)
    	at org.dcache.qos.services.engine.handler.FileQoSStatusHandler.lambda$handleAddCacheLocation$0(FileQoSStatusHandler.java:195)
    	at org.dcache.util.BoundedExecutor$Worker.run(BoundedExecutor.java:247)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    	at java.base/java.lang.Thread.run(Thread.java:840)
    

And in the qos-verifier:

(qos-verifier@dcdncore05qosDomain) admin > verify history
2025/01/29 07:42:14 (0000E0C85F1E79A144609C00F07E63854477 POOL_STATUS_UP)(last adjustment: VOID)(parent dcdn007_1, retried 0) CacheException: Processing for 0000E0C85F1E79A144609C00F07E63854477 failed during verify. NullPointerException: null
2025/01/29 07:42:14 (000068DE798C227E41FABB771202AD93BD9C POOL_STATUS_UP)(last adjustment: VOID)(parent dcdn007_1, retried 0) CacheException: Processing for 000068DE798C227E41FABB771202AD93BD9C failed during verify. NullPointerException: null
 

I do not see this on the integration instance on Java 17 and dcache-9.2.20-1.noarch

Hope this information helps. Carlos

cfgamboa avatar Jan 29 '25 16:01 cfgamboa

The problem here is that null is a valid value for FileAttribute.QOS_POLICY. Writing attributes.setQosPolicy(null); is valid. However, null is not a valid value to be encapsulated within Optional<String>.
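The mismatch is easy to check with plain `java.util.Optional`, independent of dCache. In this small sketch the class and method names are made up for illustration: `Optional.of` rejects `null` with a `NullPointerException` (the exception seen in the traces above), whereas `Optional.ofNullable` maps `null` to an empty `Optional`.

```java
import java.util.Optional;

class OptionalNullDemo {
    /** Returns true if Optional.of(value) throws NullPointerException. */
    static boolean ofThrows(String value) {
        try {
            Optional.of(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Returns true if Optional.ofNullable(value) holds a value. */
    static boolean ofNullablePresent(String value) {
        return Optional.ofNullable(value).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(ofThrows(null));               // Optional.of(null) throws
        System.out.println(ofThrows("some-policy"));      // non-null is accepted
        System.out.println(ofNullablePresent(null));      // null becomes empty
    }
}
```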

Philosophically, there are two kinds of "unknown value" for QOS_POLICY: the value is simply unknown (FileAttributes#setQosPolicy has not been called) or it is known that the file doesn't have a policy (FileAttributes#setQosPolicy has been called, with a null argument).

Therefore, the method FileAttributes.getQosPolicyIfPresent cannot work if it returns an Optional<String>.

There are three solutions (I can think of):

  1. The two unknowns (above) are combined. The getQosPolicyIfPresent method is updated so it returns Optional.empty() if FileAttributes#setQosPolicy has not been called or if FileAttributes#setQosPolicy was called with a null argument.
  2. The signature is updated to return Optional<Optional<String>>. The getQosPolicyIfPresent method returns Optional.empty() if FileAttributes#setQosPolicy has not been called. It returns Optional.of(Optional.empty()) if FileAttributes#setQosPolicy was called with a null argument, otherwise it returns Optional.of(Optional.of(policy)) if FileAttributes#setQosPolicy was called with the non-null argument policy.
  3. Remove null as a valid value for QoS policy. A standard/place-holder value would be used instead. For example, the string DEFAULT.

I'd find 1. dangerous, as missing information is treated as if the file has the default policy.

In the short-term, 2. might be the best approach, but perhaps 3. is worth considering.
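To make the difference between options 1 and 2 concrete, here is a rough sketch. `PolicyAttributes` and both getter names are hypothetical and only model the QOS_POLICY part of FileAttributes; this is not the actual dCache implementation or the fix that was applied.

```java
import java.util.Optional;

// Hypothetical stand-in for the QOS_POLICY part of FileAttributes.
class PolicyAttributes {
    private boolean defined;   // true once setQosPolicy has been called
    private String qosPolicy;  // null is a legitimate value: "file has no policy"

    void setQosPolicy(String policy) {
        defined = true;
        qosPolicy = policy;
    }

    // Option 1: "never set" and "set to null" both collapse into Optional.empty().
    Optional<String> getQosPolicyCombined() {
        return defined ? Optional.ofNullable(qosPolicy) : Optional.empty();
    }

    // Option 2: the outer Optional answers "is the attribute defined?",
    // the inner Optional answers "does the file have a policy?".
    Optional<Optional<String>> getQosPolicyNested() {
        return defined ? Optional.of(Optional.ofNullable(qosPolicy)) : Optional.empty();
    }
}
```

With the nested form a caller can still tell "attribute not fetched" apart from "file has no policy", which is the information option 1 throws away.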

paulmillar avatar Jan 30 '25 19:01 paulmillar

Reviewing

khys95 avatar Feb 07 '25 10:02 khys95

@khys95 this is fixed now, isn't it? Can we close the issue?

kofemann avatar Jul 22 '25 07:07 kofemann