hop icon indicating copy to clipboard operation
hop copied to clipboard

[Bug]: Dimension lookup/update transform "Unable to retrieve data type of return fields because of an unexpected error"

Open matze42 opened this issue 1 year ago • 9 comments

Apache Hop version?

2.7

Java version?

openjdk 11.0.21 2023-10-17

Operating system

macOS

What happened?

Running a pipeline with the dimension lookup/updated transform results in an error and exception. The error message says that it could not retrieve the data type of the return fields although that is specified.

org.apache.hop.core.exception.HopTransformException: Unable to retrieve data type of return fields because of an unexpected error at java.lang.Thread.run (Thread.java:829) at org.apache.hop.pipeline.transform.RunThread.run (RunThread.java:55) at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookup.processRow (DimensionLookup.java:99) at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields (DimensionLookupMeta.java:286)

at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields(DimensionLookupMeta.java:307)
at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookup.processRow(DimensionLookup.java:99)
at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: java.lang.NullPointerException at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields(DimensionLookupMeta.java:286) ... 3 more 2024/01/12 13:39:24 - Lookup Building block.0 - ERROR: Unexpected error 2024/01/12 13:39:24 - Lookup Building block.0 - ERROR: org.apache.hop.core.exception.HopTransformException: 2024/01/12 13:39:24 - Lookup Building block.0 - Unable to retrieve data type of return fields because of an unexpected error 2024/01/12 13:39:24 - Lookup Building block.0 - at java.lang.Thread.run (Thread.java:829) 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transform.RunThread.run (RunThread.java:55) 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookup.processRow (DimensionLookup.java:99) 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields (DimensionLookupMeta.java:286) 2024/01/12 13:39:24 - Lookup Building block.0 - 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields(DimensionLookupMeta.java:307) 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookup.processRow(DimensionLookup.java:99) 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55) 2024/01/12 13:39:24 - Lookup Building block.0 - at java.base/java.lang.Thread.run(Thread.java:829) 2024/01/12 13:39:24 - Lookup Building block.0 - Caused by: java.lang.NullPointerException 2024/01/12 13:39:24 - Lookup Building block.0 - at org.apache.hop.pipeline.transforms.dimensionlookup.DimensionLookupMeta.getFields(DimensionLookupMeta.java:286) 2024/01/12 13:39:24 - Lookup Building block.0 - ... 3 more 2024/01/12 13:39:24 - Lookup Building block.0 - child index = 3, logging object : org.apache.hop.core.logging.LoggingObject@3c86d77f parent=63896a50-b6ee-42c2-847a-e90562bac780

image

If helpful I am happy to provide more information

Issue Priority

Priority: 2

Issue Component

Component: Transforms

matze42 avatar Jan 15 '24 08:01 matze42

Currently I do the following work around: Open pipeline/transform in HOP. Change the type of return field to something else. Save. Change back to correct value. Save. Run in hop -> works - Strange

matze42 avatar Jan 15 '24 08:01 matze42

On which field is the return type incorrect? Is this a date/datetime field?

hansva avatar Jan 15 '24 14:01 hansva

in the example above it is only one return field (see screenshot) and the type of that field is a String in Hop and a varchar(255) in the DB (Azure SQL Server).

In other pipelines I have the same issue with fields that I declared as Integer in HOP and which are int in the DB.

matze42 avatar Jan 16 '24 06:01 matze42

There may be a bug if the pipeline was created with a version < 2.7. When you return to the correct value, does the problem reappear in version 2.7?

nadment avatar Jan 18 '24 20:01 nadment

I have the same problem. Here some more data in case it helps: If I do the work around described by matze42, it works but if execute a "Clear database cache" it fails again, exactly the same way.

I have tried with HOP 2.7.0 and HOP 2.5.0 executing (after tuning) workflows imported from kettle and 2.7.0 native workflows: same behaviour.

mbgracia avatar Jan 22 '24 11:01 mbgracia

There may be a bug if the pipeline was created with a version < 2.7. When you return to the correct value, does the problem reappear in version 2.7?

Not 100% sure I get what you ask for. The pipeline was created before 2.7. After applying the workaround it works. Next month (because I do a monthly loading cycle) the problem is there again. In other words; the problem reoccurs even if I stay within 2.7.

matze42 avatar Jan 22 '24 12:01 matze42

I have the same problem. Here some more data in case it helps: If I do the work around described by matze42, it works but if execute a "Clear database cache" it fails again, exactly the same way.

Following your input I can confirm that clearing the database cache triggers the error for me as well. Sequence: pipeline works -> clear database cache -> pipeline does not work

matze42 avatar Jan 22 '24 17:01 matze42

I rescue this issue to see if it can be moved to priority 1. In my case, we have more than 100 transformations that we have imported from kettle to load our DWH. We have done semi-automatic and manual corrections after import, but this issue is blocking to migrate the load from kettle to HOP and start with testing in the pre-production environment. I am available to give more information if needed. Thanks a lot.

Belén

mbgracia avatar Feb 20 '24 08:02 mbgracia

I suspect a problem loading/saving transformation metadata when changing the "Update dimension" mode. In the "Fields" tab, we switch from "Type of return field" to "Type of dimension" in the same column without adapting the values.

Step to reproduce:

  1. Disable "Update dimension"
  2. Create a field with a type of return field STRING
  3. Enable "Update dimension"
  4. Save the pipeline
  5. Reopen the pipeline

There are 3 points to consider:

  1. load/save metadata problem when changing dimension update
  2. if the return data type in the database is not recognized in lookup mode (community help needed).
  3. unnecessary "Type of return field" information never used in the code, which should not be encoded by user but comes from the database

nadment avatar Feb 20 '24 20:02 nadment