hyperion
CTCE links dying with VM/Passthrough (PVM)
Hi folks,
I've run into some hot water with a recent breaking change in Hercules, around the 4.6 mark; this change has seemingly broken PVM. I have tested the following configurations of PVM and found that only SOME of them work:
PVM 2.1 (1993):
- VM/SP 5
- VM/SP 6
- VM/HPO 4.2
- VM/HPO 5
- VM/ESA 1.1.1 (370)
- VM/ESA 1.2.2
- VM/ESA 2.1.0
PVM 2.1 (1998):
- VM/ESA 2.4.0
- z/VM 4.2
- z/VM 4.4
- z/VM 5.3
- z/VM 6.2
- z/VM 6.3
- z/VM 6.4
- z/VM 7.1
Specifically, VM/ESA 2.4 (my hub node) can talk to any non-XA VM (so, any 370-type VM). This behavior seems to be consistent: any non-XA version of VM can talk to any other non-XA version of VM or to an XA version of VM, but two XA versions of VM cannot talk to each other.
There are no protocol differences between the versions of PVM I used. I only used different versions to try to gain more "period-accurateness", since I am somewhat lacking in versions of PVM (I only have 5 versions, 2 of which cannot talk to the other 3).
I think this may be related to this issue: https://github.com/SDL-Hercules-390/hyperion/issues/640 I recall being able to revive the links in the past by recreating the devices, but it was not a permanent fix.
Version info:
HHC01413I Hercules version 4.7.0.11119-SDL-gf7d2360a
HHC01414I (C) Copyright 1999-2024 by Roger Bowler, Jan Jaeger, and others
HHC01417I ** The SDL 4.x Hyperion version of Hercules **
HHC01415I Build date: Jul 2 2024 at 15:01:43
HHC01417I Built with: GCC 13.2.1 20230801
HHC01417I Build type: GNU/Linux x86_64 host architecture build
HHC01417I Running on: server1 (Linux-6.6.8 x86_64) MP=32
HHC01417I Built with crypto external package version 1.0.0.52-ga5096e5
HHC01417I Built with decNumber external package version 3.68.0.102-g3aa2f45
HHC01417I Built with SoftFloat external package version 3.5.0.105-g4b0c326
HHC01417I Built with telnet external package version 1.0.0.63-g729f0b6
The link devices are defined as such, for example:
# VM/ESA 2.4
0441 CTCE 3501 127.0.0.1 3502
# z/VM 6.2
0441 CTCE 3502 127.0.0.1 3501
The device was initialized with CP SET RDEVICE 441 TYPE CTCA beforehand, though the autosense detects the correct device type.
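For anyone wanting to sanity-check a pair of definitions like the two above, here is a rough Python sketch that tests whether the ports on the two ends mirror each other. It assumes the positional layout the examples suggest (device number, CTCE, local port, remote address, remote port) and ignores anything after that. The names CtceDef, parse_ctce and ports_mirror are made up for illustration and are not part of Hercules.

```python
from typing import NamedTuple


class CtceDef(NamedTuple):
    """One CTCE statement, split into the fields this sketch cares about."""
    devnum: str
    local_port: int
    remote_addr: str
    remote_port: int


def parse_ctce(line: str) -> CtceDef:
    """Parse 'devnum CTCE lport raddr rport [options...]'.

    Assumption: the first five fields are positional, as in the examples
    in this thread. Trailing options are ignored.
    """
    fields = line.split()
    if len(fields) < 5 or fields[1].upper() != "CTCE":
        raise ValueError(f"not a CTCE statement: {line!r}")
    return CtceDef(fields[0], int(fields[2]), fields[3], int(fields[4]))


def ports_mirror(a: CtceDef, b: CtceDef) -> bool:
    """True when each side's local port is the other side's remote port."""
    return a.local_port == b.remote_port and a.remote_port == b.local_port


vmesa = parse_ctce("0441 CTCE 3501 127.0.0.1 3502")  # VM/ESA 2.4 side
zvm = parse_ctce("0441 CTCE 3502 127.0.0.1 3501")    # z/VM 6.2 side
print(ports_mirror(vmesa, zvm))
```

This only checks the port numbers; it does not verify that the remote address is reachable or that the two Hercules instances are actually running.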
@HackerSmacker: Have you tried using the 4.8 'develop' branch of Hercules yet? Does the problem exist there too? Or does it only fail with version 4.7? Some minor(?) changes were made to the CTCE logic since 4.7 was released that only exist in version 4.8-DEV, so you might want to give 4.8 a try.
If 4.8 still fails the same way, then we'll obviously have to dig into your issue a little deeper.
Thanks.
Also, a SIE fix was recently made to 4.8-DEV too (which fixed a problem with VM/ESA 2.4), which might also impact what you're doing, so again, please give our 4.8 'develop' branch a try and let us know whether it works any better or not. Thanks.
@HackerSmacker: Issue #640 is indeed the latest CTCE fix that may be helpful to you. I suggest you build the 4.8 development branch at commit a291e7e9 (or later) and try that. If it still does not work, the logs from both Hercules instances would be needed to research the problem. Thanks.
Awesome, I'll give it a roll soon. I've got a few things to hammer out and test along with a VTAM CTCA timeout issue (this only happens with VTAM 3.3 on VM/SP or VSE/SP). I'll return with some test results in a few hours!
I've compiled it, and, it's running a few different versions of VM. I'll chime back in tomorrow with test results for PVM, VTAM, RSCS, and TSAF (whether or not the links die). I'm testing VM/ESA 1.1, 1.2, 2.1, 2.4, z/VM 4.4, 5.3, and 6.4.
Okay, I've let it run for about a day, and, RSCS/PVM/VTAM are rock-solid (so far, this might change later), but, TSAF (at least, on z/VM 4.4 and VM/ESA 2.4) still shows no hope. I've read through https://github.com/SDL-Hercules-390/hyperion/issues/640 but I'm still getting that dreaded SET_370_MODE error:
02:26:39 ATSL1Y795I Retry limit exceeded on unit 0E50 SET_370_MODE
02:26:39 ATSL1Y708E An attempt to reset link 0E50 has failed
02:26:39 ATSMRX520I Synchronization is now NORMAL
The Herc console (with ctc debug on e50) reports the following:
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0011 cmd=RST=00 xy=aa->Aa l=0000 k=0F500510 w=0,r=0 SENSE=4100 CLEAR
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0012 cmd=RST=00 xy=aa->aa l=0000 k=0F500513 w=0,r=0 SENSE=4100 HALT
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0013 cmd=NOP=03 xy=aa->Aa l=0001 k=0F510411 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0014 cmd=NOP=03 xy=aa->aa l=0001 k=0F510416 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100
HHC05079I 0:0E50 CTCE: -> 0:0E51 #0017 cmd=WRT=01 xy=an->an l=0034 k=26376CD9 Stat=02 CC=1 w=0,r=0 SENSE=4100
The other side (VM/ESA 2.4) has the same behavior... I'll continue to look into it; I got interrupted with a 4-hour-gap as I was configuring it, and, as such, I do not recall if I ever saw the link go up.
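When comparing ctc debug output from both sides, it can help to pull out just the entries that did not complete with CC=0. Below is a small Python sketch that splits the key=value tokens of HHC05079I lines like the ones above and lists those entries. It only groups the fields as they appear in the log and makes no claim about what each field means; the function names are made up for this example.

```python
import re

# key=value tokens; values stop at whitespace or a comma so "w=0,r=0" splits in two
TOKEN = re.compile(r"(\w+)=([^\s,]+)")
# the "#nnnn" sequence field, kept as a string
SEQ = re.compile(r"#(\w+)")


def parse_ctce_debug(line):
    """Return the fields of one HHC05079I line as a dict, or None for other lines."""
    if "HHC05079I" not in line:
        return None
    fields = dict(TOKEN.findall(line))
    seq = SEQ.search(line)
    fields["seq"] = seq.group(1) if seq else None
    return fields


def nonzero_cc(lines):
    """List (seq, cmd, Stat) for every entry whose CC field is present and not 0."""
    hits = []
    for line in lines:
        fields = parse_ctce_debug(line)
        if fields and fields.get("CC", "0") != "0":
            hits.append((fields["seq"], fields.get("cmd"), fields.get("Stat")))
    return hits


# Two lines copied from the console output above
SAMPLE = [
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0015 cmd=SEM=C3 xy=an->an l=0001 k=0F5104D7 Stat=0C CC=0 w=0,r=0 SENSE=4100",
    "HHC05079I 0:0E50 CTCE: -> 0:0E51 #0016 cmd=WRT=01 xy=an->an l=03FC k=A8A1E217 Stat=02 CC=1 w=0,r=0 SENSE=4100",
]
for hit in nonzero_cc(SAMPLE):
    print(hit)
```

Run against the full log above, this would single out the two WRT entries (#0016 and #0017), which are the only ones showing Stat=02 CC=1.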
Alrighty... I'm a few days in and there haven't been any issues at all. That fix definitely did something, but, I'm still at a loss for TSAF; it's definitely user-error on my end though, I suspect.
HackerSmacker, (@HackerSmacker)
I was just reading through this issue for the first time and I saw your mention above about VM/SP with VTAM 3.3 and CTC timeouts. This issue can be worked around by using the ATTNDELAY option on the CTCE definition in Hercules. I've had excellent results with ATTNDELAY 200. For example:
0600 CTCE 30880 192.168.1.11 30880 ATTNDELAY 200
ATTNDELAY is only needed on the CTCE definitions used by VM/SP; it isn't needed on the other end of the CTCE link unless that end is also VM/SP.
Regards, Bob
Issue presumed to be resolved. Closing due to inactivity.