DUMP reports: DATA ON TAPE TOO SHORT
I created an ITS dump tape -- it was a full dump of an ITS system.
On another ITS system, I attempted a restore of the tape. I got the following error:
DATA ON TAPE TOO SHORT 3111 MITS.S CONFIG 850 LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..
Further along on the tape I get:
DATA ON TAPE TOO SHORT 4136 SYSBIN MIDAS BIN LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..
and
DATA ON TAPE TOO SHORT 4144 SYSBIN PROBE BIN LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..
I have never seen this. But then, I have never loaded a tape from the full dump.
An easy check would be to see what itstar says about those files.
A more difficult check would be to change the tape controller. I.e. if you have TM10B now, use TM10A instead, or vice versa.
I ran "itstar tvf" on the fulldump.tape that I had created and it reported no errors. I specifically looked at the output around the files where ITS said the data was too short. However, when I did an "itstar xvf" to extract all the files from the tape, I noticed that the extract was far short of all the files that I had dumped. In fact, none of the files where ITS reported "DATA ON TAPE TOO SHORT" were extracted (perhaps indicating that the tape load aborted, with no errors reported).
As an example, "itstar tvf" included:
...
MITS.S;.FILE. (DIR) 2019-6-26
MITS.S;-READ- -THIS- 2019-6-14
MITS.S;3COM 3 2019-6-14
MITS.S;BOOT11 1 2019-6-14
MITS.S;CAMAC 1 2019-6-14
MITS.S;CH11 14 2019-6-14
MITS.S;CHANNL 4 2019-6-14
MITS.S;CHSNCP 49 2019-6-14
MITS.S;COMPRO 96 2019-6-14
...
And when I did "itstar xvf", the "mits.s" directory was not extracted. Of course, lots of other directories were not extracted either.
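One way to see exactly which directories went missing is to compare the "itstar tvf" listing against the extracted tree. The following is only a rough sketch: the function names are made up for illustration, and the ITS-to-host name mapping (lowercase, "." replaced by "_") is an assumption inferred from the paths quoted in this thread, not taken from the itstar source.

```python
from pathlib import Path


def listed_directories(listing_text):
    """Collect ITS directory names from lines shaped like 'DIR;FN1 FN2 date'."""
    dirs = set()
    for line in listing_text.splitlines():
        head, sep, _ = line.partition(";")
        # Skip the tape header, "..." elisions, and entries with an empty directory.
        if not sep or not head.strip():
            continue
        dirs.add(head.strip())
    return dirs


def host_name(its_dir):
    """ASSUMED mapping from an ITS directory name to itstar's host directory name."""
    return its_dir.lower().replace(".", "_")


def missing_directories(listing_text, extract_root):
    """Return listed ITS directories that have no matching host directory."""
    present = {p.name for p in Path(extract_root).iterdir() if p.is_dir()}
    return sorted(d for d in listed_directories(listing_text)
                  if host_name(d) not in present)
```

Feeding it the saved output of "itstar tvf" and the directory where "itstar xvf" was run should give the list of directories that were on the tape but never extracted.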
I even tried the itstar from lars/skipfile and it has the same issues.
Since DUMP on ITS and itstar on host appear to have issues with this full dump tape, and since I created this tape on a TM10A io bus tape controller, and since I don't believe we've used TM10A except in the not-much-used KL build (sorry, rcornwell, this was Lars' name/choice and not mine), I wonder if there is an issue with pdp10-ka's TM10A support, or in ITS' support for that tape controller? (I suspect the latter is not true, since MC used this tape controller, I think).
I will perform two tests:
- I'll recreate the same tape (full dump of KL) using the TM10A controller and re-test to see whether the same thing occurs and, if so, whether it occurs in the same places.
- I'll create an EX ITS with TM10B controller and see whether that has issues (or not).
I'm glad itstar wasn't able to extract all files, because that narrows down the problem. But I'm surprised that the file listing was OK. Did extraction skip entire directories, or what?
The "DATA ON TAPE TOO SHORT" message is just in one place in DUMP: below the MAGRDN label. The complaint seems to be that the tape file header says the file is longer than the data on tape.
itstar doesn't care about the file length; it doesn't write it to the file header, and ignores it when reading tapes. This doesn't explain why directories are skipped.
I haven't completed the experiments yet. But I did notice something: I did another full dump of essentially the same file system (some changes, though). And then I used DUMP's LIST command to list the entire tape. Of course, this took forever, but I noticed that in the MITS.S and SYSBIN directories there were DUMP OUTPUT files. This suggests that DUMP didn't finish writing at least one file in each of those directories.
The file MITS.S; CONFIG 850 still has its "not backed up" flag on, and is 18 records long. The DUMP OUTPUT file is 15 records long (and also "not backed up"). My guess is that the behavior I saw earlier (on another ITS system) is going to be repeated with this new full dump -- and with the same files.
Interesting. This time, it would appear that in SYSBIN; it was PROBE BIN (same as before), MIDAS BIN (same as before) and TELSER BIN (new) that failed to get dumped properly by DUMP. These three files ended up with their "not dumped" flags on.
And when I tried to extract the files from the tape using itstar (master branch version), it didn't extract many, many directories (same as before), but this time gave this error: "?Tape record too short" and aborted. And when I used the itstar from lars/skipfile branch, it did not give this error and appears to have extracted more files/directories. Not sure if the files marked as not dumped on the source ITS system made it to the extracted host directories correctly. And there are still many directories that didn't get extracted. For example, the entire "sysbin;" directory didn't get extracted. (Although mits.s;CONFIG 850 did get extracted).
It looks like NONE of the directories after SYSBIN on the tape got extracted.
Thanks. It seems clear the major problem is on the dumping end, not extraction.
I tried itstar tvf on two recently built output.tapes, and I didn't see any error. I think they're both dumped with TM10B, so maybe this points to a TM10A problem.
Except that I think I created a second full dump from a TM10B system (I have two ITS BIN files for EX -- one TM10A and the other TM10B) and it had the same problems with the lars/skipfile version of itstar.
I'll do the whole experiment again (dumping on TM10A and TM10B systems) with the master branch itstar (should I use the master branch of the itstar repo or the master branch submodule from the its repo)?
Either should be fine, there are no changes to the way files are extracted. But I'd say use itstar's master.
@rcornwell, maybe the delay value we decided on was too small after all?
Eric, FYI there's a delay currently at 420 near the end of mt_srv in the simulator file PDP10/kx10_mt.c. Rich and I tried to find the lowest value that didn't trigger any errors in the moby DUMP extract in the beginning of the ITS build.
This is weird. I did a full dump with EX ITS, first with TM10A and second with TM10B. The full dump tapes were called fdtm10a.tape and fdtm10b.tape.
Here's an attempt with the master branch itstar (from the itstar repo) to extract the files from the TM10A tape:
eswenson@localhost:~/ex-its-2/fdtm10a$ itstar xvf ../fdtm10a.tape
-PICS-;M.F.D. (FILE) => -pics-/m_f_d_.(file) [OK]
;190804 ! => /190804.~~~~~! No such file or directory
eswenson@localhost:~/ex-its-2/fdtm10a$
And here's an attempt to extract the files from the TM10B tape:
swenson@localhost:~/ex-its-2/fdtm10a$ itstar tvf ../fdtm10a.tape
Tape 0, reel 0, created 08/04/19, type=full
-PICS-;M.F.D. (FILE) 2019-7-4
;190804 !
-PICS-;.FILE. (DIR) 2019-7-4
-PICS-;-READ- -THIS- 2019-6-14
...
ZORK;TYPHAK 16 => zork/typhak.16 [OK]
ZORK;UTIL 16 => zork/util.16 [OK]
ZZ;.FILE. (DIR) => zz/_file_.(dir) [OK]
ZZ;APROPO 31 => zz/apropo.31 [OK]
eswenson@localhost:~/ex-its-2/fdtm10b$
In other words, the TM10B tape had no issues extracting all the files. But the extraction from the TM10A tape failed miserably:
eswenson@localhost:~/ex-its-2/fdtm10b$ cd ../fdtm10a
eswenson@localhost:~/ex-its-2/fdtm10a$ ls
-pics-
eswenson@localhost:~/ex-its-2/fdtm10a$
eswenson@localhost:~/ex-its-2/fdtm10a$ find .
.
./-pics-
./-pics-/m_f_d_.(file)
eswenson@localhost:~/ex-its-2/fdtm10a$
Note, also, that itstar didn't include the:
Tape 0, reel 0, created 08/04/19, type=full
header when it attempted the TM10A tape extraction.
From this, I would conclude that there is an issue with TM10A generation in pdp10-ka when ITS is configured to use TM10A. It works fine with TM10B.
I've uploaded gzip'ed copies of both tapes here:
- https://s3.amazonaws.com/eswenson-its/public/fdtm10a.tape.gz
- https://s3.amazonaws.com/eswenson-its/public/fdtm10b.tape.gz
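For anyone who wants to look at the raw record structure of the two images without going through itstar, here is a minimal sketch. It assumes the .tape files are in SIMH magtape format (each record framed by a 4-byte little-endian length before and after the data, data padded to an even byte count, a zero length word meaning tape mark, and 0xFFFFFFFF meaning end of medium). The function name is made up, and other metadata markers such as erase gaps are not handled.

```python
import struct

TAPE_MARK = 0x00000000
END_OF_MEDIUM = 0xFFFFFFFF


def walk_simh_records(f):
    """Yield (kind, length) for each item in a SIMH-format tape image.

    kind is "record", "mark", "mismatch" (leading and trailing length
    words disagree), or "truncated" (the image ends inside a record).
    """
    while True:
        head = f.read(4)
        if len(head) < 4:
            return  # clean end of file
        (word,) = struct.unpack("<I", head)
        if word == TAPE_MARK:
            yield ("mark", 0)
            continue
        if word == END_OF_MEDIUM:
            return
        length = word & 0x00FFFFFF      # low bits carry the byte count
        padded = length + (length & 1)  # data is padded to an even size
        data = f.read(padded)
        tail = f.read(4)
        if len(data) < padded or len(tail) < 4:
            yield ("truncated", length)
            return
        (trailer,) = struct.unpack("<I", tail)
        yield ("record" if trailer == word else "mismatch", length)
```

Running this over both images (opened in binary mode) and comparing the sequences of record lengths might show where the TM10A tape first diverges from the TM10B one.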
Hmm....I increased the timeout from 420 to 600 in PDP10/kx10_mt.c, and used that emulator to boot my EX system (used the TM10A version of ITS, of course, and told the emulator that the drive was type a). I then completed a full dump of the system and extracted it on my host. I didn't have obvious errors during the extraction. However, I compared the TM10A extraction with the TM10B extraction and here is the diff:
eswenson@localhost:~/ex-its-2$ diff -qr fdtm10a fdtm10b 2>&1 | grep -v "No such file" | grep -v "_file_"
Only in fdtm10b/c: testc.c
Only in fdtm10b/c: testc.stinkr
Files fdtm10a/channa/logout.times and fdtm10b/channa/logout.times differ
Only in fdtm10a/dragon: cdata.30
Only in fdtm10a/dragon: cdata.31
Files fdtm10a/dragon/dragon.hoard and fdtm10b/dragon/dragon.hoard differ
Files fdtm10a/dragon/dragon.save and fdtm10b/dragon/dragon.save differ
Files fdtm10a/dragon/dragon.yester and fdtm10b/dragon/dragon.yester differ
Files fdtm10a/draw/dips.dip and fdtm10b/draw/dips.dip differ
Only in fdtm10b/draw: disp.500
Files fdtm10a/fonts/fonts.summry and fdtm10b/fonts/fonts.summry differ
Only in fdtm10b/fonts: fonts.wid
Only in fdtm10b: junk
Files fdtm10a/_mail_/admin.mail and fdtm10b/_mail_/admin.mail differ
Only in fdtm10b/_mail_: ~~~~id.86
Only in fdtm10a/_mail_: ~~~~id.87
Files fdtm10a/_mail_/~lock.unique and fdtm10b/_mail_/~lock.unique differ
Files fdtm10a/_mail_/stats.1 and fdtm10b/_mail_/stats.1 differ
Files fdtm10a/mits_s/channl.4 and fdtm10b/mits_s/channl.4 differ
Only in fdtm10b/mits_s: chsncp.49
Only in fdtm10b: _msgs_
Files fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73 differ
Only in fdtm10b/paulw: newinv.89
Only in fdtm10b/paulw: residu.105
Files fdtm10a/-pics-/m_f_d_.(file) and fdtm10b/-pics-/m_f_d_.(file) differ
Files fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18 differ
Only in fdtm10b/rms: palx.143
Files fdtm10a/sys/ts.micro and fdtm10b/sys/ts.micro differ
Only in fdtm10b/sys: ts.midas
Files fdtm10a/sys/ts.peek and fdtm10b/sys/ts.peek differ
Files fdtm10a/sysbin/name.bin and fdtm10b/sysbin/name.bin differ
Files fdtm10a/syseng/macro.tapes and fdtm10b/syseng/macro.tapes differ
Only in fdtm10a/system: config.213
Files fdtm10a/_tape0/tape.0 and fdtm10b/_tape0/tape.0 differ
Files fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo differ
Only in fdtm10b/transl: trans2.39
eswenson@localhost:~/ex-its-2$
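The same kind of comparison can be scripted so the three categories (only in one tree, only in the other, differing) come out as separate lists. This is just a sketch using Python's standard filecmp module; the function name is made up. Note that filecmp.dircmp does a shallow comparison, so two files with the same size and modification time are assumed equal without reading their contents.

```python
import filecmp
import os


def summarize_trees(left, right):
    """Return (only_left, only_right, differ) as sorted lists of relative paths."""
    only_left, only_right, differ = [], [], []

    def walk(cmp, rel):
        only_left.extend(os.path.join(rel, n) for n in cmp.left_only)
        only_right.extend(os.path.join(rel, n) for n in cmp.right_only)
        differ.extend(os.path.join(rel, n) for n in cmp.diff_files)
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(only_left), sorted(only_right), sorted(differ)
```

Called as summarize_trees("fdtm10a", "fdtm10b"), the third list would correspond to the "Files ... differ" lines above.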
The missing files from the fdtm10a extract are troublesome. Most of the differences are expected; however, these are not:
Files fdtm10a/draw/dips.dip and fdtm10b/draw/dips.dip differ
Files fdtm10a/fonts/fonts.summry and fdtm10b/fonts/fonts.summry differ
Files fdtm10a/mits_s/channl.4 and fdtm10b/mits_s/channl.4 differ
Files fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73 differ
Files fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18 differ
Files fdtm10a/sys/ts.micro and fdtm10b/sys/ts.micro differ
Files fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo differ
I compared fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73, and the tm10a version has extra stuff at the end -- some garbage and the start of another file.
The same thing is true about fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18.
The same thing is true about fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo.
I'm sure if I inspected the different binary files, I'd find the same thing -- garbage at the end of the file.
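To check the binary files for the same pattern without eyeballing them, something like the following sketch could help. It measures how far two files agree from the start and how many bytes each has left after that point; the function name is made up for illustration.

```python
from pathlib import Path


def common_prefix_report(path_a, path_b):
    """Return (common_prefix_len, bytes_left_in_a, bytes_left_in_b)."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    n = 0
    limit = min(len(a), len(b))
    # Advance while both files hold the same byte at the same offset.
    while n < limit and a[n] == b[n]:
        n += 1
    return n, len(a) - n, len(b) - n
```

If the TM10B copy is passed as the second file and its leftover count is 0 while the TM10A copy's is greater than 0, then the TM10A copy is the TM10B copy plus trailing data, which is the "garbage at the end" case described above.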
@rcornwell, this is the issue where Eric reported a TM10A problem.... or ITS problem if you prefer. We don't know where the problem is exactly.
I believe you have said it's ITS turning off interrupts too long, and maybe that is indeed the case. I'd like to have that confirmed, and if so, what do we do about it? Presumably AI and ML used TM10A reliably for 10 years or so.
I'm doing another experiment now. On EX, which is currently configured with a TM10B and where pdp10-ka is configured with the appropriate type and mpx settings, I'm doing a full dump. I'm going to check (again) the integrity of this tape. Then I'll do the experiment again with TM10A and the appropriate type/mpx settings. I just want to make sure that when I had TM10A errors before, it wasn't because of a missing/incorrect mpx setting. (I know the type was correct -- nothing is read when that is wrong.) In my case, I just got several bad (incorrectly written) files on the tape. I'll report on the results -- it takes an awfully long time to do a full dump under pdp10-ka.
I just had a thought. The ITS build script has DUMP do an ICHECK. Wouldn't this mean the tape files are read back and compared against the files on disk? How could this possibly pass if the tape files are corrupted?
Right. I was wondering the same thing. I wonder if ICHECK checks the contents of the files or just the size/date modified of the files?
From the looks of the code, ICHECK does seem to compare file contents. There's a loop at ICKLP which compares BUF against BIF, and the latter has the comment ;COMPARE BUFFER.
I would set a debug log to a file. Do:
set mta debug=cono;dataio;detail;exp
set cpu debug=cono;irq
This should point you to the location that is holding the irq off.
I'm not sure that ICHECK is the correct command to call -- that is for incremental dumps.
So far, I've made a full dump using a TM10B system and verified that it is good by running DUMP/CHECK on that system a couple times. As time progresses, there are more and more differences reported by CHECK -- all correct, since various files do change.
Now, I've booted the same system configured as TM10A and am running another DUMP/CHECK. I should see a few differences -- including the changes to CONFIG > and my ITS BIN -- here, but otherwise it should complete with only expected differences.
Once that test is complete, I will reverse the process. I will do a full dump on the TM10A-configured system, and do the DUMP/CHECK there a couple times. Then I'll switch back to the TM10B system and do the DUMP/CHECK there.
I'll report on the results.
So, when I use a TM10B system and do the CHECK, everything is as it should be. When I use a TM10A system and do a CHECK of a tape generated on a TM10B system, everything is as it should be.
However, when I create a full dump on a TM10A system, and do a check on a TM10A system, I see errors. For example:
_check
LIST DEV =tty:TAPE NUMBER ON TAPE IS 0, REAL TAPE NUMBER/NAME=0
TAPE NO 0 CREATION DATE 190816
REEL NO 0 OF FULL DUMP
1231 BAD WORDS AT 1016
245 .TAPE0 TAPE 0 0 8/16/2019 11:32:15.5 NEW 751 WORDS !
273 LONGER ON DISK ! 1 BAD WORD AT 8 37 BAD WORDS AT 48 44 !
BAD WORDS AT 48 1317 BAD WORDS AT 5112 1231 BAD WORDS AT 16376
DATA ON TAPE TOO SHORT
3184 MRC TEN50 22 LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..
3184 MRC TEN50 22 1 7/27/2019 18:07:20 46080 WORDS !
8 BAD WORDS AT 46072 1 *TAPE IOC * ! 2 BAD WORDS AT 4 E-O-T
REEL = 0 HAS 22039750 WORDS
_
And note that both my TM10A and TM10B systems specify the matching "set mta type=" value for the ITS that is running and the same "set mta mpx=7" value.
@rcornwell Please confirm that "set mta mpx=7" is correct, regardless of the setting of "set mta type=a" or "set mta type=b".
This was a long time ago. Is it still a problem? KL ITS is configured with TM10AP==1 and the build script does a CHECK of the full dump, which seems to work fine.
I just did a full dump on my KL instance (pdp10-kl) and was able to use ITSTAR to extract the contents on my host. I then mounted the tape on my KS (pdp10-ks) instance, and was able to do a DUMP/LIST with no errors. So I suspect this is no longer a problem.
I have my pdp10-kl instance configured with:
set mta enabled type=b
and ITS configured with:
DEFOPT TM10B==1 ;DF10-BASED TAPE CONTROLLER
All works fine.