DUMP reports: DATA ON TAPE TOO SHORT

Open eswenson1 opened this issue 6 years ago • 25 comments

I created an ITS dump tape -- it was a full dump of an ITS system.

On another ITS system, I attempted a restore of the tape. I got the following error:

DATA ON TAPE TOO SHORT 3111 MITS.S CONFIG 850 LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..

eswenson1 avatar Jul 26 '19 21:07 eswenson1

Further along on the tape I get:

DATA ON TAPE TOO SHORT 4136 SYSBIN MIDAS BIN LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..

and

DATA ON TAPE TOO SHORT 4144 SYSBIN PROBE BIN LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..

eswenson1 avatar Jul 26 '19 21:07 eswenson1

I have never seen this. But then, I have never loaded a tape from the full dump.

An easy check would be to see what itstar says about those files.

A more difficult check would be to change the tape controller. I.e. if you have TM10B now, use TM10A instead, or vice versa.

larsbrinkhoff avatar Jul 27 '19 15:07 larsbrinkhoff

I ran "itstar tvf" on the fulldump.tape that I had created and it reported no errors. I specifically looked at the output around the files where ITS said the data was too short. However, when I did an "itstar xvf" to extract all the files from the tape, I noticed that the extract was far short of all the files that I had dumped. In fact, none of the files for which ITS reported "DATA ON TAPE TOO SHORT" were extracted (perhaps indicating that the tape load aborted, with no errors reported).

As an example, "itstar tvf" included:

 ...
MITS.S;.FILE. (DIR)   2019-6-26
MITS.S;-READ- -THIS-   2019-6-14
MITS.S;3COM 3   2019-6-14
MITS.S;BOOT11 1   2019-6-14
MITS.S;CAMAC 1   2019-6-14
MITS.S;CH11 14   2019-6-14
MITS.S;CHANNL 4   2019-6-14
MITS.S;CHSNCP 49   2019-6-14
MITS.S;COMPRO 96   2019-6-14
 ...

And when I did "itstar xvf", the "mits.s" directory was not extracted. Of course, lots of other directories were not extracted either.
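A minimal sketch of how the two itstar outputs could be compared to see which directories were listed but never extracted. It assumes the output of "itstar tvf" and "itstar xvf" has been saved as text, and that each file line starts with the ITS directory name followed by ";", as in the excerpts in this thread. The function names are hypothetical and not part of itstar.

```python
def directories(lines):
    """Collect ITS directory names (the text before ';') from itstar output lines."""
    dirs = set()
    for line in lines:
        name, sep, _rest = line.strip().partition(";")
        # Lines without ';' (tape header, "...") and lines with an empty
        # directory field (e.g. ";190804      !") are ignored.
        if sep and name:
            dirs.add(name)
    return dirs


def missing_directories(listing_lines, extract_lines):
    """Directories present in the tvf listing with no '[OK]' line in the xvf output."""
    extracted = directories(l for l in extract_lines if "[OK]" in l)
    return sorted(directories(listing_lines) - extracted)


if __name__ == "__main__":
    listing = [
        "Tape 0, reel 0, created 08/04/19, type=full",
        "-PICS-;M.F.D. (FILE)   2019-7-4",
        "MITS.S;CAMAC 1   2019-6-14",
    ]
    extract = ["-PICS-;M.F.D. (FILE) => -pics-/m_f_d_.(file) [OK]"]
    print(missing_directories(listing, extract))
```

This only compares directory names; it says nothing about whether the extracted file contents are correct.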

eswenson1 avatar Jul 29 '19 16:07 eswenson1

I even tried the itstar from lars/skipfile and it has the same issues.

eswenson1 avatar Jul 29 '19 16:07 eswenson1

Since DUMP on ITS and itstar on host appear to have issues with this full dump tape, and since I created this tape on a TM10A io bus tape controller, and since I don't believe we've used TM10A except in the not-much-used KL build (sorry, rcornwell, this was Lars' name/choice and not mine), I wonder if there is an issue with pdp10-ka's TM10A support, or in ITS' support for that tape controller? (I suspect the latter is not true, since MC used this tape controller, I think).

I will perform two tests:

  • I'll recreate the same tape (full dump of KL) using TM10A controller and re-test to see whether the same thing occurs and, if so, whether it occurs in the same places.
  • I'll create an EX ITS with TM10B controller and see whether that has issues (or not).

eswenson1 avatar Jul 29 '19 17:07 eswenson1

I'm glad itstar wasn't able to extract all files, because that narrows down the problem. But I'm surprised that the file listing was OK. Did extraction skip entire directories, or what?

The "DATA ON TAPE TOO SHORT" message is just in one place in DUMP: below the MAGRDN label. The complaint seems to be that the tape file header says the file is longer than the data on tape.

itstar doesn't care about the file length; it doesn't write it to the file header, and ignores it when reading tapes. This doesn't explain why directories are skipped.

larsbrinkhoff avatar Jul 29 '19 19:07 larsbrinkhoff

I haven't completed the experiments yet. But I did notice something: I did another full dump of essentially the same file system (some changes, though). And then I used DUMP's LIST command to list the entire tape. Of course, this took forever, but I noticed DUMP OUTPUT files in the MITS.S and SYSBIN directories. This suggests that DUMP didn't finish writing at least one file in those directories.

The file MITS.S; CONFIG 850 still has its "not backed up" flag on, and is 18 records long. The DUMP OUTPUT file is 15 records long (and also "not backed up"). My guess is that the behavior I saw earlier (on another ITS system) is going to be repeated with this new full dump -- and with the same files.

eswenson1 avatar Jul 29 '19 20:07 eswenson1

Interesting. This time, it would appear that in SYSBIN; it was PROBE BIN (same as before), MIDAS BIN (same as before) and TELSER BIN (new) that failed to get dumped properly by DUMP. These three files ended up with their "not dumped" flags on.

And when I tried to extract the files from the tape using itstar (master branch version), it didn't extract many, many directories (same as before), but this time gave this error: "?Tape record too short" and aborted. And when I used the itstar from lars/skipfile branch, it did not give this error and appears to have extracted more files/directories. Not sure if the files marked as not dumped on the source ITS system made it to the extracted host directories correctly. And there are still many directories that didn't get extracted. For example, the entire "sysbin;" directory didn't get extracted. (Although mits.s;CONFIG 850 did get extracted).

It looks like NONE of the directories after SYSBIN on the tape got extracted.

eswenson1 avatar Jul 29 '19 20:07 eswenson1

Thanks. It seems clear the major problem is on the dumping end, not extraction.

I tried itstar tvf on two recently built output.tapes, and I didn't see any error. I think they're both dumped with TM10B, so maybe this points to a TM10A problem.

larsbrinkhoff avatar Aug 04 '19 15:08 larsbrinkhoff

Except that I think I created a second full dump from a TM10B system (I have two ITS BIN files for EX -- one TM10A and the other TM10B) and it had the same problems with the lars/skipfile version of itstar.

I'll do the whole experiment again (dumping on TM10A and TM10B systems) with the master branch itstar (should I use the master branch of the itstar repo or the master branch submodule from the its repo)?

eswenson1 avatar Aug 04 '19 17:08 eswenson1

Either should be fine, there are no changes to the way files are extracted. But I'd say use itstar's master.

@rcornwell, maybe the delay value we decided on was too small after all?

Eric, FYI there's a delay currently at 420 near the end of mt_srv in the simulator file PDP10/kx10_mt.c. Rich and I tried to find the lowest value that didn't trigger any errors in the moby DUMP extract in the beginning of the ITS build.

larsbrinkhoff avatar Aug 04 '19 18:08 larsbrinkhoff

This is weird. I did a full dump with EX ITS, first with TM10A and second with TM10B. The full dump tapes were called fdtm10a.tape and fdtm10b.tape.

Here's an attempt with the master branch itstar (from the itstar repo) to extract the files from the TM10A tape:

eswenson@localhost:~/ex-its-2/fdtm10a$ itstar xvf ../fdtm10a.tape
-PICS-;M.F.D. (FILE) => -pics-/m_f_d_.(file) [OK]
;190804      ! => /190804.~~~~~! No such file or directory
eswenson@localhost:~/ex-its-2/fdtm10a$

And here's an attempt to extract the files from the TM10B tape:

swenson@localhost:~/ex-its-2/fdtm10a$ itstar tvf ../fdtm10a.tape
Tape 0, reel 0, created 08/04/19, type=full
-PICS-;M.F.D. (FILE)   2019-7-4
;190804      !
-PICS-;.FILE. (DIR)   2019-7-4
-PICS-;-READ- -THIS-   2019-6-14
...
ZORK;TYPHAK 16 => zork/typhak.16 [OK]
ZORK;UTIL 16 => zork/util.16 [OK]
ZZ;.FILE. (DIR) => zz/_file_.(dir) [OK]
ZZ;APROPO 31 => zz/apropo.31 [OK]
eswenson@localhost:~/ex-its-2/fdtm10b$

In other words, the TM10B tape had no issues extracting all the files. But the extraction from the TM10A tape failed miserably:

eswenson@localhost:~/ex-its-2/fdtm10b$ cd ../fdtm10a
eswenson@localhost:~/ex-its-2/fdtm10a$ ls
-pics-
eswenson@localhost:~/ex-its-2/fdtm10a$

eswenson@localhost:~/ex-its-2/fdtm10a$ find .
.
./-pics-
./-pics-/m_f_d_.(file)
eswenson@localhost:~/ex-its-2/fdtm10a$

Note, also, that itstar didn't include the:

Tape 0, reel 0, created 08/04/19, type=full

header when it attempted the TM10A tape extraction.

From this, I would conclude that there is an issue with TM10A generation in pdp10-ka when ITS is configured to use TM10A. It works fine with TM10B.

eswenson1 avatar Aug 04 '19 20:08 eswenson1

I've uploaded gzip'ed copies of both tapes here:

  • https://s3.amazonaws.com/eswenson-its/public/fdtm10a.tape.gz
  • https://s3.amazonaws.com/eswenson-its/public/fdtm10b.tape.gz

eswenson1 avatar Aug 04 '19 20:08 eswenson1

Hmm....I increased the timeout from 420 to 600 in PDP10/kx10_mt.c, and used that emulator to boot my EX system (using the TM10A version of ITS, of course, and telling the emulator that the drive was type a). I then completed a full dump of the system and extracted it on my host. I didn't have obvious errors during the extraction. However, I compared the TM10A extraction with the TM10B extraction and here is the diff:

eswenson@localhost:~/ex-its-2$ diff -qr fdtm10a fdtm10b 2>&1 | grep -v "No such file" | grep -v "_file_"
Only in fdtm10b/c: testc.c
Only in fdtm10b/c: testc.stinkr
Files fdtm10a/channa/logout.times and fdtm10b/channa/logout.times differ
Only in fdtm10a/dragon: cdata.30
Only in fdtm10a/dragon: cdata.31
Files fdtm10a/dragon/dragon.hoard and fdtm10b/dragon/dragon.hoard differ
Files fdtm10a/dragon/dragon.save and fdtm10b/dragon/dragon.save differ
Files fdtm10a/dragon/dragon.yester and fdtm10b/dragon/dragon.yester differ
Files fdtm10a/draw/dips.dip and fdtm10b/draw/dips.dip differ
Only in fdtm10b/draw: disp.500
Files fdtm10a/fonts/fonts.summry and fdtm10b/fonts/fonts.summry differ
Only in fdtm10b/fonts: fonts.wid
Only in fdtm10b: junk
Files fdtm10a/_mail_/admin.mail and fdtm10b/_mail_/admin.mail differ
Only in fdtm10b/_mail_: ~~~~id.86
Only in fdtm10a/_mail_: ~~~~id.87
Files fdtm10a/_mail_/~lock.unique and fdtm10b/_mail_/~lock.unique differ
Files fdtm10a/_mail_/stats.1 and fdtm10b/_mail_/stats.1 differ
Files fdtm10a/mits_s/channl.4 and fdtm10b/mits_s/channl.4 differ
Only in fdtm10b/mits_s: chsncp.49
Only in fdtm10b: _msgs_
Files fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73 differ
Only in fdtm10b/paulw: newinv.89
Only in fdtm10b/paulw: residu.105
Files fdtm10a/-pics-/m_f_d_.(file) and fdtm10b/-pics-/m_f_d_.(file) differ
Files fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18 differ
Only in fdtm10b/rms: palx.143
Files fdtm10a/sys/ts.micro and fdtm10b/sys/ts.micro differ
Only in fdtm10b/sys: ts.midas
Files fdtm10a/sys/ts.peek and fdtm10b/sys/ts.peek differ
Files fdtm10a/sysbin/name.bin and fdtm10b/sysbin/name.bin differ
Files fdtm10a/syseng/macro.tapes and fdtm10b/syseng/macro.tapes differ
Only in fdtm10a/system: config.213
Files fdtm10a/_tape0/tape.0 and fdtm10b/_tape0/tape.0 differ
Files fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo differ
Only in fdtm10b/transl: trans2.39
eswenson@localhost:~/ex-its-2$

The missing files from the fdtm10a extract are troublesome. Most of the differences are expected, however, these are not:

Files fdtm10a/draw/dips.dip and fdtm10b/draw/dips.dip differ
Files fdtm10a/fonts/fonts.summry and fdtm10b/fonts/fonts.summry differ
Files fdtm10a/mits_s/channl.4 and fdtm10b/mits_s/channl.4 differ
Files fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73 differ
Files fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18 differ
Files fdtm10a/sys/ts.micro and fdtm10b/sys/ts.micro differ
Files fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo differ
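As a rough sketch, the "diff -qr" output above could be sorted into files present on only one side and files whose contents differ, which makes the missing and the changed files easier to review separately. It assumes the two line shapes that appear in the output above ("Only in DIR: NAME" and "Files A and B differ"); the function name and the dictionary keys are made up for illustration.

```python
import re

ONLY_IN = re.compile(r"^Only in (?P<dir>[^:]+): (?P<name>.+)$")
DIFFER = re.compile(r"^Files (?P<a>.+?) and (?P<b>.+) differ$")


def classify(diff_lines, left="fdtm10a", right="fdtm10b"):
    """Group 'diff -qr' output lines by kind; unrecognized lines are skipped."""
    groups = {"only_left": [], "only_right": [], "differ": []}
    for raw in diff_lines:
        line = raw.rstrip("\n")
        only = ONLY_IN.match(line)
        if only:
            path = only.group("dir") + "/" + only.group("name")
            top = only.group("dir").split("/", 1)[0]
            if top == left:
                groups["only_left"].append(path)
            elif top == right:
                groups["only_right"].append(path)
            continue
        differ = DIFFER.match(line)
        if differ:
            # Record the left-hand path; the right-hand one has the same tail.
            groups["differ"].append(differ.group("a"))
    return groups


if __name__ == "__main__":
    import sys
    for kind, paths in classify(sys.stdin).items():
        print(kind, len(paths))
        for path in paths:
            print("   ", path)
```

With the script saved under any name, the output of "diff -qr fdtm10a fdtm10b" can be piped into it; entries under only_right would be the files missing from the TM10A extract.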

eswenson1 avatar Aug 05 '19 20:08 eswenson1

I compared fdtm10a/paulw/newfac.73 and fdtm10b/paulw/newfac.73 and the tm10a version has extra stuff at the end -- some garbage and the start of another file.

The same thing is true about fdtm10a/rms/macros.18 and fdtm10b/rms/macros.18.

The same thing is true about fdtm10a/transl/trans1.autolo and fdtm10b/transl/trans1.autolo.

I'm sure if I inspected the different binary files, I'd find the same thing -- garbage at the end of the file.
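One way to check the "garbage at the end" hypothesis for the binary files without reading them by eye is to test whether the TM10B copy is an exact prefix of the TM10A copy. This is only a sketch under that assumption: if the two copies also differ in trailing padding, the check reports no match even though the useful contents agree. The function names are hypothetical.

```python
from pathlib import Path


def extra_tail(good: bytes, suspect: bytes):
    """Return the bytes appended to `suspect` if it starts with `good`, else None."""
    if suspect.startswith(good):
        return suspect[len(good):]
    return None


def extra_tail_of_files(good_path, suspect_path):
    """Same check, reading both files from disk."""
    return extra_tail(Path(good_path).read_bytes(), Path(suspect_path).read_bytes())


if __name__ == "__main__":
    tail = extra_tail(b"real file contents", b"real file contents\x00\x00start of next")
    print(None if tail is None else len(tail))
```

A call such as extra_tail_of_files("fdtm10b/sys/ts.micro", "fdtm10a/sys/ts.micro") returning a non-empty tail would be consistent with the pattern seen in the text files; a result of None would mean the difference is somewhere other than the end.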

eswenson1 avatar Aug 05 '19 20:08 eswenson1

@rcornwell, this is the issue where Eric reported a TM10A problem.... or ITS problem if you prefer. We don't know where the problem is exactly.

I believe you have said it's ITS turning off interrupts too long, and maybe that is indeed the case. I'd like to have that confirmed, and if so, what do we do about it? Presumably AI and ML used TM10A reliably for 10 years or so.

larsbrinkhoff avatar Aug 16 '19 05:08 larsbrinkhoff

I'm doing another experiment now. On EX, which is currently configured with a TM10B and where pdp10-ka is configured with the appropriate type and mpx settings, I'm doing a full dump. I'm going to check (again) the integrity of this tape. Then I'll do the experiment again with TM10A and the appropriate type/mpx settings. I just want to make sure that when I had TM10A errors before, it wasn't because of missing/incorrect mpx setting. (I know the type was correct -- nothing is read when that is wrong). In my case, I just got several bad (incorrectly written) files on the tape. I'll report on the results -- it takes an awfully long time to do a full dump under pdp10-ka.

eswenson1 avatar Aug 16 '19 06:08 eswenson1

I just had a thought. The ITS build script has DUMP do an ICHECK. Wouldn't this mean the tape files are read back and compared against the files on disk? How could this possibly pass if the tape files are corrupted?

larsbrinkhoff avatar Aug 16 '19 06:08 larsbrinkhoff

Right. I was wondering the same thing. I wonder if ICHECK checks the contents of the files or just the size/date modified of the files?

eswenson1 avatar Aug 16 '19 06:08 eswenson1

From the looks of the code ICHECK does seem to compare file contents. There's a loop at ICKLP which compares BUF against BIF, and the latter has the comment ;COMPARE BUFFER.
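For illustration only, here is what a word-by-word comparison of a tape buffer against a disk buffer could look like, reporting each run of mismatching words as a count and a starting offset. This is not a transcription of the ICKLP loop; the function name, the list-of-integers representation, and the way runs are grouped are all assumptions.

```python
def bad_word_runs(tape_words, disk_words):
    """Return (count, offset) for each run of consecutive mismatching words.

    Only the overlapping part of the two buffers is compared; a length
    difference would have to be reported separately.
    """
    runs = []
    start = None
    n = min(len(tape_words), len(disk_words))
    for i in range(n):
        if tape_words[i] != disk_words[i]:
            if start is None:
                start = i          # a new run of bad words begins here
        elif start is not None:
            runs.append((i - start, start))
            start = None
    if start is not None:          # run extends to the end of the overlap
        runs.append((n - start, start))
    return runs


if __name__ == "__main__":
    for count, offset in bad_word_runs([1, 2, 3, 4, 5], [1, 9, 9, 4, 0]):
        print(f"{count} bad words at {offset}")
```

A comparison of this kind would catch corrupted contents, so a corrupted tape passing such a check would point at the data being read back the same wrong way, or at the check not covering the affected files.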

larsbrinkhoff avatar Aug 16 '19 06:08 larsbrinkhoff

I would set a debug log to a file. Do:

set mta debug=cono;dataio;detail;exp
set cpu debug=cono;irq

This should point you to the location that is holding the irq off.

rcornwell avatar Aug 16 '19 11:08 rcornwell

I'm not sure that ICHECK is the correct command to call -- that is for incremental dumps.

So far, I've made a full dump using a TM10B system and verified that it is good by running DUMP/CHECK on that system a couple times. As time progresses, there are more and more differences reported by CHECK -- all correct, since various files do change.

Now, I've booted the same system configured as TM10A and am running another DUMP/CHECK. I should see a few differences -- including the changes to CONFIG > and my ITS BIN -- here, but otherwise it should complete with only expected differences.

Once that test is complete, I will reverse the process. I will do a full dump on the TM10A-configured system, and do the DUMP/CHECK there a couple times. Then I'll switch back to the TM10B system and do the DUMP/CHECK there.

I'll report on the results.

eswenson1 avatar Aug 16 '19 17:08 eswenson1

So, when I use a TM10B system and do the CHECK, everything is as it should be. When I use a TM10A system and do a CHECK of a tape generated on a TM10B system, everything is as it should be.

However, when I create a full dump on a TM10A system, and do a check on a TM10A system, I see errors. For example:

_check
 LIST DEV =tty:TAPE NUMBER ON TAPE IS 0, REAL TAPE NUMBER/NAME=0
TAPE NO      0 CREATION DATE  190816
REEL NO      0 OF FULL DUMP
  1231 BAD WORDS AT  1016
   245 .TAPE0 TAPE   0          0      8/16/2019 11:32:15.5 NEW    751 WORDS   !
273 LONGER  ON DISK !      1 BAD WORD  AT     8    37 BAD WORDS AT    48    44 !
BAD WORDS AT    48  1317 BAD WORDS AT  5112  1231 BAD WORDS AT 16376
DATA ON TAPE TOO SHORT
  3184 MRC    TEN50  22    LENGTH ERROR TAPE IN ..FILE BEING SKIPPED..

  3184 MRC    TEN50  22         1      7/27/2019 18:07:20      46080 WORDS     !
8 BAD WORDS AT 46072     1 *TAPE IOC * !      2 BAD WORDS AT     4 E-O-T
 REEL =     0 HAS  22039750 WORDS
_

And note that both my TM10A and TM10B systems specify the matching "set mta type=" value for the ITS that is running and the same "set mta mpx=7" value.

@rcornwell Please confirm that "set mta mpx=7" is correct, regardless of the setting of "set mta type=a" or "set mta type=b".

eswenson1 avatar Aug 16 '19 19:08 eswenson1

This was a long time ago. Is it still a problem? KL ITS is configured with TM10AP==1 and the build script does a CHECK of the full dump, which seems to work fine.

larsbrinkhoff avatar Oct 17 '23 09:10 larsbrinkhoff

I just did a full dump on my KL instance (pdp10-kl) and was able to use ITSTAR to extract the contents on my host. I then mounted the tape on my KS (pdp10-ks) instance, and was able to do a DUMP/LIST with no errors. So I suspect this is no longer a problem.

I have my pdp10-kl instance configured with:

set mta enabled type=b

and ITS configured with:

DEFOPT TM10B==1         ;DF10-BASED TAPE CONTROLLER

All works fine.

eswenson1 avatar Oct 18 '23 17:10 eswenson1