Design Notes for Fixing VAX Unaligned Access to IO and Register Space
Problem Statement: VAX unaligned accesses are handled by reading the
surrounding longword (or longwords) and
a) for reads, extracting the addressed addressed word or longword
b) for writes, inserting the addressed word or longword and then
writing the surrounding longword (or longwords) back
This is correct for all memory cases. On the 11/780, the unaligned
access to register or IO space causes an error, as it should. On
CVAX, it causes incorrect behavior, by either performing too many
QBus references, or performing read-modify-writes instead of pure
writes, or accessing the wrong Qbus locations.
The problem cannot be trivially solved with address manipulation.
The core issues is that on CVAX, unaligned access is done to
exactly as many bytes as are required, using a base longword
address and a byte mask. There are five cases, corresponding to
word and longword lengths, and byte offsets 1, 2 (longword only),
and 3. Further, behavior is different for reads and writes, because
the Qbus always performs word operations on reads, leaving it to
the processor to extract a byte if needed.
Conceptual design: Changes in vax_mmu.c:
Unaligned access is done with two separate physical addresses, pa
and pa1, because if the access crosses a page boundary, pa1 may
not be contiguous with pa. It's worth noting that in an unaligned
access, the low part of the data begins at pa (complete with byte
offset), but the high parts begins at pa1 & ~03 (always in the
low-order end of the second longword).
To handle unaligned data, we will add two routines for read and
write unaligned:
data = ReadU (pa, len);
WriteU (pa, len, val);
Note that the length can be 1, 2, or 3 bytes. For ReadU, data is
return right-aligned and masked. For WriteU, val is expected to
be right-aligned and masked.
The read-unaligned flows are changed as follows:
if (mapen && ((off + lnt) > VA_PAGSIZE)) { /* cross page? */
vpn = VA_GETVPN (va + lnt); /* vpn 2nd page */
tbi = VA_GETTBI (vpn);
xpte = (va & VA_S0)? stlb[tbi]: ptlb[tbi]; /* access tlb */
if (((xpte.pte & acc) == 0) || (xpte.tag != vpn) ||
((acc & TLB_WACC) && ((xpte.pte & TLB_M) == 0)))
xpte = fill (va + lnt, lnt, acc, NULL); /* fill if needed */
pa1 = ((xpte.pte & TLB_PFN) | VA_GETOFF (va + 4)) & ~03;
}
else pa1 = ((pa + 4) & PAMASK) & ~03; /* not cross page */
bo = pa & 3;
if (lnt >= L_LONG) { /* lw unaligned? */
sc = bo << 3;
wl = ReadU (pa, L_LONG - bo); /* read both fragments */
wh = ReadU (pa1, bo); /* extract */
return ((wl | (wh << (32 - sc))) & LMASK);
}
else if (bo == 1) /* read within lw */
return ReadU (pa, L_WORD);
else {
wl = ReadU (pa, L_BYTE); /* word cross lw */
wh = ReadU (pa1, L_BYTE); /* read, extract */
return (wl | (wh << 8));
}
These are not very different, but they do reflect that ReadU returns
right-aligned and properly masked data, rather than the encapsulating
longword.
The write-unaligned flows change rather more drastically:
if (mapen && ((off + lnt) > VA_PAGSIZE)) {
vpn = VA_GETVPN (va + 4);
tbi = VA_GETTBI (vpn);
xpte = (va & VA_S0)? stlb[tbi]: ptlb[tbi]; /* access tlb */
if (((xpte.pte & acc) == 0) || (xpte.tag != vpn) ||
((xpte.pte & TLB_M) == 0))
xpte = fill (va + lnt, lnt, acc, NULL);
pa1 = ((xpte.pte & TLB_PFN) | VA_GETOFF (va + 4)) & ~03;
}
else pa1 = ((pa + 4) & PAMASK) & ~03;
bo = pa & 3;
if (lnt >= L_LONG) {
sc = bo << 3;
WriteU (pa, L_LONG - bo, val & insert[L_LONG - bo]);
WriteU (pa, bo, (val >> (32 - sc)) & insert[bo]);
}
else if (bo == 1) /* read within lw */
WriteU (pa, L_WORD, val & WMASK);
else { /* word cross lw */
WriteU (pa, L_BYTE, val & BMASK);
WriteU (pa, L_BYTE, (val >> 8) & BMASK);
}
return;
}
Note that all the burden here has been thrown on the WriteU routine.
-------------
ReadU is the simpler of the two routines that needs to be written.
It will handle memory reads and defer register and IO space to
model-specific unaligned handlers.
int32 ReadU (uint32 pa, int32 lnt)
{
int32 dat;
int32 sc = (pa & 3) << 3;
if (ADDR_IS_MEM (pa))
dat = M[pa >> 2];
else {
mchk = REF_V;
if (ADDR_IS_IO (pa))
dat = ReadIOU (pa, lnt);
else dat = ReadRegU (pa, lnt);
}
return ((dat >> sc) & insert[lnt]);
}
Note that the ReadIOU and ReadRegU return a "full longword," just
like their aligned counterparts, and ReadU right-aligns the result,
just as ReadB, ReadW, and ReadL do.
WriteU must handle the memory read-modify-write sequence. However,
it defers register and IO space to model-specific unaligned handlers.
void WriteU (uint32 pa, int32 lnt, int32 val)
{
if (ADDR_IS_MEM (pa)) {
int32 bo = pa & 3;
int32 sc = bo << 3;
M[pa >> 2] = (M[pa >> 2] & ~(insert[len] << sc) | (val << sc);
}
else if ADDR_IS_IO (pa)
WriteIOU (pa, lnt, val);
else WriteRegU (pa, lnt, val);
return;
}
--------------
For the 11/780, ReadIOU, ReadRegU, WriteIOU, and WriteRegU all do the
same thing: they throw an SBI machine check. We can write explicit
routines to do this (and remove the unaligned checks from all the
normal adapter flows), or leave things as they are and simply define
the four routines as macros that go to the normal routines. So there's
very little to do.
On CVAX, I suspect that ReadRegU and WriteRegU behave like the
normal routines. The CVAX specs don't say much, but CMCTL (the memory
controller) notes that it ignores the byte mask and treats every
access as an aligned longword access. I suspect this is true for
the other CVAX support chips, but I no longer have chip specs.
The Qbus, on the other hand... that's a fun one. Note that all of
these cases are presented to the existing aligned IO routine:
bo = 0, byte, word, or longword length
bo = 2, word
bo = 1, 2, 3, byte length
All the other cases are going to end up at ReadIOU and WriteIOU,
and they must turn the request into the exactly correct number of
Qbus accesses AND NO MORE, because Qbus reads can have side-effects,
and word read-modify-write is NOT the same as a byte write.
The read cases are:
bo = 0, byte or word - read one word
bo = 1, byte - read one word
bo = 2, byte or word - read one word
bo = 3, byte - read one word
bo = 0, triword - read two words
bo = 1, word or triword - read two words
ReadIOU is very similar to the existing ReadIO:
int32 ReadIOU (uint32 pa, int32 lnt)
{
int32 iod;
iod = ReadQb (pa); /* wd from Qbus */
if ((lnt + (pa & 1)) <= 2) /* byte or word & even */
iod = iod << ((pa & 2)? 16: 0); /* one op */
else iod = (ReadQb (pa + 2) << 16) | iod; /* two ops, get 2nd wd */
SET_IRQL;
return iod;
}
The write cases are:
bo = x, lnt = byte - write one byte
bo = 0 or 2, lnt = word - write one word
bo = 1, lnt = word - write two bytes
bo = 0, lnt = triword - write word, byte
bo = 1, lnt = triword - write byte, word
WriteIOU is similar to the existing WriteIO:
void WriteIO (uint32 pa, int32 val, int32 lnt)
{
switch (lnt) {
case L_BYTE: /* byte */
WriteQb (pa, val & BMASK, WRITEB);
break;
case L_WORD: /* word */
if (pa & 1) { /* odd addr? */
WriteQb (pa, val & BMASK, WRITEB);
WriteQb (pa + 1, (val >> 8) & BMASK, WRITEB);
}
else WriteQb (pa, val, WRITE);
break;
case 3: /* triword */
if (pa & 1) { /* odd addr? */
WriteQb (pa, val & BMASK, WRITEB);
WriteQb (pa + 1, (val >> 8) & WMASK, WRITE);
}
else {
WriteQb (pa, val & WMASK, WRITE);
WriteQb (pa + 2, (val >> 16) & BMASK, WRITEB);
}
break;
}
SET_IRQL;
return;
}
-----------------
I think this handles all the cases.
/Bob Supnik
Conflicts:
VAX/vax780_defs.h
VAX/vax_mmu.c
VAX/vaxmod_defs.h
John Dundas said:
Bob and all,
I ran across what I believe to be a bug in the CSM code:
case 070: /* CSM */
if (CPUT (HAS_CSM) && (MMR3 & MMR3_CSM) || (cm != MD_KER)) {
According to the Architecture Handbook, CSM may be executed only if the MMR3 bit is set AND the mode is not Kernel. Changing the code to:
case 070: /* CSM */
if (CPUT (HAS_CSM) && (MMR3 & MMR3_CSM) && (cm != MD_KER)) {
also has the effect of making the ZKDKB0 diagnostic much happier.
Thanks,
John
--
John A. Dundas III
This is the results of external KDP development activities. The KDP side done by Timothe Litt and DMC and DUP by Mark Pizzolato
Additionally, other PDP10 updates from Timothe Litt
This summer a group of us worked together to resurrect the original ARPAnet IMP software, and I’m now happy to say that the IMP lives again in simulation. It’s possible to run the original IMP software on a modified version of the H316 simh and to set up a virtual network of simulated IMPs talking to each other. IMP to IMP connections, which would have originally been carried over leased telephone lines, are tunneled over IP. As far as we can tell, everything works pretty much as it did in the early 1970s. IMPs are able to exchange routing information, console to console communications, network statistics, and they would carry host traffic if there were hosts on the network. The hooks are in there to allow simh to support the IMP side of the 1822 host interface, and the next step would be to recover the OS for an ARPAnet era host and then extend the corresponding simulator to talk to the IMP simulation.
From Timothe Litt:
- Detect vector conflicts, SHOW IOSPACE
- Detect conflicting vector assignments.
- Correct show iospace display of high vector for multi-unit devices
- Display vectors in DEV_RDX in show iospace
- Ignore disabled devices when searching for conflict.
- Improve device conflict reporting, Report both devices when address conflict is detected.
From Mark Pizzolato:
- Added optional alternate radix output display for device address and vector values
displayed by SHOW DEVICE.
-H and -O switches select hex or octal output in addition to the default radix displayed
by the normal simulator.
- Added halfduplex mode for network connections and corrected modem signal DSR to reflect connection status (vs original attach status), and DCD follows DSR (except in halfduplex mode where it follows CTS).
- Enhance tmxr_set_get_modem_bits to also return the modem DTR and RTS state.
- Separate RTS from DTR when manipulating modem state bits
- Minor fixes to loopback functionality after direct testing with the first loopback client device (DMC11).
- Fix clearing of break input buffer now that input buffers are dynamically allocated
- Changed Modem bit logic to have CTS reflect RTS as expected by devices which may expect this.
- Changed receive buffers to be dynamically allocated and the same size as transmit buffers when transmit buffers are non-default sized.
- Added TMXR line attach in loopback mode. Fixed loopback buffer management
- Added loopback support to TMXR lines
- Added functioning connect poll capability to revised DMC
- Added connection destination display to connection status even when a connection has not yet been established.
- Added extended packet sending and receiving semantics to TMXR allowing for an optional frame byte to exist between length prefixed data packets
- Avoid assignments of void * values. Cast all memory allocation return values to appropriate types.
- Add output to sim_log where missing in various places.
- Fixed issue with lost file positions after a restore for devices which leverage the UNIT_SEQ flag.
This required a simulator specific implementation since the PDP11 PC register isn't stored in a normal memory location. It is loaded from a temporary location upon simulator instruction execution startup (and saved to that location when instuction execution stops). In order to reference the PC value while executing instructions (for debug output), this extended access model is required.
Separate boot ROMs are available for each of the DEQNA, DELQA and DELQA-T devices being simulated.
DEQNA-Lock mode has been added to the DELQA and DELQA-T simulations.
When the conversational boot is waiting for input it uses a BLBC instruction to read the DUART status register and test the RXR bit. This generates an unaligned longword read in Qbus I/O space (Yuk!). I realised that the code to read unaligned data in vax_mmu is broken. It attempts to read the aligned longwords that the data spans, but there was no code to convert the unaligned pa passed in to it's aligned form, thus the wrong bytes are returned and the BLBC never sees the RXR bit.
The vertical sync interrupt generated in vc_svc() was too slow for Ultrix. The driver enables interrupts and then calls DELAY(20000), which I guess is 20ms. This translates to about 10000 simulated instructions, but tmxr_poll works out at ~75000 instructions. I've added a new unit to simulate just the vsync interrupts activating every 8000 instructions.
This approach will completely disable the ability to idle while the QVSS device is enabled.