It looks like disk controllers that automatically update the disk address (DA)
after completion of I/O are expected to do so even if no data was transferred
because of an I/O error.
While studying RSX-11's Error Logger documentation, I noticed that its
examples clearly offset the disk address backwards by one when the controller
reports an I/O error.
Once the controller has found the sector specified by DA, the I/O begins
regardless of the condition of the sector (good or bad data) or of the ability
to transfer its contents between the disk and memory. If an error occurs (NXM,
for instance), the operation stops (with the error reported) at the end of the
sector. So if, for example, the bus address register held a bad address from
the get-go, no data could be transferred at all, yet DA should still be updated
to DA + 1 once the controller asserts the DONE bit.
This patch makes sure that DA is always advanced once I/O has actually
commenced.
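A minimal sketch of the idea on the emulation side, in C; the register
variables, bit positions and the xfer_sector() helper are made up for
illustration and are not the actual code's identifiers:

    #include <stdbool.h>
    #include <stdint.h>

    #define CS_DONE  0x0080     /* "controller done" bit -- position illustrative */
    #define CS_NXM   0x0800     /* non-existent-memory error bit -- illustrative  */

    static uint16_t rpcs;       /* control and status register */
    static uint16_t rpda;       /* disk address register       */

    /* Stand-in for the bus transfer of one sector; returns false on NXM. */
    static bool xfer_sector(uint32_t bus_addr)
    {
        return bus_addr < 0x3E000;          /* pretend top of memory */
    }

    /* End-of-sector handling for an operation that has been commenced. */
    static void sector_complete(uint32_t bus_addr)
    {
        if (!xfer_sector(bus_addr))
            rpcs |= CS_NXM;     /* nothing was moved, the error is latched...      */

        rpda++;                 /* ...but DA still advances: the sector was found  */
                                /* and the operation did commence                  */
        rpcs |= CS_DONE;        /* completion (with the error, if any) is reported */
    }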
Running earlier XXDP tests revealed that old code used a technique of
initiating a command and then continuing with housekeeping while waiting for
the command's completion.
For example, code could initiate a SEEK command for a drive and, knowing that
CS_DONE (and thus an interrupt) is coming in about 16 us, go ahead and clear a
flag that registers the occurrence of the interrupt (the ISR is expected to
set it to 1). If the implementation sets CS_DONE immediately at function
initiation, the interrupt could be triggered before the next instruction, and
the ISR would set the flag right away. The main code, however, would then
clear the flag with its following instruction and thus never detect the
interrupt down the road.
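For illustration, here is a C rendering of that pattern (the actual tests are
PDP-11 assembly; the names, bit values, and the write_rpcs() helper below are
made up for the sketch):

    #include <stdint.h>

    #define CS_GO    0x0001     /* bit values illustrative only */
    #define CS_IE    0x0040
    #define FN_SEEK  0x0008

    static volatile uint16_t rpcs;      /* stands in for the memory-mapped CSR */
    static volatile int seek_seen;      /* "the SEEK interrupt has occurred"   */

    static void write_rpcs(uint16_t v) { rpcs = v; }

    /* Interrupt service routine: records that the SEEK has completed. */
    static void disk_isr(void) { seek_seen = 1; }

    static void do_seek(void)
    {
        write_rpcs(FN_SEEK | CS_GO | CS_IE);    /* start SEEK; DONE due in ~16 us */
        seek_seen = 0;                          /* then clear the "seen" flag     */

        /* If the emulator raises CS_DONE (and the interrupt) the moment the CSR
         * is written, disk_isr() can run between the two statements above; the 1
         * it stored is then wiped out and this wait can only end by time-out. */
        while (!seek_seen)
            ;                                   /* housekeeping / counted wait    */
    }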
Since this technique was in use, it is better to introduce a delay before
setting CS_DONE for "fast" initiation commands like SEEK and HOME, to
accommodate the software that relied on it.
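A minimal sketch of the emulator-side idea, assuming a one-slot event queue
that is stepped once per simulated instruction; sched_after(), sched_step()
and the constants are placeholders, not the real scheduler's API:

    #include <stddef.h>
    #include <stdint.h>

    #define CS_DONE        0x0080   /* illustrative bit position                    */
    #define SEEK_DONE_DLY  10       /* ~10 simulated instructions: longer than the  */
                                    /* documented 16 us on real hardware, but short */
                                    /* enough to beat the 25+-iteration timeouts    */

    static uint16_t rpcs;           /* control and status register                  */

    /* One-slot event queue standing in for the emulator's real scheduler. */
    static void (*pending_fn)(void);
    static int   pending_cnt;

    static void sched_after(int delay, void (*fn)(void))
    {
        pending_fn  = fn;
        pending_cnt = delay;
    }

    /* Called once per simulated instruction by the CPU loop. */
    static void sched_step(void)
    {
        if (pending_fn != NULL && --pending_cnt == 0) {
            void (*fn)(void) = pending_fn;
            pending_fn = NULL;
            fn();
        }
    }

    static void seek_done(void)     /* fires SEEK_DONE_DLY instructions later       */
    {
        rpcs |= CS_DONE;            /* by now the guest has cleared its "seen" flag */
    }

    static void start_seek(void)    /* RPCS written with the SEEK function          */
    {
        rpcs &= ~CS_DONE;           /* the controller goes busy immediately...      */
        sched_after(SEEK_DONE_DLY, seek_done);  /* ...DONE (and the interrupt) come */
                                                /* a few instructions later         */
    }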
So far, however, testing has encountered only one case where this delay
mattered, but it is hard to tell whether it could be dropped entirely. All I/O
commands already delay CS_DONE, because they were never supposed to complete
immediately.
Since the time to CS_DONE for initiation commands was documented as 16 us, the
introduced delay is set to 10 instructions, which usually took longer than
that to execute on the real hardware. This covers the interrupt-flag-clear
case as well as the counted waits, which used tight loops of some 25+
iterations for "drive ready" before flagging a time-out (so the delay cannot
be longer, either).
It also looks like more modern code never used such tricks, so for it a
slightly delayed CS_DONE should not matter.
Rigorously running the device code through XXDP and several other OSes and
scenarios revealed a number of discrepancies, which this rather extensive
patch addresses:
1. Each unit must implement its own "drive status" register, so that per-drive
errors and conditions can be tracked correctly (see the sketch after this
list);
2. Fixed INT_SET() / INT_CLR() in the RPCS write function (wrong order of the
"if" conditions);
3. Some behavior was not implemented exactly as expected from the real
hardware, such as:
a. Post-I/O register values in RPDA and RPCA (including the corner case of
pack overflow);
b. I/O stacking, which wasn't mentioned in any available documentation, only
in XXDP listings;
c. RESET/IDLE function must be accepted for a "busy" controller;
d. HOME function must always execute, even when "device ready" is not set
(e.g. when a SEEK error is detected);
e. SEEK incomplete should not respond with "device ready" (however, the
condition can be cleared by HOME, see d.);
f. WLOA-induced write-lock violation wasn't reflected in "device status".
4. Some timing was off, making the device work "too fast"; this was fixed
(except for pathological cases where the races are in the actual test code
and cannot be logically fixed);
5. A WLOA setup command bug was fixed;
6. Added more code comments documenting the peculiarities described above.
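On item 1, a minimal sketch of the per-unit split; the struct layout, bit
values and names are illustrative only and do not mirror the actual code:

    #include <stdint.h>

    #define RP_NUMDR 8          /* drives per controller                     */

    #define DS_SI    0x0200     /* seek incomplete      -- bit illustrative  */
    #define DS_WLO   0x0020     /* write-lock violation -- bit illustrative  */

    typedef struct {
        uint16_t ds;            /* this drive's own "drive status" register  */
        /* ... per-drive geometry, attach state, etc. ...                    */
    } rp_drive;

    static rp_drive rp_drives[RP_NUMDR];

    /* With a single controller-wide status word, a seek error on unit 2 would
     * still show after the program selected unit 5; with a per-unit word,
     * each error/condition is latched only in the drive that caused it. */
    static uint16_t drive_status(unsigned unit)
    {
        return rp_drives[unit].ds;
    }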