Quantum 6-00360-15 Portable Media Storage User Manual


 
Appendix D Quality of Service Guide
Callbacks
StorNext 3.1.3 Installation Guide 150
FSM Failures
If the FSM crashes or is stopped, there is no immediate effect on real-time
(ungated) I/O. As long as the I/O does not need to contact the FSM for
some reason (attribute update, extent request, etc.), the I/O will continue.
From the standpoint of QOS, the FSM being unavailable has no effect.
Non-real-time I/O is held pending until the FSM is reconnected. The
rationale is that, because the stripe group is in real-time mode, there
is no way to know whether the parameters have changed while the FSM is
disconnected. The conservative design approach was taken: hold off all
non-real-time I/O until the FSM is reconnected.
Once the client reconnects to the FSM, the client must re-request any real-
time I/O it had previously requested. The FSM does not keep track of
QOS parameters across crashes; that is, the information is not logged and
is not persistent. Therefore, it is up to the clients to inform the FSM of the
amount of required rtio and to put the FSM back into the same state as it
was before the failure.
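The recovery responsibility described above can be pictured with a minimal
Python sketch. It is not StorNext code: the Fsm and Client classes and the
request_rtio, restart, and on_reconnect methods are hypothetical names chosen
for illustration. The only behavior taken from the text is that the FSM keeps
QOS state in memory only and that each client re-issues its own request after
reconnecting.

```python
from dataclasses import dataclass, field


@dataclass
class Fsm:
    """Toy stand-in for the FSM: QOS state lives only in memory."""
    rtio_by_client: dict = field(default_factory=dict)

    def request_rtio(self, client_id: str, amount: int) -> None:
        self.rtio_by_client[client_id] = amount

    def restart(self) -> None:
        # Nothing is logged, so a crash or restart forgets every request.
        self.rtio_by_client.clear()


@dataclass
class Client:
    client_id: str
    wanted_rtio: int  # the client remembers its own requirement

    def on_reconnect(self, fsm: Fsm) -> None:
        # Same call as the initial request; the client restores the state.
        fsm.request_rtio(self.client_id, self.wanted_rtio)


def demo():
    fsm = Fsm()
    clients = [Client("a", 100), Client("b", 250)]
    for c in clients:
        c.on_reconnect(fsm)          # initial requests
    before = dict(fsm.rtio_by_client)
    fsm.restart()                    # FSM failure: state is lost
    after_crash = dict(fsm.rtio_by_client)
    for c in clients:
        c.on_reconnect(fsm)          # clients re-request after reconnecting
    return before, after_crash, dict(fsm.rtio_by_client)
```

In this model the FSM's table is empty immediately after the restart and
matches the pre-failure table once every client has re-requested.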
In most cases, this results in the amount of real-time and non-real-time
I/O being exactly the same as it was before the crash. The only time this
would be different is if the stripe group is oversubscribed. In this case,
since more rtio had been requested than was actually available, and the
FSM had adjusted the request amounts, it is not deterministically possible
to re-create the picture exactly as it was before. Therefore, if a
deterministic picture is required across reboots, it is advisable not to
oversubscribe the amount of real-time I/O.
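Why oversubscription breaks determinism can be illustrated with a small
Python sketch. This guide does not specify how the FSM adjusts request
amounts, so the policy below (grant requests in arrival order, capped by
what is left) is purely an assumption, and grant_in_order and
distinct_outcomes are hypothetical helper names. The point is only that any
order-dependent adjustment yields different results when clients reconnect
in a different order.

```python
from itertools import permutations


def grant_in_order(capacity, requests):
    """Grant each (client, amount) request in arrival order, capped by what is left.

    Assumed policy for illustration only; not the FSM's documented algorithm.
    """
    remaining = capacity
    grants = {}
    for client, amount in requests:
        granted = min(amount, remaining)
        grants[client] = granted
        remaining -= granted
    return grants


def distinct_outcomes(capacity, requests):
    """Count the different grant tables over every possible arrival order."""
    seen = {
        tuple(sorted(grant_in_order(capacity, order).items()))
        for order in permutations(requests)
    }
    return len(seen)


requests = [("a", 300), ("b", 300), ("c", 300)]
fits = distinct_outcomes(1000, requests)           # total demand fits capacity
oversubscribed = distinct_outcomes(500, requests)  # demand exceeds capacity
```

Under this assumed policy, fits is 1 because every client receives its full
request regardless of order, while oversubscribed is greater than 1 because
the grants depend on which clients reconnect first.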
The process of each client re-requesting rtio is exactly the same as it was
initially; once each client has reestablished its rtio parameters, the
non-real-time I/O is allowed to proceed to request a non-real-time token. It
may take several seconds for the SAN to settle back to its previous state.
It may be necessary to adjust the RtTokenTimeout parameter on the FSM to
account for clients that are slow in reconnecting to the FSM.
Client Failures
When a client disconnects either abruptly (via a crash or a network
partition) or in a controlled manner (via an unmount), the FSM releases
the client's resources back to the SAN. If the client had real-time I/O on
the stripe group, that amount of real-time I/O is released back to the
system. This causes a series of callbacks to the clients (all clients if the
stripe group is transitioning from real-time to non-real-time), informing
them of the new amount of non-real-time I/O available.
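The release of real-time I/O on disconnect can be sketched as a toy Python
model. Everything here is hypothetical: the StripeGroup class, its method
names, and the capacity figures are illustrative, and two simplifications
are assumptions rather than documented behavior. First, the non-real-time
amount is taken to be total capacity minus reserved real-time I/O. Second,
when the stripe group stays in real-time mode, callbacks are assumed to go
to the clients holding a non-real-time token.

```python
class StripeGroup:
    """Toy model of FSM bookkeeping for one stripe group (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clients = set()
        self.rtio = {}              # client -> real-time I/O it holds
        self.token_holders = set()  # clients holding a non-real-time token
        self.callbacks = []         # (client, non-real-time I/O now available)

    def connect(self, client, rtio=0, token=False):
        self.clients.add(client)
        if rtio:
            self.rtio[client] = rtio
        if token:
            self.token_holders.add(client)

    def non_rt_available(self):
        # Assumption: whatever is not reserved as real-time is non-real-time.
        return self.capacity - sum(self.rtio.values())

    def disconnect(self, client):
        # Crash, partition, or unmount: all are handled the same way here.
        self.clients.discard(client)
        self.token_holders.discard(client)
        released = self.rtio.pop(client, 0)
        if not released:
            return
        # The last real-time client leaving ends real-time mode,
        # in which case every remaining client is called back.
        leaving_rt_mode = not self.rtio
        targets = self.clients if leaving_rt_mode else self.token_holders
        for other in sorted(targets):
            self.callbacks.append((other, self.non_rt_available()))


sg = StripeGroup(1000)
sg.connect("a", rtio=400)
sg.connect("b", rtio=100)
sg.connect("c", token=True)
sg.connect("d")
sg.disconnect("a")  # stripe group stays in real-time mode
sg.disconnect("b")  # last real-time client leaves
```

In this model the first disconnect produces one callback to the token
holder, and the second produces callbacks to every remaining client because
the stripe group has left real-time mode.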
If the client had a non-real-time I/O token, the token is released and the
amount of non-real-time I/O available is recalculated. Callbacks are sent