In the previous post, I provided the steps to detect and clear faults from the ALOM environment. In this post, the steps are only slightly different, because they are performed at the operating system level instead.
1. The fmadm faulty command is used to display any faulty components in the system.
# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Sept 12 13:23:32 c49f99s3-1234-s4t6-8i76-w43b6732t6k  PCIEX-8000-3S  Critical

Fault class : fault.io.pciex.device-interr max 40%
              fault.io.pciex.bus-linkerr 20%

Affects     : dev:////pci@400/pci@0/pci@8/scsi@0
              dev:////pci@400/pci@0/pci@8
                  faulted but still in service

FRU         : "MB" (hc://:product-id=SUNW,T5240:chassis-id=ABC123456:server-id=ITsiti:serial=0328MSL-09309L005K:part=540794001/motherboard=0)
                  faulty

Description : A problem has been detected on one of the specified devices or
              on one of the specified connecting buses.
              Refer to http://sun.com/msg/PCIEX-8000-3S for more information.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
              this fault

Action      : If a plug-in card is involved check for badly-seated cards or
              bent pins. Otherwise schedule a repair procedure to replace the
              affected device(s). Use fmadm faulty to identify the devices or
              contact Sun for support.
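2. Once Fault Management has diagnosed a component as faulty, deal with the hardware first (in this case, reseat or replace the affected card or board, as the Action field suggests). Then use the fmadm repair command to explicitly mark the fault as repaired; the command itself does not fix anything, it only tells Fault Management that the problem has been dealt with. It accepts a UUID (the EVENT-ID shown by fmadm faulty), an FMRI, or a location as its argument.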
# fmadm repair c49f99s3-1234-s4t6-8i76-w43b6732t6k
fmadm: recorded repair to c49f99s3-1234-s4t6-8i76-w43b6732t6k
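The two steps above can be chained in a small shell helper. This is only a sketch, not an official Sun/Oracle tool: it assumes the fmadm faulty layout shown above (a TIME/EVENT-ID header before each fault, the date taking three fields so the EVENT-ID is field 4), and the function names fault_uuids and repair_fault are hypothetical. After repairing, it re-runs fmadm faulty to check that the event is no longer listed.

```shell
# Sketch only. Assumes the "fmadm faulty" layout shown above; the function
# names fault_uuids and repair_fault are hypothetical helpers, not part of fmadm.

# fault_uuids: read "fmadm faulty" output on stdin and print each EVENT-ID.
fault_uuids() {
    awk '
        $1 == "TIME" && $2 == "EVENT-ID" { want = 2; next }  # header row seen
        want == 2 { want = 1; next }                          # skip closing dashes
        want == 1 { print $4; want = 0 }                      # summary row: field 4 is EVENT-ID
    '
}

# repair_fault UUID: mark one fault as repaired, then check that
# "fmadm faulty" no longer lists it. Returns non-zero if it is still listed.
repair_fault() {
    fmadm repair "$1" || return 1
    # plain grep + /dev/null instead of grep -q, for older Solaris grep
    if fmadm faulty | grep "$1" >/dev/null; then
        echo "still listed as faulty: $1" >&2
        return 1
    fi
    echo "cleared: $1"
}

# Example (as root, and only after the faulty part has been serviced):
#   for uuid in `fmadm faulty | fault_uuids`; do repair_fault "$uuid"; done
```

Repairing every listed fault in a loop is convenient, but it should only be done once each fault has actually been addressed; otherwise fmadm repair simply hides a problem that is still there.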