Oracle SCSI & RAID Devices Driver Download For Windows 10

  
  1. The SCSI-2 reservation ioctl used is part of the SCSI-2 command set. This is commonly implemented in most modern disk firmware. However, the reservation call is neither persistent, or capable of surviving reboots, nor able to cope with multiple paths to the same device.
  2. To determine the SCSI bus name-instance for each tape device, you must install the Oracle Secure Backup Solaris device driver, and then view the devices that were recognized by the driver and associate them with the tape device information gathered in the first step.

SCSI is the most widely used disk interface method going. There are only a couple of basic parameters that you or your SA can use to affect a change in the performance of the.

I’ve been recently involved in a project to migrate from old and rusty Cisco MDS 9222i to the new MDS 9506 SAN switches and during the first phase of the migration the primary node in a two-node Oracle RAC cluster lost access to its voting disks and went down. And that’s when only half paths to SAN storage was unreachable whilst the other half was absolutely ok and active.

Oracle scsi & raid devices driver download for windows 10 xp

Oracle support pointed out to the following errors:

WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.

Metalink document 1581684.1 at support.oracle.com gives more thorough explanation:

RAID

Generally this kind messages comes in ASM alertlog file on below situations:

Oracle scsi interfaceFor
  • Too many delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
    thus the ASM instance dismount the diskgroup.By default, it is 15 seconds.
  • the heart beat delays are sort of ignored for external redundancy diskgroup.
    ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
    but the heart beat delays do not dismount external redundancy diskgroup directly.

The ASM disk could go into unresponsiveness, normally in the following scenarios:

Download

+ Some of the paths of the physical paths of the multipath device are offline or lost
+ During path ‘failover’ in a multipath set up
+ Server load, or any sort of storage/multipath/OS maintenance

One way to solve that is to set _asm_hbeatiowait on all the modes of Oracle RAC to a higher value (in seconds) but not higher that 200.

But before that it would be a good idea to take a look at multipathd’s configuration first.

Since Oracle RAC in our case was backed up by EMC VMAX array the following device section is of the most interest:

Oracle Scsi Login

And it might seem that no_path_retry was one part of the problem:

A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing.

Oracle Scsi System

In essence, instead of failing over to the active paths I/O was queued. The negative effect of this option was multiplied by the presence of another option, this time in the default section, called polling_interval which by default is set to 5 seconds. Now you see that I/O was queued by polling_interval*no_path_retry which is 30 seconds in total.

One obvious solution was, as expected, to disable queueing on Oracle voting disks by setting no_path_retry = fail. This was certainly a low hanging fruit but there were more in the details since there are several layers where IO commands issued to a device could experience the timeout:

  • At SCSI layer defined in /sys/class/scsi_device/h:c:t:l/device/timeout.
  • FC HBA’s driver layer (in our case it was qla2xxx). Use modinfo to list the current settings.
  • At dm-multipath or block layer.

Oracle Scsi & Raid Devices Driver Download For Windows 10 Download

The following quote from Redhat’s engineer adds more detailed explanation:

Also, please note that the timeout set in “/sys/class/scsi_device/h:c:t:l/device/timeout” is the minimum amount of time that it will take for the scsi error handler to start when a device is not responding, and *NOT* the amount of time it will take for the device to return a SCSI error. For example if the I/O timeout set to 60s, that means there’s a worst case of 120s before the error handler would ever be able to run.

Oracle Scsi Command

Since IO commands can be submitted to the device up until the first submitted command is timed out, and that may take 60s for first command to get timed out, we could summarize the worst case scenario for longest time required to return IO errors on a device as follows:

[1] Command submitted to the sub path of device, inherits 60s timeout from /sys.

[2] just before 60s is up, another command is submitted, also inheriting a 60s timeout.

[3] first command times out at 60s, error handler starts but must sleep until all other commands have completed or timed out. Since we had a command submitted just before this, we wait another 60s for it to timeout.

[4] Now we attempt to abort all timed out commands. Note that each abort also sends a Test Unit Ready (TUR SCSI command) to the device, which have a 10 second timeout, adding extra time to the total.

[5] depending on the result of the abort, we may also have to reset the device/bus/host. This would add an indeterminate amount of time to the process, including more Test Unit Ready (TUR SCSI command) at 10 seconds each.

[6] Now that we’ve aborted all commands and possibly reset the device/bus/host, we requeue the cancelled commands. This is where we wait (number of allowed attempts + 1 * timeout_per_command) = (5+1 * 60s) = 360s. (**Note: in above formula number of allowed attempts defaults to 5 for any IO commands issued through VFS layer, and “timeout_per_command” is the timeout value set in “/sys/class/scsi_device/h:c:t:l/device/timeout” file).

[7] As commands reach their “(number of allowed attempts + 1 * timeout_per_command)” timeout, they will be failed back up to the DM-Multipath or application layer with an error code. This is where you finally see SCSI errors, and if multipath software is involved, for a path failure.

So the basic idea is that it’s very hard to predict the exact time it would take to failover and it’s worth trying to fiddle with different timeout settings, i.e. already mentioned and fast_io_fail_tmo, dev_loss_tmo from multipath.conf, as well as to look at the problem from the application’s side and update _asm_hbeatiowait accordingly. The question remains, why Oracle decided to set this parameter to 15 sec by default?

Posted on June 7, 2014 at 4:46 pm by sergeyt · Permalink
In: Linux, Oracle