18.12.19

OracleVM Serial Console Not Working - serial_console (serial_console) main process ended, respawning - Part 2

Back in 2015 I wrote an article describing a workaround for the OracleVM serial console problem: the serial console did not work, and /var/log/messages filled up with continuous errors.

http://kaukovuo.blogspot.com/2015/12/oraclevm-serial-console-not-working.html

The issue resurfaced at some point, and I again had Linux servers printing continuous error messages like this:

Dec 18 20:33:49 lb01 init: serial_console (serial_console) main process (30384) terminated with status 1
Dec 18 20:33:49 lb01 init: serial_console (serial_console) main process ended, respawning
Dec 18 20:33:59 lb01 init: serial_console (serial_console) main process (30399) terminated with status 1
Dec 18 20:33:59 lb01 init: serial_console (serial_console) main process ended, respawning
Dec 18 20:34:09 lb01 init: serial_console (serial_console) main process (30411) terminated with status 1
Dec 18 20:34:09 lb01 init: serial_console (serial_console) main process ended, respawning
 
The fix for this is to modify /etc/udev/rules.d/50-udev.rules.
 
Change hvc0 to ttyS0, like this:
KERNEL=="hvc0",                 SYMLINK+="serial_console"
to
KERNEL=="ttyS0",                 SYMLINK+="serial_console"
 
Save the file and reboot the server.
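
The edit above can also be scripted. This is only a minimal sketch: the function name switch_serial_console_rule is made up for illustration, and it assumes the rule line looks exactly like the one shown above. It keeps a .bak copy next to the file and touches only the line that creates the serial_console symlink.

```shell
# Sketch: switch the serial_console symlink rule from hvc0 to ttyS0.
# The function name is hypothetical; the default path is the file named above.
switch_serial_console_rule() {
    rules="${1:-/etc/udev/rules.d/50-udev.rules}"
    # Keep a copy of the original file before changing anything.
    cp -p "$rules" "$rules.bak" || return 1
    # Rewrite only the line that creates the serial_console symlink.
    sed '/SYMLINK+="serial_console"/ s/KERNEL=="hvc0"/KERNEL=="ttyS0"/' "$rules.bak" > "$rules"
}
```

Call it as root without arguments to edit the default file, or pass a copy of the rules file to try it out first. A reboot is still needed afterwards.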

10.11.17

OracleVM 3.3.4 and Data Corruption When Cloning VM from Template

We ran into quite serious issues when creating a virtual server from a template that resided on an NFS mount, with the target repository on iSCSI storage.

The problem was that, for some reason, OracleVM 3.3.4 (kernel 3.8.13) corrupted the image while cloning the virtual server from the template. After the clone operation everything looked good from the OracleVM Manager point of view, but starting the server failed with an error stating that there is no bootable operating system.

During the cloning operation there was a huge number of the following errors in /var/log/messages. The errors stayed the same even after I switched to a different utility server, so this is not a hardware issue:

Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd] CDB:
Nov  9 18:35:17 vs9 kernel: Write(10): 2a 00 26 bf f0 00 00 0a 00 00
Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd] Invalid command failure
Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd]
Nov  9 18:35:17 vs9 kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd]
Nov  9 18:35:17 vs9 kernel: Sense Key : Illegal Request [current]
Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd]
Nov  9 18:35:17 vs9 kernel: Add. Sense: Invalid field in cdb
Nov  9 18:35:17 vs9 kernel: sd 5:0:0:0: [sdd] CDB:
Nov  9 18:35:17 vs9 kernel: Write(10): 2a 00 26 bf fa 00 00 0a 00 00
Nov  9 18:35:17 vs9 kernel: JBD2: Detected IO errors while flushing file data on dm-3-617

Judging from what I found when searching for an explanation, this looks like an issue with iSCSI and jumbo frames on certain 3.9 kernel versions. It could be that the OVM 3.8 kernel is affected as well.

What makes this particularly nasty is that we’ve made several copies of virtual servers for backup purposes, and there is no guarantee that those copies are valid and functional any more.

Possible Solution

To see whether a newer kernel would fix this, I decided to upgrade the whole OracleVM environment to the latest OVM 3.4.4, which uses kernel 4.1.

After upgrading all the pools and OVM Manager to 3.4.4, it looks like we got rid of this nasty behaviour.

I repeated exactly the same cloning operation on the same servers: no errors, and the cloned virtual server works just fine.

Recommendation

I strongly recommend upgrading to OracleVM 3.4.x as soon as possible if you are using 3.3.x, iSCSI and jumbo frames AND you are seeing these errors.

Check your OracleVM servers: if you see any of these errors in /var/log/messages, you might have data corruption in the images.
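
A quick way to do that check is to count the matching lines in the log. This is just a sketch: the function name count_iscsi_write_errors is made up, and the three patterns are taken from the log excerpt above, so other message variants would not be caught.

```shell
# Sketch: count log lines that carry the error signatures shown above.
# The function name is hypothetical; the patterns come from the log excerpt.
count_iscsi_write_errors() {
    log="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (and exits non-zero for 0).
    grep -c \
        -e 'Invalid command failure' \
        -e 'Invalid field in cdb' \
        -e 'Detected IO errors while flushing file data' \
        "$log"
}
```

Run it on each OVM server, for example count_iscsi_write_errors /var/log/messages. Any count above zero is a reason to distrust clones made on that server.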

6.7.17

RedHat 6.9 Update Breaks Oracle Reports

If you are running Oracle Reports, be aware: installing the latest RedHat 6 or 7 updates will break Oracle Reports execution.

For example, upgrading to the latest RedHat 6.9 caused all reports to fail with signal 11 or signal 6.

This is a known issue (see Oracle technical note 2280616.1).


To fix this, edit the reports.sh file.

Reports 11.1.2.x:

File: INSTANCE_HOME/config/reports/bin/reports.sh

Add as the last line:
REPORTS_JVM_OPTIONS="-Xss2M"; export REPORTS_JVM_OPTIONS

Reports 12.2.1.x:

File: DOMAIN_HOME/reports/bin/reports.sh

Add as the last line:
REPORTS_JVM_OPTIONS="-Xss2M"; export REPORTS_JVM_OPTIONS

After modifying, restart the Reports server.
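
If there are several instances to patch, the change can be scripted. This is a minimal sketch: the function name add_reports_jvm_options is made up for illustration, and it assumes reports.sh ends with a newline. It appends the line only if it is not already there, and keeps a .bak copy.

```shell
# Sketch: append the REPORTS_JVM_OPTIONS line to reports.sh exactly once.
# The function name is hypothetical; pass the reports.sh path for your version.
add_reports_jvm_options() {
    script="$1"
    line='REPORTS_JVM_OPTIONS="-Xss2M"; export REPORTS_JVM_OPTIONS'
    # Do nothing if the exact line is already present.
    if grep -qxF "$line" "$script"; then
        return 0
    fi
    # Keep a copy of the original script before appending.
    cp -p "$script" "$script.bak" || return 1
    printf '%s\n' "$line" >> "$script"
}
```

For Reports 12.2.1.x that would be add_reports_jvm_options "$DOMAIN_HOME/reports/bin/reports.sh", assuming DOMAIN_HOME is set in your shell. The Reports server still needs a restart afterwards.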

3.7.17

OVM Guest Linux LVM2 Disk Mounting

Sometimes you need to mount an OVM guest server disk image directly on the OVM Server, for example to fix settings that prevent the guest from booting or to change passwords.

Typically this is quite straightforward: set up a loop device and mount the wanted partition. If the target partition is an LVM2 partition, this becomes a bit more complex in an OracleVM environment.

The problem is that by default the OracleVM server /etc/lvm/lvm.conf has filtering enabled, which prevents LVM2 devices on loop devices from being discovered.

This article describes the steps to get an LVM2 volume mounted and its data changed off-line.

If you are unsure what you are doing, please make a backup of the virtual disk and the lvm.conf you are going to change before you proceed with the following actions.

The instructions below expect an experienced Linux/OVM administrator who knows what she/he is doing. I’m not going into details of what root will do after issuing the chroot command. ANY CHANGES ARE AT YOUR OWN RISK. TAKE GOOD BACKUPS ANYWAY.

Preparation: modify /etc/lvm/lvm.conf

For the OVM server to be able to scan the loop devices, /etc/lvm/lvm.conf needs to be modified:

# 30.6.2017 Harri Kaukovuo, modify the preferred_names
#preferred_names = [ "^/dev/mpath/", "^/dev/mapper/mpath", "^/dev/[hs]d" ]
preferred_names = [ ]

# 30.6.2017 Harri Kaukovuo, uncomment
filter = [ "a|.*/|" ]

# 30.6.2017 Harri Kaukovuo, comment out this line
#global_filter = [ "r|.*/|" ]
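
To double-check the result, it helps to list only the active (uncommented) settings. This is a small sketch: the function name show_lvm_filters is made up, and it assumes each setting sits on a single line as in the snippet above.

```shell
# Sketch: print the uncommented filter-related settings from lvm.conf.
# The function name is hypothetical; the default path is the file named above.
show_lvm_filters() {
    conf="${1:-/etc/lvm/lvm.conf}"
    # Lines starting with '#' do not match, so only active settings show up.
    grep -E '^[[:space:]]*(filter|global_filter|preferred_names)[[:space:]]*=' "$conf"
}
```

After the changes above, the output should show the empty preferred_names and the accept-all filter, and no global_filter line.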

Mount the Virtual LVM2 Disk

Find the next free loop device:

losetup -f

By default OVM Server has a maximum of 10 loop devices. If you run out of loop devices, you can work around it either by shutting down all the guest VM servers or by increasing the number of loop devices: add the following line to /etc/rc.local and reboot the OVM server:

# 28.6.2017 Harri Kaukovuo
/sbin/MAKEDEV -m 32 /dev/loop

Please note that the above step is only needed if you have run out of loop devices.

Once you have a free loop device (/dev/loop9 in my example), set it up to point to the virtual disk. In the example below I have retrieved the disk image file name from the OVM Manager console:

losetup /dev/loop9 /OVS/Repositories/0004fb000003000088c2307002d1b442/VirtualDisks/0004fb0000120000ba5aa22cb02675b7.img

Read partition tables from the loop device and create device maps with kpartx:

kpartx -av /dev/loop9

The output is something like:

[root@myovm01 etc]# kpartx -av /dev/loop9
add map loop9p1 (249:1): 0 401562 linear /dev/loop9 63
add map loop9p2 (249:2): 0 142898175 linear /dev/loop9 401625
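
The mappings that kpartx adds appear under /dev/mapper. As a small sketch (the function name kpartx_map_devices is made up for illustration), the device paths can be pulled out of the kpartx output like this, which is handy if you want to mount a plain, non-LVM partition directly:

```shell
# Sketch: turn "add map ..." lines from kpartx -av into /dev/mapper paths.
# The function name is hypothetical; it reads kpartx output on stdin.
kpartx_map_devices() {
    # The third field of each "add map" line is the mapping name.
    awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}
```

Usage would be something like kpartx -av /dev/loop9 | kpartx_map_devices.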


Perform a volume group scan by issuing the command:

vgscan

Output is like:

[root@myovm01 etc]# vgscan
   Reading all physical volumes.  This may take a while...
   Found volume group "VolGroup00" using metadata type lvm2

Activate volume groups by issuing the command:

vgchange -ay

Output is like:

[root@myovm01 etc]# vgchange -ay
   4 logical volume(s) in volume group "VolGroup00" now active


Show logical volumes:

[root@myovm01 etc]# lvscan
   ACTIVE            '/dev/VolGroup00/LogVol03' [42.03 GiB] inherit
   ACTIVE            '/dev/VolGroup00/LogVol02' [6.06 GiB] inherit
   ACTIVE            '/dev/VolGroup00/LogVol00' [4.00 GiB] inherit
   ACTIVE            '/dev/VolGroup00/LogVol01' [16.00 GiB] inherit


Mount the wanted logical volume. In this example I already know that LogVol03 is the root partition and I want to change something there.


[root@myovm01 etc]# mount /dev/VolGroup00/LogVol03 /mnt/virtualdisk
[root@myovm01 etc]# df -H
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              53G  2.8G   48G   6% /
tmpfs                 3.2G     0  3.2G   0% /dev/shm
/dev/sda1             500M  145M  325M  31% /boot
none                  3.2G  213k  3.2G   1% /var/lib/xenstored
/dev/mapper/361866da0905943002027220113a0a9c8
                       8.4T  3.0T  5.5T  36% /OVS/Repositories/0004fb000003000088c2307002d1b442
/dev/mapper/VolGroup00-LogVol03
                        45G   33G  9.4G  78% /mnt/virtualdisk


Use chroot to change the root directory, if you need to change something there:

chroot /mnt/virtualdisk

(do your stuff here)

Get out of chroot by issuing “exit”

Unmount the disk after use:

umount /mnt/virtualdisk

Deactivate the volume group:

[root@myovm01 ~]# vgchange --activate n VolGroup00
   0 logical volume(s) in volume group "VolGroup00" now active

Delete the partition device mappings:

kpartx -dv /dev/loop9

Output is something like:

[root@myovm01 etc]# kpartx -dv /dev/loop9
del devmap : loop9p2
del devmap : loop9p1


Delete the loop device mapping:

losetup -d /dev/loop9

Now there should not be any LVM2 mappings left, and the loop device should be free again:

[root@myovm01 ~]# pvscan
   No matching physical volumes found
[root@myovm01 ~]# vgscan
   Reading all physical volumes.  This may take a while...
[root@myovm01 ~]# losetup -f
/dev/loop9

After unmounting the disk, start up the guest Linux and enjoy the changes.
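
The order of the cleanup steps matters, so it can be worth wrapping them in a small helper. This is only a sketch under my own assumptions: the names run and release_virtual_disk and the DRY_RUN variable are made up for illustration. With DRY_RUN=1 the commands are only printed, which is a safe way to review them first.

```shell
# Sketch: release a mounted virtual disk in the same order as the steps above.
# run, release_virtual_disk and DRY_RUN are hypothetical names.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Arguments: mount point, volume group, loop device.
release_virtual_disk() {
    mountpoint="$1"
    vg="$2"
    loopdev="$3"
    # Stop at the first failing step so later steps never run on a busy device.
    run umount "$mountpoint" &&
        run vgchange --activate n "$vg" &&
        run kpartx -dv "$loopdev" &&
        run losetup -d "$loopdev"
}
```

With the values from this article the call would be release_virtual_disk /mnt/virtualdisk VolGroup00 /dev/loop9, run as root.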

13.6.17

OracleVM Server Update Disabled

One of the tasks in setting up the OracleVM platform is to set up the YUM repository for updating the OracleVM servers from the OracleVM Manager console.

If your OVM server sits behind a firewall and cannot connect to the Oracle public YUM server without a proxy, you might find that the “Update” option in the OracleVM server context menu is disabled, even after you’ve set up the YUM repository and enabled it.

The problem might be that your OVM Server needs to connect to the public YUM server through the company proxy server.

To fix this:

1. Log in to the OracleVM server as root

2. Edit /etc/yum.conf and configure your company HTTP proxy by adding the following lines:

enableProxy=1
httpProxy=http://proxy.acme.com:8080
proxy=http://proxy.acme.com:8080

3. Disable and Enable the YUM repository setup in OracleVM Manager.



After this, the “Update” menu item should be enabled.
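
To confirm what is actually configured on a server, the proxy-related lines from step 2 can be listed with a small helper. This is just a sketch: the function name show_yum_proxy_settings is made up, and it only looks for the three setting names used above.

```shell
# Sketch: print the proxy-related settings from yum.conf.
# The function name is hypothetical; the setting names come from step 2 above.
show_yum_proxy_settings() {
    conf="${1:-/etc/yum.conf}"
    # Commented-out lines do not match; grep exits non-zero if nothing is set.
    grep -E '^[[:space:]]*(enableProxy|httpProxy|proxy)[[:space:]]*=' "$conf"
}
```

If it prints nothing, the proxy has not been configured on that server yet.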

4.2.16

OracleVM Manager Console Failing with ERR_SSL_VERSION_OR_CIPHER_MISMATCH

Google Chrome version 48 dropped support for the RC4 algorithm. This causes problems with OracleVM Manager 3.3, which has RC4 among its default cipher suites.

The error occurs when you try to access the OVM Manager console. You will get
“ERR_SSL_VERSION_OR_CIPHER_MISMATCH”

To fix this, you need to add a new cipher suite to the OVM Manager WebLogic configuration file.

Steps:
1. Log in as the oracle user
2. cd /u01/app/oracle/ovm-manager-3/domains/ovm_domain/config
3. Back up the config.xml (e.g. copy it to config.xml.2016-02-04 or something)
4. Edit config.xml, add “<ciphersuite>TLS_RSA_WITH_AES_128_CBC_SHA</ciphersuite>” to the end of the AdminServer ciphersuite listing.


5. Restart the OVM Manager server as root:
service ovmm restart

After this you should be able to connect to OVM Manager console.

If you try to use the AES256 cipher suite instead of AES128, you will get:

"java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers"

This is due to export restrictions, so you should use AES128 unless you have updated the needed jars to support AES256.
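
Before restarting, it is easy to check that the entry from step 4 really made it into config.xml. This is a minimal sketch: the function name has_ciphersuite is made up, and it assumes the element is written on one line exactly as in step 4.

```shell
# Sketch: check whether config.xml lists a given cipher suite.
# The function name is hypothetical; the default path is the directory from step 2.
has_ciphersuite() {
    suite="$1"
    config="${2:-/u01/app/oracle/ovm-manager-3/domains/ovm_domain/config/config.xml}"
    # Fixed-string match on the complete element.
    grep -qF "<ciphersuite>$suite</ciphersuite>" "$config"
}
```

For example: has_ciphersuite TLS_RSA_WITH_AES_128_CBC_SHA && echo "cipher suite is configured".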

4.12.15

OracleVM Serial Console Not Working - serial_console (serial_console) main process ended, respawning

 

After upgrading to the latest OracleVM 3.3.3 and updating all Oracle Linux 6 guests to the latest versions, I started to see problems with the OL6 serial console interface. First of all, the serial console didn’t seem to work at all: I could not connect to the console via the OracleVM Manager serial console. Secondly, continuous error messages started appearing in the OL6 /var/log/messages file:

Dec  4 06:30:34 atlassian init: serial_console (serial_console) main process (4704) terminated with status 1
Dec  4 06:30:34 atlassian init: serial_console (serial_console) main process ended, respawning

At the same time all Oracle Linux 5 servers worked fine, as did a few OL6 servers, but most of the OL6 servers suffered from this.

On the OL6 servers that worked fine, the symbolic link /dev/serial_console pointed to the hvc0 device. On the OL6 servers that had problems, this symbolic link pointed to ttyS0.

I don’t know exactly what the root cause of this problem is, but it looks like there are two symlink definitions in the udev rules, and on some servers the link ends up pointing to hvc0 and on others to ttyS0.
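
To see which case a given guest is in, look at where the symlink points. A minimal sketch (the function name serial_console_target is made up for illustration):

```shell
# Sketch: print the target of the serial_console symlink.
# The function name is hypothetical; the default path is the link described above.
serial_console_target() {
    link="${1:-/dev/serial_console}"
    # readlink prints the raw link target, for example hvc0 or ttyS0.
    readlink "$link"
}
```

On an affected guest in my case this printed ttyS0, on a working one hvc0.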

My fix for this problem was to edit the udev rules:

vi /etc/udev/rules.d/50-udev.rules

Original content:

KERNEL=="ttyS0",                SYMLINK+="serial_console"
KERNEL=="hvc0",                 SYMLINK+="serial_console"

Remove the ttyS0 line so that the content looks like:

KERNEL=="hvc0",                 SYMLINK+="serial_console"

Save the file and reboot the server. After this the serial console should work and no extra error messages should appear in the messages log file.
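
For several guests the same edit can be scripted. This is only a sketch: the function name drop_ttyS0_serial_console_rule is made up, and it assumes the ttyS0 rule is on a single line as shown above. It keeps a .bak copy and deletes only the ttyS0 line that creates the serial_console symlink.

```shell
# Sketch: remove the ttyS0 serial_console symlink rule, keeping the hvc0 one.
# The function name is hypothetical; the default path is the file edited above.
drop_ttyS0_serial_console_rule() {
    rules="${1:-/etc/udev/rules.d/50-udev.rules}"
    # Keep a copy of the original file before changing anything.
    cp -p "$rules" "$rules.bak" || return 1
    # Delete only the ttyS0 line that adds the serial_console symlink.
    sed '/KERNEL=="ttyS0".*SYMLINK+="serial_console"/d' "$rules.bak" > "$rules"
}
```

Run it as root on the guest and reboot afterwards, as with the manual edit.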