AWS EC2 Linux Boot Problems – Root Volume

Below are step-by-step instructions to recover an EC2 instance when the system cannot boot properly.

So far I have seen a couple of scenarios where this procedure helped:

– Corrupted latest kernel (the initrd image is missing)

– A file system cannot be mounted because of a wrong entry in /etc/fstab, so the boot drops into emergency mode

1) Once we log in to the AWS console, we should see the "Status Checks" failing for the instance in question, and a reboot won't help.

2) Check the system log with "Get System Log".

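The same log can also be pulled from the AWS CLI. Below is a minimal Python sketch. It assumes the AWS CLI is installed and configured, fetches the log with `aws ec2 get-console-output`, and looks for a few typical kernel and systemd messages to tell the two scenarios apart. The marker strings and helper names are assumptions for illustration, not an exhaustive list.

```python
import subprocess

# Assumed marker strings: typical kernel/systemd messages for the two
# scenarios above. The exact wording in your log may differ.
KERNEL_MARKERS = ("Kernel panic - not syncing", "Unable to mount root fs")
FSTAB_MARKERS = ("Dependency failed for", "emergency mode")


def fetch_system_log(instance_id):
    """Return the console output that "Get System Log" shows, via the AWS CLI."""
    result = subprocess.run(
        ["aws", "ec2", "get-console-output", "--instance-id", instance_id,
         "--query", "Output", "--output", "text"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def classify_boot_failure(log):
    """Return 'kernel', 'fstab' or 'unknown' depending on which markers appear."""
    if any(marker in log for marker in KERNEL_MARKERS):
        return "kernel"
    if any(marker in log for marker in FSTAB_MARKERS):
        return "fstab"
    return "unknown"
```

For example, `classify_boot_failure(fetch_system_log("i-0123456789abcdef0"))` uses a placeholder instance ID. The result is only a hint, and the log itself should still be read.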

3) Unfortunately, AWS does not give us console access, so the alternative is to correct the grub entry or fstab by accessing the disk directly. To get access to the disk, we must first stop the instance.

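If you prefer the AWS CLI to the console, the following sketch stops the instance and waits for the "stopped" state. It is a minimal Python example that assumes a configured AWS CLI. `stop_commands` and `run_all` are hypothetical helper names.

```python
import subprocess


def stop_commands(instance_id):
    """AWS CLI calls that stop the instance and block until it is 'stopped'."""
    return [
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
        # The built-in waiter polls until the instance state is 'stopped'.
        ["aws", "ec2", "wait", "instance-stopped", "--instance-ids", instance_id],
    ]


def run_all(commands):
    """Run each command in order, raising on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: `run_all(stop_commands("i-0123456789abcdef0"))`, where the instance ID is a placeholder.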

4) The root and boot partitions are on /dev/sda1. Click the EBS volume ID link.

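The volume ID behind /dev/sda1 can also be read from `aws ec2 describe-instances`. The sketch below assumes you have that command's JSON output as a string. `describe_instance` and `root_volume_id` are hypothetical helpers. When no device is given, the code defaults to the instance's `RootDeviceName`.

```python
import json
import subprocess


def describe_instance(instance_id):
    """Raw JSON from `aws ec2 describe-instances` for a single instance."""
    result = subprocess.run(
        ["aws", "ec2", "describe-instances", "--instance-ids", instance_id,
         "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def root_volume_id(describe_json, device=None):
    """Return the EBS volume ID attached at `device` (default: the root device)."""
    instance = json.loads(describe_json)["Reservations"][0]["Instances"][0]
    device = device or instance["RootDeviceName"]
    for mapping in instance["BlockDeviceMappings"]:
        if mapping["DeviceName"] == device and "Ebs" in mapping:
            return mapping["Ebs"]["VolumeId"]
    raise LookupError(f"no EBS volume mapped at {device}")
```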

5) The volume should be in the "in-use" state.

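To confirm the state from the CLI instead, `aws ec2 describe-volumes` returns the volume's `State` and its current attachments. The small Python helper below has a hypothetical name. It assumes you have captured that command's JSON output and returns the state together with the instance the volume is attached to, if any.

```python
import json


def volume_status(describe_volumes_json):
    """Return (state, attached_instance_id or None) for the first volume.

    Expects the output of
    `aws ec2 describe-volumes --volume-ids <vol-id> --output json`.
    """
    volume = json.loads(describe_volumes_json)["Volumes"][0]
    attachments = volume.get("Attachments", [])
    instance_id = attachments[0]["InstanceId"] if attachments else None
    return volume["State"], instance_id
```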

6) From Actions, select "Detach Volume" to detach it from the instance.

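The CLI equivalent of the detach is a single `aws ec2 detach-volume` call, optionally followed by the `volume-available` waiter, which blocks until the volume reaches the "Available" state of step 7. This is a sketch assuming a configured AWS CLI. `detach_commands` is a hypothetical name.

```python
def detach_commands(volume_id):
    """Detach the volume, then wait until its state is 'available'."""
    return [
        ["aws", "ec2", "detach-volume", "--volume-id", volume_id],
        ["aws", "ec2", "wait", "volume-available", "--volume-ids", volume_id],
    ]
```

Each command can be run in order with `subprocess.run(cmd, check=True)`.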

7) After "Detach Volume", the volume should be in the "Available" state.

8) Provision a temporary instance, or use any unused running instance, and note its instance ID. The temporary instance must be in the same Availability Zone as the EBS volume so that the volume can be attached to it.

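The Availability Zone requirement can be checked before attaching. The sketch below compares the `AvailabilityZone` field from `aws ec2 describe-volumes` with `Placement.AvailabilityZone` from `aws ec2 describe-instances`. It assumes both JSON outputs are already captured. The helper names are hypothetical.

```python
import json


def volume_az(describe_volumes_json):
    return json.loads(describe_volumes_json)["Volumes"][0]["AvailabilityZone"]


def instance_az(describe_instances_json):
    instance = json.loads(describe_instances_json)["Reservations"][0]["Instances"][0]
    return instance["Placement"]["AvailabilityZone"]


def can_attach(describe_volumes_json, describe_instances_json):
    """EBS volumes can only be attached to instances in the same Availability Zone."""
    return volume_az(describe_volumes_json) == instance_az(describe_instances_json)
```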

9) Attach the volume to the temporary instance with the device name /dev/sdb (if /dev/sdb is already used on the temporary instance, pick the next letter, /dev/sdc, and so on).

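The "next free letter" rule of this step can be scripted. The sketch below assumes you pass in the device names already mapped on the temporary instance, for example from `BlockDeviceMappings[].DeviceName`. It also treats /dev/xvdX as the same slot as /dev/sdX, because inside the guest /dev/sdb can appear as /dev/xvdb, as the `lsblk` output in this article shows. Both helper names are hypothetical.

```python
import string


def next_free_device(used, start="b"):
    """Return /dev/sd<start>, or the next letter whose sd/xvd name is unused."""
    letters = string.ascii_lowercase
    for letter in letters[letters.index(start):]:
        if f"/dev/sd{letter}" not in used and f"/dev/xvd{letter}" not in used:
            return f"/dev/sd{letter}"
    raise RuntimeError("no free /dev/sdX device name left")


def attach_command(volume_id, instance_id, device):
    """AWS CLI call that attaches the volume under the chosen device name."""
    return ["aws", "ec2", "attach-volume", "--volume-id", volume_id,
            "--instance-id", instance_id, "--device", device]
```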

10) Once the volume is attached to the temporary instance, its status goes back to "in-use".

11) Log in to the temporary instance; the disk should be attached to it.

12) Run "lsblk -f" to show the disks with their file systems. In this case /dev/xvdb1 is of type XFS.

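When scripting this step, `lsblk` can print the same information as JSON with `-J`. That option assumes a reasonably recent util-linux and is not available on some older distributions. The sketch below searches the device tree, partitions included, for a device name and returns its `fstype`. `find_fstype` is a hypothetical helper.

```python
import json
import subprocess


def lsblk_fs_json():
    """Same information as `lsblk -f`, but in JSON form."""
    return subprocess.run(["lsblk", "-f", "-J"], capture_output=True,
                          text=True, check=True).stdout


def find_fstype(lsblk_json, name):
    """Return the fstype of device `name` (e.g. 'xvdb1'), searching partitions too."""
    stack = list(json.loads(lsblk_json)["blockdevices"])
    while stack:
        dev = stack.pop()
        if dev.get("name") == name:
            return dev.get("fstype")
        stack.extend(dev.get("children", []))
    raise LookupError(f"device {name} not found")
```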

13) Create the directory "/tmp/temp" and mount the disk on it.

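A sketch of this step in Python, assuming it runs as root on the temporary instance. The code adds `-o nouuid` for XFS. If the temporary instance was launched from the same image as the broken one, both root file systems can carry the same XFS UUID, and the kernel then refuses a plain mount. The option only skips that duplicate-UUID check. The device and mount point defaults come from this article. The helper names are hypothetical.

```python
import os
import subprocess


def mount_command(device, fstype, mountpoint):
    """Build the mount command for the rescued root partition."""
    cmd = ["mount"]
    if fstype == "xfs":
        # Skip XFS's duplicate-UUID check in case the temporary instance's
        # own root file system has the same UUID.
        cmd += ["-o", "nouuid"]
    return cmd + [device, mountpoint]


def mount_rescue_disk(device="/dev/xvdb1", fstype="xfs", mountpoint="/tmp/temp"):
    os.makedirs(mountpoint, exist_ok=True)  # create /tmp/temp if it is missing
    subprocess.run(mount_command(device, fstype, mountpoint), check=True)
```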

14) Now we can access the files on the disk.

15) For the kernel issue: delete the corrupted entry from boot/grub2/grub.cfg on the mounted disk (/tmp/temp/boot/grub2/grub.cfg).

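The sketch below shows one way to find and remove the broken entry without hand-editing. It assumes a GRUB 2 config in which each `menuentry '<title>' ... {` opens on one line and is closed by a matching `}`. It also assumes /boot sits on the same partition as /, as in this article, so initrd paths such as /boot/initramfs-*.img resolve under the mount point. All helper names are hypothetical. Keep a backup of grub.cfg before writing the result back.

```python
import os
import re

MENUENTRY = re.compile(r"^\s*menuentry\s+['\"]([^'\"]+)['\"]")
INITRD = re.compile(r"^\s*initrd(?:16|efi)?\s+(\S+)")


def menuentries(cfg):
    """Return a list of (title, [initrd paths]) for every menuentry block."""
    entries, title, initrds, depth = [], None, [], 0
    for line in cfg.splitlines():
        if title is None:
            match = MENUENTRY.match(line)
            if match:
                title, initrds = match.group(1), []
                depth = line.count("{") - line.count("}")
            continue
        depth += line.count("{") - line.count("}")
        match = INITRD.match(line)
        if match:
            initrds.append(match.group(1))
        if depth <= 0:  # closing brace of the entry reached
            entries.append((title, initrds))
            title = None
    return entries


def entries_missing_initrd(cfg, root="/tmp/temp", exists=os.path.exists):
    """Titles of entries whose initrd file does not exist under `root`."""
    return [title for title, initrds in menuentries(cfg)
            if any(not exists(os.path.join(root, p.lstrip("/"))) for p in initrds)]


def remove_menuentry(cfg, title):
    """Return cfg with the menuentry block named `title` removed."""
    kept, skipping, depth = [], False, 0
    for line in cfg.splitlines(keepends=True):
        if skipping:
            depth += line.count("{") - line.count("}")
            if depth <= 0:
                skipping = False
            continue
        match = MENUENTRY.match(line)
        if match and match.group(1) == title:
            skipping = True
            depth = line.count("{") - line.count("}")
            continue
        kept.append(line)
    return "".join(kept)
```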

16) For the file system issue: update or comment out the entries in etc/fstab on the mounted disk, apart from the basic ones (/, swap, /dev/xvdb, tmpfs). In this particular case the last entry, an NFS mount, caused the problem.

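The fstab edit can be done programmatically as well. This minimal sketch keeps the same "basic" entries listed in this step: the / mount, swap and tmpfs types, and /dev/xvdb passed as an extra device to keep. It comments out every other active line and saves a `.bak` copy first. The rule set and helper names are assumptions for illustration, and the kept list may need adjusting for your instance.

```python
import shutil

# The "basic" entries kept in this step; everything else gets commented out.
ESSENTIAL_MOUNTPOINTS = {"/"}
ESSENTIAL_TYPES = {"swap", "tmpfs"}


def comment_nonessential(fstab, keep_devices=()):
    """Prefix '#' to every active fstab entry that is not essential for boot."""
    out = []
    for line in fstab.splitlines(keepends=True):
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            out.append(line)  # blank lines, comments, malformed lines: unchanged
            continue
        device, mountpoint, fstype = fields[:3]
        if (mountpoint in ESSENTIAL_MOUNTPOINTS or fstype in ESSENTIAL_TYPES
                or device in keep_devices):
            out.append(line)
        else:
            out.append("#" + line)
    return "".join(out)


def fix_fstab(path="/tmp/temp/etc/fstab", keep_devices=("/dev/xvdb",)):
    shutil.copy2(path, path + ".bak")  # keep the original for reference
    with open(path) as f:
        fixed = comment_nonessential(f.read(), keep_devices)
    with open(path, "w") as f:
        f.write(fixed)
```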

17) Once the file is edited, save it, change out of the /tmp/temp directory, and unmount /tmp/temp.

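Leaving the directory matters because `umount` typically fails with "target is busy" while a process, including your own shell, has its working directory on the mounted file system. The sketch below assumes it runs as root on the temporary instance. It moves out of the mount first and then unmounts. Both helper names are hypothetical.

```python
import os
import subprocess


def is_inside(path, mountpoint):
    """True if `path` is the mount point itself or somewhere below it."""
    path, mountpoint = os.path.realpath(path), os.path.realpath(mountpoint)
    return os.path.commonpath([path, mountpoint]) == mountpoint


def unmount(mountpoint="/tmp/temp"):
    if is_inside(os.getcwd(), mountpoint):
        os.chdir("/")  # step out so this process does not keep the mount busy
    subprocess.run(["umount", mountpoint], check=True)
```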

18) Now we need to roll back the steps, so detach the volume from the temporary instance.

19) Make sure the volume is "Available" again.

20) Attach it to the original instance as "/dev/sda1", which is the device name it had before.
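Reading the expected name from the original instance's `RootDeviceName` avoids a typo here. If the volume goes back under a different name, the instance has no root volume and will typically refuse to start. This is a sketch assuming the `aws ec2 describe-instances` JSON output is at hand. The helper names are hypothetical.

```python
import json


def root_device_name(describe_instances_json):
    """RootDeviceName of the original instance; the volume must go back there."""
    instance = json.loads(describe_instances_json)["Reservations"][0]["Instances"][0]
    return instance["RootDeviceName"]


def reattach_command(volume_id, instance_id, root_device="/dev/sda1"):
    """attach-volume call that puts the repaired disk back as the root device."""
    return ["aws", "ec2", "attach-volume", "--volume-id", volume_id,
            "--instance-id", instance_id, "--device", root_device]
```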

21) Start the instance.

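From the CLI, starting the instance and waiting for it can be chained with the built-in waiters. The `instance-status-ok` waiter polls until the instance status check reports "ok", and it exits with an error if that never happens within its polling limit. This is a sketch assuming a configured AWS CLI. `start_commands` is a hypothetical name.

```python
def start_commands(instance_id):
    """Start the instance, then wait until it is running and its status check is ok."""
    return [
        ["aws", "ec2", "start-instances", "--instance-ids", instance_id],
        ["aws", "ec2", "wait", "instance-running", "--instance-ids", instance_id],
        ["aws", "ec2", "wait", "instance-status-ok", "--instance-ids", instance_id],
    ]
```

Each command can be run in order with `subprocess.run(cmd, check=True)`.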

22) Check that the status checks now pass and that we can log in.
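Both console status checks map to fields in the output of `aws ec2 describe-instance-status`: `SystemStatus` and `InstanceStatus`. The small helper below has a hypothetical name and assumes that JSON output is captured. It treats the instance as healthy only when both report "ok".

```python
import json


def status_checks_ok(describe_status_json):
    """True if both the system and instance status checks report 'ok'.

    Expects the output of
    `aws ec2 describe-instance-status --instance-ids <id> --output json`.
    """
    statuses = json.loads(describe_status_json)["InstanceStatuses"]
    if not statuses:  # non-running instances are omitted by default
        return False
    s = statuses[0]
    return s["SystemStatus"]["Status"] == "ok" and s["InstanceStatus"]["Status"] == "ok"
```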

23) Delete the temporary instance if you provisioned one.
