How to make and download a raw Linux HDD image remotely using ssh and dd in Linux.
I wanted to make a backup of my Digital Ocean VPS. They offer excellent service (so far) and have nice backup options. However, they don’t allow you to download your backup image. Since it’s my data, I believe its correct place is within my hands. I wanted to clone the server and run it as a virtual machine on my end. So I started looking for options.
Note: I’m not responsible for the loss of your data (if it happens). Do this at your own risk.
After lots of searching I’ve found several ways to achieve this. This article sums up most of them:
Disk2vhd. It allows you to create the image without shutting down the system. It uses the Windows Volume Snapshot feature, which obviously works only on Windows. My VPS uses Linux, so this is out of the question.
Parallels Transporter Agent. I don’t know much about it, but it’s for PC and Mac only. Also out of the question.
VMware vCenter Converter. You have to have a VMware ESXi server deployed somewhere. Using the vCenter Converter, you connect to the machine you want to clone using ssh (you provide root credentials for that, and obviously the machine must have an accessible ssh server installed). It then pulls the HDD data from there and creates a new virtual machine image on the ESXi server using these data. You can then use the VMware vSphere client to connect to the ESXi server and download the disk image.
You don’t have to shut down the machine you want to clone in order for this to work. This could be a feature, but it’s also dangerous. If you have some service running that modifies the disk data (like an active database server), your backup data could end up being inconsistent or corrupt.
Anyway, this method requires installing lots of VMware programs, and VMware doesn’t allow you to download these programs without creating an account. When you create an account (and write a LOT of information about yourself) you’ll be able to download TRIAL versions of these programs that expire after 60 days. It’s funny that the download pages don’t tell you that they will expire. I had to dig it out from their site. So this is also out of the question.
But since VMware vCenter Converter can do it through ssh, I know for sure that I can do it as well. This led me to using dd.
I investigated this method after stumbling upon this post. Just like vCenter Converter, this method allows you to take a backup without shutting down the server. However, to be on the safe side, we’ll shut down the services that actively modify the disk, remount it as read-only, pull the backup and then restart the machine.
ssh into the Linux machine you want to clone, then stop the services,
sudo service apache2 stop
sudo service mysql stop
# ...etc
Remember not to stop the ssh server. We need that running.
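If you stop services from a list, a small guard can keep ssh out of it. The following is only a sketch: stoppable_services is a made-up helper name, the service names are examples, and the loop prints the commands as a dry run rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the given service names, dropping ssh/sshd
# so that the remote session survives.
stoppable_services() {
    for svc in "$@"; do
        case "$svc" in
            ssh|sshd) ;;                    # never stop the ssh server
            *) printf '%s\n' "$svc" ;;
        esac
    done
}

# Dry run: print the commands instead of executing them.
# Remove the "echo" to actually stop the services.
for svc in $(stoppable_services apache2 ssh mysql); do
    echo sudo service "$svc" stop
done
```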
The disk image size can be greatly reduced if we overwrite the empty disk space with zeroes. To do this, we’ll create a huge file of zeroes that occupies all the remaining disk space, then delete it. This will leave the system out of disk space for a while, since deleting such a file takes some time. To avoid having the system out of disk space for long, we’ll create a small zeroes file before creating the large one, then delete it before deleting the large one. Let’s do that,
Create a small zeroes file,
dd if=/dev/zero of=zero.small.file bs=1024 count=102400
Consume all the remaining disk space in another zeroes file,
cat /dev/zero > zero.file
Make sure all the changes were actually written to disk, for example with sync (not really necessary, but just in case),
Finally, remove the small and the huge files,
rm zero.small.file
rm zero.file
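To see why the zero fill is worth the trouble, here is a small experiment that can be run anywhere. It is a sketch under one assumption: random bytes stand in for the leftovers of deleted files that would otherwise sit in the free space. compressed_size is a made-up helper name.

```shell
#!/bin/sh
# Compare how well gzip squeezes 1 MiB of zeroes versus 1 MiB of random
# bytes (random bytes stand in for leftovers of deleted files).
compressed_size() {
    # $1: source device; prints the size in bytes after gzip -1
    dd if="$1" bs=1024 count=1024 2>/dev/null | gzip -1 | wc -c | tr -d ' '
}

zero_size=$(compressed_size /dev/zero)
random_size=$(compressed_size /dev/urandom)
echo "zeroes: $zero_size bytes, random: $random_size bytes"
```

The zeroed megabyte should shrink to a tiny fraction of its size, while the random one barely shrinks at all. The same happens to the free space inside the disk image.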
Now let’s remount the disk as read-only,
sudo -s
echo u > /proc/sysrq-trigger
exit
The sudo -s and exit are needed because there’s a redirect in the echo command, so a regular sudo wouldn’t work: the redirect would be performed by the unprivileged shell, not by sudo.
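Before pulling the image, it may be worth confirming that the remount really happened. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, options); mount_is_readonly is a made-up helper name, and its optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# mount_is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the "ro" option.
# MOUNTS_FILE defaults to /proc/mounts.
mount_is_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
        }
        END { exit (ro ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

if mount_is_readonly /; then
    echo "/ is read-only"
else
    echo "/ is still writable"
fi
```

If a mount point appears more than once, the last entry wins, which matches the mount that is currently on top.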
Next, create and download the HDD image. Since this is probably going to take some time (because my internet connection is pathetic), during which our server is inaccessible, I won’t download it to my local machine. Instead, I’ll create another droplet (VPS) with twice the storage capacity just to perform this clone. When we’re done we’ll delete the new droplet altogether. With Digital Ocean’s pricing, this should cost only a few cents.
Important remark: try creating the new droplet in the same data-center as the droplet you want to clone. Also, enable and use private IPs. This is bound to give you the best network performance and is not counted against your network threshold.
ssh into the new droplet, install pv to be able to see progress,
sudo apt-get update
sudo apt-get install pv
Then it’s time to actually download the image. Use /dev/vda or /dev/sda depending on your hosting configuration.
sda is your disk when your system is not using virtualization-aware drivers or is really using a physical disk. vda is your disk when the system is aware that it is a virtual drive. In this case, the system can perform much better than when emulating a physical disk. If both interfaces are exposed, then definitely use vda for performance.
If you are on Digital Ocean, you probably use vda by default,
ssh root@[machine to clone ip (private IP if possible)] "sudo dd if=/dev/vda bs=16M | gzip -1" | gunzip | pv -W | dd of=diskImage bs=16M
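The pipeline can be rehearsed locally before touching the real disk. The sketch below is a dry run under stated assumptions: a scratch file stands in for /dev/vda, the ssh hop and pv are left out, and the block size is smaller than in the real command.

```shell
#!/bin/sh
# Local dry run of the transfer pipeline: a scratch file stands in for
# /dev/vda and the ssh hop is left out.
src=$(mktemp)
dst=$(mktemp)

# 2 MiB of test data: half random, half zeroes
dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero bs=1024 count=1024 2>/dev/null >> "$src"

# Same shape as the real command: read, compress, decompress, write
dd if="$src" bs=65536 2>/dev/null | gzip -1 | gunzip | dd of="$dst" bs=65536 2>/dev/null

# cmp exits 0 only if both files are byte-for-byte identical
if cmp -s "$src" "$dst"; then
    copy_status=identical
else
    copy_status=different
fi
echo "copy is $copy_status"
rm -f "$src" "$dst"
```

On the real machines, a similar check would be to run sha256sum on the source device while it is still read-only and on diskImage afterwards, then compare the two hashes.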
When this is done, the image should have been successfully downloaded. You might be telling yourself that you can remove the gunzip part and keep the image compressed. You can: gzip compresses everything it reads from dd as one continuous stream, whatever block size dd uses, so the result would be a single valid gzip file. The pipeline above decompresses on arrival anyway, because gzip -1 only trades a little CPU time for less network traffic; the heavy compression pass happens below, once the cloned machine is back up.
We don’t need to stall the cloned machine any longer. Reboot it using:
sudo reboot -h now
Now our business with the machine we want to clone is over. Time to compress and download the image,
To heavily compress the image,
gzip -c -9 diskImage | pv > diskImage.gz
Or if you have a multi-core configuration and you want to use multiple cores to improve compression speed:
sudo apt-get install pigz
pigz -c -9 diskImage | pv > diskImage.gz
That’s why we chose a droplet with twice the disk space: at some point both the image and a compressed copy of it will reside on the same disk.
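A quick check of the free space before compressing can save a failed run. This is a sketch that assumes the worst case, where the compressed copy is as large as the image itself; has_room_for is a made-up helper name.

```shell
#!/bin/sh
# has_room_for FILE DIR
# Succeeds if DIR's filesystem has at least as many free bytes as FILE
# occupies, i.e. room for a worst-case (incompressible) compressed copy.
has_room_for() {
    file_kb=$(( ($(wc -c < "$1") + 1023) / 1024 ))
    # df -P prints the POSIX format, -k uses 1024-byte units,
    # and column 4 of the second line is the available space
    free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$file_kb" ]
}

image=diskImage   # the uncompressed image from the previous step
if [ -f "$image" ] && has_room_for "$image" .; then
    echo "enough room to compress $image here"
fi
```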
Now that we’ve compressed the image, we want to get rid of the droplet as soon as possible so that we are not charged for having it unnecessarily. Digital Ocean doesn’t charge you just for running the droplet, but also for keeping it. The compressed image is probably a small fraction of the cloned HDD, so we’ll just move it to our original droplet.
We already have ssh, dd and pv at hand, so I’m going to use them for this transfer as well! From the original droplet,
ssh root@[machine with compressed image ip (private IP if possible)] "dd if=[zipped image file path]/diskImage.gz bs=16M" | pv -W | dd of=diskImage.gz bs=16M
We’ve finally pulled the compressed image file from the new droplet, so we have no more use for that droplet. Delete it immediately. Now download the compressed image to your local machine using:
scp [username]@[droplet address/ip]:[path to compressed image]/diskImage.gz [local directory to copy into]
We could have used dd here too, or any other method of your choice. After the download is finished, don’t forget to delete the image from the server. The whole process (except downloading the final image to my local machine) took less than an hour on my humble 20GB server.
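Before deleting the copy on the server, it is cheap to confirm that the download is intact. A minimal sketch using gzip’s own test mode; gz_is_intact is a made-up helper name and diskImage.gz is assumed to be in the current directory.

```shell
#!/bin/sh
# gz_is_intact FILE: succeeds if FILE is a complete, uncorrupted gzip file.
# gzip -t decompresses the data and verifies the stored checksum
# without writing any output.
gz_is_intact() {
    gzip -t "$1" 2>/dev/null
}

image_gz=diskImage.gz
if [ -f "$image_gz" ]; then
    if gz_is_intact "$image_gz"; then
        echo "$image_gz looks intact"
    else
        echo "$image_gz is damaged or truncated, download it again"
    fi
fi
```

This catches truncated or corrupted transfers. It does not prove that the file is the same one that sits on the server; comparing sha256sum output on both sides would cover that.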
Restoring the backup image
If you want to restore the image to the droplet or another machine (a VM for example), make sure the target disk is at least as large as the uncompressed disk image. Compare byte counts rather than advertised sizes, as different people refer to different numbers of bytes when saying gigabyte (10^9 bytes versus 2^30 bytes, which is super annoying). Then boot the machine using a LiveCD (or boot into the recovery environment provided by your hosting provider). Then perform the following,
DANGER! If you execute the next command in the wrong terminal window you may end up overwriting your local machine’s disk. Double check where it will execute before pressing enter!
DANGER! This procedure, while correct in theory, failed to restore a real DigitalOcean droplet. I didn’t test enough to know the cause of the problem.
ssh [user]@[machine with the image ip] "gunzip -c [path to the compressed image]/diskImage.gz" | pv -W | sudo dd of=/dev/vda bs=16M
Also note the use of vda or sda, depending on your situation.
This way, the file is uncompressed on the machine with the image and the uncompressed data is transferred on the network and written directly to the target machine’s disk. We had to do this because we are running a LiveCD and we don’t have a place to store the image and uncompress it first. It works nicely when creating a virtual machine on the same physical machine, since there’s no real network traffic. However, if you are deploying to a droplet, you should probably create a temporary droplet to download and uncompress the image first then deploy from there instead to avoid transferring a huge image file.
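Two pre-flight checks can make the restore less nerve-racking: comparing sizes in bytes, and confirming which machine the terminal is really on. The sketch below uses made-up helper names, and restore-target is a placeholder hostname, not something taken from the setup above.

```shell
#!/bin/sh
# Convert decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes)
# to bytes, so image and disk sizes can be compared in the same unit.
gb_to_bytes()  { echo $(( $1 * 1000 * 1000 * 1000 )); }
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }

# on_expected_host NAME: succeeds only if this shell runs on host NAME.
on_expected_host() {
    [ "$(uname -n)" = "$1" ]
}

echo "20 GB  = $(gb_to_bytes 20) bytes"
echo "20 GiB = $(gib_to_bytes 20) bytes"

# "restore-target" is a placeholder; use the real hostname of the
# machine whose disk should be overwritten.
if on_expected_host restore-target; then
    echo "right machine, the dd restore may run here"
else
    echo "this is $(uname -n), not the restore target"
fi
```

On the target machine, blockdev --getsize64 /dev/vda prints the disk size in bytes, which can be compared directly with the size of the uncompressed image.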
If you are deploying back to the same droplet, then your work is done. However, if you are deploying to another machine, note that the VPS image probably uses some sort of bootloader and network configuration magic that makes it work best in the data-center. To configure it to work nicely on your VM or local machine, proceed to fix boot and network settings.
Fixing Boot and Network Settings
Mount the newly written drive,
sudo mount /dev/vda1 /mnt # or sda instead of vda
sudo mount -o bind /proc /mnt/proc
sudo mount -o bind /dev /mnt/dev
Change root to the new mounted volume,
In my case, it appears that the DigitalOcean droplet’s file system couldn’t be mounted this way. If that happens, just boot the machine (it might take some time; you might want to disable or unplug network devices while booting), then carry on with the following:
Install grub (you probably don’t need to do so, but just in case),
Set network mode to DHCP,
sudo nano /etc/network/interfaces
Remove any network interfaces that don’t exist, and modify the existing interfaces to use DHCP:
auto eth0
iface eth0 inet dhcp
as pointed out here. The last thing to do is to get rid of the cloud-init data sources that only exist in the data-center. Run
sudo dpkg-reconfigure cloud-init
and leave only None: Failsafe datasource selected. Now we are done!
Creating a Virtual Machine Disk Directly
If we are deploying to a virtual machine, there are more ways to do it. First, uncompress the image file, then use one of these:
qemu-img convert -O vmdk diskImage diskImage.vmdk
if you have qemu installed, or,
VBoxManage convertfromraw diskImage diskImage.vmdk --format VMDK
if you have VirtualBox.
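If you are not sure which of the two tools is installed, the choice can be scripted. A small sketch that only prints the matching command from above rather than running it; pick_converter is a made-up helper name.

```shell
#!/bin/sh
# pick_converter: print the first available raw-to-VMDK converter,
# or "none" if neither tool is on the PATH.
pick_converter() {
    if command -v qemu-img >/dev/null 2>&1; then
        echo qemu-img
    elif command -v VBoxManage >/dev/null 2>&1; then
        echo VBoxManage
    else
        echo none
    fi
}

# Print the command to run; nothing is converted here.
case "$(pick_converter)" in
    qemu-img)   echo "qemu-img convert -O vmdk diskImage diskImage.vmdk" ;;
    VBoxManage) echo "VBoxManage convertfromraw diskImage diskImage.vmdk --format VMDK" ;;
    none)       echo "install qemu or VirtualBox first" ;;
esac
```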
While we’ve done the above to a Linux machine, the same procedure works for Windows machines if you use a LiveCD. Simply boot from a LiveCD and perform the same procedure on the Windows disk drive. However, there are a number of reasons why you shouldn’t try to migrate a Windows installation to a different machine. Making backups is fine, but deploying to other machines is troublesome. Read more about it here.
That’s it folks!