I am sure that if you have not already come across tar and gz files in Linux, it will not be long before you do. Of course, these files make up just a part of what we mean when we talk about archiving files from the Linux command line. Spend just a little time with Linux and you will soon come across TAR files of some type, be they TAR, TGZ, TAR.GZ or some other format. The TAR archive, or Tape Archive, file is an oldie but a goodie, having passed across from UNIX into Linux. The archive itself is a single file that can represent many files.
Compressed or Not Compressed
These tar files do not have to be compressed, but they often are. However, even when it is not zipped up, an archive will often consume less disk space than the same files stored individually. Consider the following screenshot, where we first look at the size of the directory, then create an uncompressed tar archive and view its size; the archive is smaller:
The output shows the directory to be 52K and the tar archive to be just 20K, even though no compression has been used. This relates to the way the filesystem allocates disk space in blocks. Each new file has to start in its own new block. Often the block size is 4KB; this means that a 1KB file, for instance, will consume 4KB of disk space. If we combine these files into one file, a TAR file, then less space is needed to store the same amount of data.
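The effect can be reproduced with a short script. This is only a sketch: the scratch directory and the file names are made up for the illustration, and the figures printed will depend on the filesystem and its block size, so they may not match the 52K and 20K above.

```shell
#!/bin/sh
# Sketch: compare the disk usage of many small files with one uncompressed archive.
# The scratch directory and file names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs

# Create 20 small files of 1024 bytes each
i=1
while [ "$i" -le 20 ]; do
    printf '%1024s' "lab $i" > "labs/file$i.txt"
    i=$((i + 1))
done

# Archive the directory without compression
tar -cf labs.tar labs

# du -sk reports the disk space used, in kilobytes
dir_kb=$(du -sk labs | cut -f1)
tar_kb=$(du -sk labs.tar | cut -f1)
echo "directory: ${dir_kb}K  archive: ${tar_kb}K"
```

On a filesystem with 4KB blocks we would expect the directory figure to be noticeably larger than the archive figure, since each small file occupies a whole block of its own.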
The three main options we have with tar are:
- -c : create a new archive
- -x : extract an archive
- -t : list the contents of an archive, which lets us verify or test it
Creating a TAR file
From the previous graphic we can see the creation of the TAR file. We normally end the file name with .tar so that we can easily identify this type of file. The option -f specifies the archive file name and must be followed immediately by that name.
tar -cf labs.tar labs
In this example we archive the labs directory within the current directory. The target file for the archive is labs.tar, also within the current directory. The source directory labs remains intact and is unaffected by the operation, other than a possible update to the last accessed time attribute of each file included in the archive. In order to back up a file it has to be read, so the last accessed time of each file we archive may be updated to the time of the backup; whether it is depends on how the filesystem is mounted, as options such as noatime or relatime limit these updates.
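To try this without touching real data, something like the following could be used. It is a sketch under a few assumptions: the scratch directory and the two sample files are hypothetical, and -v (list each file as it is added) is an extra option not covered above. The ls -lu listing shows the last accessed time, which may or may not change depending on the mount options.

```shell
#!/bin/sh
# Sketch: create an archive from a sample labs directory and confirm the source is untouched.
# The scratch directory and sample files are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs
echo 'echo hello' > labs/file.sh
echo 'notes' > labs/notes.txt

# Access times before archiving (-u makes ls show the last accessed time)
ls -lu labs

# -c create, -v print each member as it is added, -f name the archive file
tar -cvf labs.tar labs

# The source files are still present; compare the access times with those above
ls -lu labs
ls -l labs.tar
```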
Viewing a TAR file
Once we have created the file we can verify the file contents with the -t option:
tar -tf labs.tar
Incidentally, on many distributions a large TAR file may be read directly with the command less, where less is set up with an input preprocessor such as lesspipe; this allows the file contents to be paged through without the explicit use of tar and less together.
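Where less is not set up that way, the listing can simply be piped to a pager. The sketch below uses a hypothetical sample archive and adds -v, which is not covered above, to show permissions, sizes and dates in the listing. It pipes to head so that it can run unattended; at a terminal we would more likely pipe to less.

```shell
#!/bin/sh
# Sketch: list the contents of an archive, briefly and in detail.
# The scratch directory and sample files are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs
echo 'echo hello' > labs/file.sh
echo 'notes' > labs/notes.txt
tar -cf labs.tar labs

# Names only
tar -tf labs.tar

# Detailed listing, first few entries; interactively: tar -tvf labs.tar | less
tar -tvf labs.tar | head -n 5

# Count the members: the directory itself plus its two files
count=$(( $(tar -tf labs.tar | wc -l) ))
echo "members: $count"
```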
Extracting a TAR file
To extract the complete archive the command is simple and makes use of the -x option:
tar -xf labs.tar
The files will expand within the current directory unless the option -P is used both when the archive is created and when it is expanded, in which case the files are expanded to the full path of the original files. If there are concerns that you may overwrite files, a couple of options exist that may help:
- -k : prevents existing files being overwritten
- --keep-newer-files : will not overwrite a target file that is newer than its copy in the archive
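The difference between a plain extract and one using -k can be seen with a small experiment. This is a sketch with a hypothetical scratch directory and file contents. GNU tar treats an existing file as an error when -k is given and exits with a non-zero status, so the script ignores that status; other tar implementations may skip the file silently.

```shell
#!/bin/sh
# Sketch: show that -k keeps an existing file while a plain extract overwrites it.
# The scratch directory and file contents are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs
echo 'archived version' > labs/file.sh
tar -cf labs.tar labs

# Simulate a local change made after the archive was created
echo 'local edit' > labs/file.sh

# With -k the existing file is left alone; ignore the error status GNU tar returns
tar -xkf labs.tar 2>/dev/null || true
kept=$(cat labs/file.sh)
echo "after tar -xkf: $kept"

# Without -k the archived copy replaces the local file
tar -xf labs.tar
replaced=$(cat labs/file.sh)
echo "after tar -xf:  $replaced"
```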
Should we want to extract only a single file or certain files from the archive we could use code similar to the following:
tar -xf labs.tar labs/file.sh
This would extract just the single file from the archive. We can use the -t option first if we need to confirm the path to the file in the archive.
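Put together, the two steps might look like the following sketch. The scratch directory and the second file, notes.txt, are hypothetical; the original directory is removed first so that it is clear which files came out of the archive.

```shell
#!/bin/sh
# Sketch: confirm a member's path with -t, then extract only that member.
# The scratch directory and sample files are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs
echo 'echo hello' > labs/file.sh
echo 'notes' > labs/notes.txt
tar -cf labs.tar labs

# Remove the source so only extracted files remain afterwards
rm -r labs

# Confirm the path exactly as it is stored in the archive
tar -tf labs.tar | grep 'file.sh'

# Extract just that one member; the parent directory is recreated as needed
tar -xf labs.tar labs/file.sh
ls labs
```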
Compressing the archives
These archives can be compressed and uncompressed within the same TAR process, using the options:
- -z : uses gzip for compression and gunzip for decompressing the file
- -j : uses bzip2 for compression and bunzip2 for decompressing the file
tar -czf labs.tgz labs
tar -xzf labs.tgz
The first command creates the zipped archive and the second extracts it. The option -z should be given in both cases, and also with -t for viewing. If we had used the option -j in the first instance, then -j should be used to access the archive in future. Recent versions of GNU tar can detect the compression for themselves when reading an archive, but naming the option keeps the command portable. A good naming standard is useful here so we know how to open or view the file; commonly these endings are used for file names:
- .tar : indicates an uncompressed file
- .tar.gz or .tgz : indicates a file compressed with gzip
- .tar.bz2 or .tbz2 : indicates a file where bzip2 was used to compress the archive
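A round trip with gzip can be sketched as follows. The scratch directory, the sample data and the restore directory are hypothetical, and -C (extract into another directory) is an extra option not covered above. The bzip2 step runs only if bzip2 is installed.

```shell
#!/bin/sh
# Sketch: create plain and gzip-compressed archives, compare sizes, then restore and verify.
# The scratch directory, sample data and restore directory are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs

# Repetitive text compresses well
i=1
while [ "$i" -le 200 ]; do
    echo "lab exercise line $i" >> labs/notes.txt
    i=$((i + 1))
done

tar -cf labs.tar labs      # uncompressed
tar -czf labs.tgz labs     # compressed with gzip
plain=$(( $(wc -c < labs.tar) ))
gz=$(( $(wc -c < labs.tgz) ))
echo "labs.tar: $plain bytes, labs.tgz: $gz bytes"

# The same pattern with bzip2, if it is available
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf labs.tbz2 labs
fi

# View, then extract into a separate directory and compare with the source
tar -tzf labs.tgz
mkdir restore
tar -xzf labs.tgz -C restore
diff -r labs restore/labs && echo "restored copy matches the source"
```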
Ultimately, though, we can use the command file (/usr/bin/file) to identify the type of archive we have:
This will identify the compression used, if any, and for an uncompressed archive it will confirm that the file is a TAR archive.
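As a sketch of how that check might be run, the script below builds two hypothetical sample archives and passes them to file. The exact wording of the output varies between versions of file, so none is shown here. The -z option, which is not covered above, asks file to look inside compressed data, which is useful for confirming that a gzip file really holds a TAR archive. The script skips the check if file is not installed.

```shell
#!/bin/sh
# Sketch: identify archive types with the file command.
# The scratch directory and sample archives are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
mkdir labs
echo 'notes' > labs/notes.txt
tar -cf labs.tar labs
tar -czf labs.tgz labs

if command -v file >/dev/null 2>&1; then
    # Report the type of each archive
    file labs.tar labs.tgz
    # Look inside the compressed data as well
    file -z labs.tgz
else
    echo "file is not installed on this system"
fi
```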