LPIC-1 Exam 101

103.2 Process text streams using filters Part 2


Key Knowledge Areas

  • Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.

Terms and Utilities

  • cat
  • cut
  • expand
  • fmt
  • head
  • od
  • join
  • nl
  • paste
  • pr
  • sed
  • sort
  • split
  • tail
  • tr
  • unexpand
  • uniq
  • wc

In Part 3 we will look at join.

Working with the command split


The command split (/usr/bin/split) is going to impress, though perhaps not at first, when we see that used on its own it can split a large text file into smaller ones, defaulting to 1000 lines per output file. However, when we see how we can use it to control the size of backup files, perhaps because we need to store those files on CD-ROMs or USB drives of particular sizes, the power really does start to kick in.

We will start with the default settings of split, first creating a large text file of 5000 lines. This can be done easily in Linux using a for loop and the command seq (/usr/bin/seq).

for l in $(seq 5000) ; do
  echo "Line $l" >> bigfile
done
wc -l bigfile

So we now have a big text file with 5000 lines. If we need to split it into smaller files, the default options for split will produce files of 1000 lines each.

split bigfile
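
We can list what split created; with our 5000-line input and the defaults there should be five pieces (the exact listing will depend on what else is in your directory):

ls x*
xaa  xab  xac  xad  xae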

As the listing shows, the default output names use just the letter x as a prefix followed by a two-letter suffix, aa, ab and so on, to identify each file. Each of these files will have 1000 lines in it, as 5000 divides neatly into 5 files. If we want more or fewer lines per file we can use the -l option with split:

split -l2000 bigfile

The above command will produce 3 files: two with 2000 lines each, and a last file with the remaining 1000 lines.
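
We can verify the counts with wc. One thing to watch: if you ran the earlier split in the same directory, the old xad and xae files will still be there, as split only overwrites the names it generates on each run.

wc -l xaa xab xac
 2000 xaa
 2000 xab
 1000 xac
 5000 total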

Moreover, we can use split with the -b option to create smaller backup files. Backups are usually binary, so here we split on a specific size rather than a line count. As an example we can work with a backup of the /etc directory, which on my Ubuntu server is not large, or at least the elements that a standard user can read are not large, just 2.9M; but perhaps we want files of just 1M.

tar -cf - /etc | split -b 1m - etc.part.tar_
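
We can check the pieces with a quick listing; every file except the last should be exactly 1M, with the last holding whatever remains:

ls -lh etc.part.tar_*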

In creating the backup files at 1M in size we also set a meaningful prefix, etc.part.tar_, so it is easier to identify the files. The reverse operation is the restore, and this is where the command cat (/bin/cat) can help us again:

cat etc.part.tar_* | tar -xf -

The etc backup is now restored to the current directory, which is the normal behavior since tar strips the leading / from member names, and no errors were reported during the restore from the split files.
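
As an optional sanity check before (or after) extracting, we can ask tar simply to list the contents of the recombined stream; if every entry can be read without error, the split pieces are intact:

cat etc.part.tar_* | tar -tf - > /dev/null && echo "archive intact"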