Combined Utilities for Beowulf (cub)

 

Using cub to Install a Beowulf Cluster

The cub CD image includes all the software you need to add to the RedHat Linux distribution and to set up and install an IA32 or Alpha Beowulf cluster from scratch. Running the cub installation script configures your Beowulf cluster management node and installs the LUI and SCMS utilities. At the completion of the script, these open source software tools have access to initial cluster configuration information.

Before Installing Cluster Software

Before installing the cluster software, you need to

  • Read the Configuration Notes
  • Make sure pathnames are set up correctly
  • If applicable, ensure that Myricom Myrinet sources are installed as described below
  • Create boot floppies for the compute nodes if the cluster is IA32-based
  • Collect compute node MAC addresses
  • Install RedHat 7.1 or 7.2 on the management node

Configuration Notes

Disks
  • By default, cub configures exactly one disk on a compute node; additional disks are ignored.
  • You can change the configuration (for all compute nodes) by editing the disk table when cub offers you that option. See the mkdskp(1) man page for information on the format of the disk table.
  • By default, cub assumes that the compute node disks reside on the same controller and disk as the root partition on the management node. For example, if / is mounted on /dev/hda1, cub will set up the disk table so the compute nodes will all use /dev/hda. Beware! On two-disk systems, it is common to find /boot mounted on, say, sda1, while / is mounted on sdb1. If your management node is set up this way, and your compute nodes have only one disk, you MUST edit your disk table to reflect the correct controller.
  • To cub, the world has three families of disks: IDE, SCSI, and RAID. cub assumes that all of the compute nodes have disks from the same family, but does not require that they be the same size, or be made by the same vendor. Thus, you might have a set of compute nodes that use various SCSI disks from 9 to 80 GB, using various SCSI controllers from QLogic, NCR, and DEC.
  • 4GB is the smallest practical disk for a compute node. If your application will use or store data locally on the compute node disks, you will need appropriately larger disks.
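
To make the controller caveat above concrete: suppose / on the management node is mounted on /dev/sdb1, but each compute node has a single disk on /dev/sda. cub would generate disk table entries naming /dev/sdb partitions, and you would edit each device to /dev/sda instead, along these lines (same format as the disktable shown in the sample dialogue; see mkdskp(1) for the exact column meanings):

```
/dev/sda1 ext2 30 m n /boot
/dev/sda2 swap 1000 m
/dev/sda3 ext2 * m n /
```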
 
 
Networks
  • At the absolute minimum, each node must have one Ethernet interface. If the management node has only one interface, you must ensure that no other system on your local network is using an address in the blocks 10.1.2.* or 10.1.3.*.
  • It is highly desirable, and the cub documentation recommends, that the management node have two Ethernet interfaces: eth1 is connected to your site's LAN, and eth0 connects to a cluster-private network between your compute and management nodes.
  • If your compute nodes all have two Ethernet controllers, you can configure a second private network among them, reserved for MPI message-passing traffic. This second network should not be connected to any other network, nor to the management node!
  • You can also use a Myrinet network for the private MPI network used by the compute nodes. (Note that there is no benefit to connecting the Myrinet network to the management node at this time; you'll just waste an expensive interface card.)
 
CPUs
  • For x86-based systems, cub assumes a Pentium II processor at minimum
  • For Alpha, cub accepts any Alpha processor
 
KVM (keyboard, video, mouse)
  • The management node must support X11 (almost assuredly XFree86). Without X, the GUI tools installed by cub would be pointless!
  • The compute nodes do NOT need graphics interfaces, keyboards, or mice, so long as some acceptable surrogate is available:
    • For Alpha systems, cub will be happy to use a terminal server connected to the compute nodes' serial ports to start and stop the systems.
    • On x86 clusters, or Alpha clusters without terminal servers, you'll probably want to use a KVM switch to access the compute nodes when they are being configured, when they are offline, and so on.
 

Deciding on a Kernel

By default, the script builds a kernel for the compute nodes from the same sources used to build the management node's kernel. If you want the compute nodes' kernel built from different sources, install those sources under /usr/src and make /usr/src/linux a symbolic link that points to them.

The script looks for the kernel sources in the following order:

  1. /usr/src/linux
  2. /usr/src/linux-a.b.c-d
  3. /usr/src/linux-a.b.c
  4. /usr/src/linux-a.b

where linux-a.b.c-d is the kernel version as found in /proc/version.
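
The lookup above can be sketched in a few lines of shell (illustrative only; the real cub script may differ in detail):

```shell
#!/bin/sh
# Build the candidate list in the order cub checks it, given the
# kernel version string a.b.c-d as reported in /proc/version.
candidates() {
    ver=$1               # a.b.c-d, e.g. 2.4.16-1
    base=${ver%-*}       # a.b.c,   e.g. 2.4.16
    major=${base%.*}     # a.b,     e.g. 2.4
    echo "/usr/src/linux"
    echo "/usr/src/linux-$ver"
    echo "/usr/src/linux-$base"
    echo "/usr/src/linux-$major"
}

# Pick the first candidate directory that actually exists.
for dir in $(candidates 2.4.16-1); do
    [ -d "$dir" ] && { echo "Using kernel sources in $dir"; break; }
done
```

To direct the script at different sources, point the symbolic link at them, for example: ln -s /usr/src/linux-2.4.16-custom /usr/src/linux (the -custom suffix is illustrative).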

Myricom Myrinet Sources

If your cluster will communicate via Myrinet and you want to install a version of Myricom's GM or MPICH other than the version included on the cub CD, make sure the appropriate tarballs reside in a currently mounted directory that the script will be able to access.

Creating Boot Floppies for Compute Nodes (IA32 Clusters Only)

If your cluster is IA32-based, create a separate boot floppy for every compute node in the cluster. After installing the cluster software on the management node, you will use these floppies to boot the individual compute nodes. (Alpha compute nodes can be booted over the network, so boot floppies are unnecessary.)

To create the floppies, go to the web site http://rom-o-matic.net/, which dynamically generates Etherboot ROM images.

  • To create a boot floppy on a UNIX or Linux system, as root, use the following command line:

    # dd of=/dev/fd0 if=etherboot.floppy-image

  • To create a boot floppy on an MS-DOS-based system, find the program called rawrite.exe on your Linux distribution, copy it to your C: drive, mount the cub CD, run rawrite.exe, and follow the prompts.

    On a RedHat kit, look for d:\dosutils\rawrite.exe on the first CD.
    On a SuSE kit, look for d:\dosutils\rawrite\rawrite.exe on the first CD.

Collecting MAC Addresses

To configure the cluster, the cub script requires the Ethernet MAC address of every compute node in your cluster. You must have these MAC addresses ready to enter before you run the script.

On an IA32-based cluster, follow these steps:

  1. Power off all of the compute nodes.
  2. Boot cnode0 with an etherboot floppy.
  3. Watch the cnode0 display for a message like

    Probing...[EEPRO100] Ethernet addr: 00:50:8B:72:A5:91

    Write down the MAC address, the colon-separated string (00:50:8B:72:A5:91 in this example). When you eventually run cub, it will ask you to enter the MAC address for every compute node in the cluster.
  4. Power down cnode0.
  5. Repeat steps 2 through 4 for each compute node.

On an Alpha EV67-based cluster, follow these steps:

  1. Power up a compute node.
  2. At the SRM >>> prompt, enter show device ewa0.
    The MAC address is the string with lots of colons in it; write it down and save it for later.
  3. Power down the node.
  4. Repeat steps 1 through 3 for each compute node.

Installing RedHat

Install RedHat on the system you intend to be the management node in your Beowulf cluster. As you run this installation, make the following selections:

  • For Install type, select Custom System.

  • Set up the cluster-private network on eth0 and the world
    network on eth1, as follows:
    • For eth0:

      IP address: 10.1.2.1
      Network: 10.1.2.0
      Netmask: 255.255.255.0
      Nodename: mgtnode


    • For eth1, set it up as appropriate for your local network. You should also set the gateway and name servers as appropriate for your local network.

      Note: You must set your local system name as mgtnode on this system, even though it probably has been assigned a different name on your local (eth1) network.

  • Firewall setting: If you select something other than No firewall, you must also choose:
    • customize
    • trusted devices: eth0

  • If you will be accessing your cluster from elsewhere on the net, you will probably want to select
    • allow incoming: SSH
    • allow incoming: WWW(HTTP)

  • For Package Group Selection, choose at least
    • X Window System
    • KDE
    • Mail
    • Networked Workstation
    • NFS Server
    • Web server
    • Emacs
    • Development
    • Kernel development
    • Utilities

  • Check the select individual packages box, then add
    • Applications/Internet/ tcpdump
    • Applications/Internet/ tftp
    • System environment/Daemons dhcp
    • System environment/Daemons tftp-server

  • Build and install the system.
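
For reference, on RedHat 7.x the eth0 settings above are stored by the installer in /etc/sysconfig/network-scripts/ifcfg-eth0, which should end up looking roughly like this (BROADCAST is derived from the address and netmask):

```
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.1.2.1
NETMASK=255.255.255.0
NETWORK=10.1.2.0
BROADCAST=10.1.2.255
```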

You're now ready to run the cub script.

Running the cub Installation Script

Mount the cub CD and execute the installation script:

# mount /mnt/cdrom
# /mnt/cdrom/cub

  • When prompted to do so, enter the compute node MAC addresses you have collected. Note that each of the colon-separated octets must be specified with 2 digits. Thus, the MAC address a:0:ca:49:3:5:d3:cc must be entered as 0a:00:ca:49:03:05:d3:cc. Also note that the MAC address response is not case-sensitive.

  • When prompted, edit the disktable file, if necessary.

  • When prompted, specify whether to install the Myrinet software.
    • If yes, specify a directory mounted on the local system that cub can access to find the gm and mpich tar balls. The default is the current cub CD.
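
The two-digit rule can be handled before you type anything: the helper below (hypothetical, not part of cub) zero-pads each colon-separated group and lowercases the address so it matches the form the script expects.

```shell
#!/bin/sh
# Hypothetical helper: zero-pad each colon-separated group of a MAC
# address to two digits and fold it to lowercase.
normalize_mac() {
    echo "$1" | tr 'A-F' 'a-f' | awk -F: '{
        for (i = 1; i <= NF; i++) {
            o = $i
            if (length(o) < 2) o = "0" o
            printf "%s%s", o, (i < NF ? ":" : "\n")
        }
    }'
}

normalize_mac "a:0:ca:49:3:5:d3:cc"   # prints 0a:00:ca:49:03:05:d3:cc
```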

The installation procedure will now proceed unattended for 45 to 60 minutes, depending on the speed of your system. See the Sample cub Script Dialogue in Figure 1.

Booting the Compute Nodes

After the cub script has completed, the client nodes are ready to be booted:

  • For an IA32-based cluster, use the etherboot floppies you created earlier to boot each of the compute nodes.
  • For an Alpha-based cluster, at the SRM prompt for each compute node, type
    boot ewa0
    -fl "root=/dev/nfs ip=bootp"

When each of the compute nodes displays the message Installation complete - reboot or press [ENTER] to log in, power down the node, remove the floppy from an IA32 node, and restart the system.

Your cluster installation is now complete!

Running the mpich-start Script

The cub script itself creates another script, called mpich-start, on the management node to start up the mpich environment. The environment that is set up depends on

  • Whether the cluster uses Myrinet and
  • Which nodes are running at the time you run mpich-start.

To run the script, enter the following command on the management node:

/tftpboot/mpich-start

The command takes no arguments.

You must run this script after you've configured and installed the management and compute nodes and after you've rebooted the compute nodes to run the newly installed systems. You must also run it any time you add nodes to or drop nodes from the cluster.

When you run mpich-start, it creates the mpich conf file, which includes a list of the available nodes and the number of CPUs on each node. On a Myrinet cluster, mpich-start also initializes the Myrinet devices and runs the mapper.
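
For illustration, an MPICH 1.x machines file lists one node per line with the CPU count after a colon. For the three compute nodes in the sample dialogue, assuming dual-CPU nodes, the list that mpich-start generates might resemble (node names and counts are illustrative):

```
cnode0:2
cnode1:2
cnode2:2
```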

Sample cub Script Dialogue

In Figure 1, the sample dialogue installs an IA32 cluster made up of a management node and three compute nodes, which use Myrinet for passing high-speed MPICH or PVM application messages.

Figure 1 Sample cub Script Dialogue

[root@mgtnode /root]# /mnt/cdrom/cub
cub 1.3
Copyright (C) 2002 by Compaq Computer Corporation

cub comes with ABSOLUTELY NO WARRANTY. This is
free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING in the
cub kit.

Check /tmp/cub-log for errors if this procedure fails.

Found 2 Ethernet devices: eth0 eth1
The cnode kernel will be built from the 2.4.16-1 sources in /usr/src/linux

What additional network will you use for passing high-speed
application messages (MPICH or PVM) between compute nodes?

1) None
2) Myrinet
3) Second Ethernet network

Enter 1, 2, or 3: [1]

...Setting up the private cluster network...

The time has come to enter MAC addresses

You will have a chance to edit the MAC addresses later
if you make a mistake entering them here.

Enter a MAC address or <ENTER> to end: 00:50:8b:72:a5:91
Enter a MAC address or <ENTER> to end: 00:50:8b:35:b6:c8
Enter a MAC address or <ENTER> to end: 00:50:8b:b3:82:21
Enter a MAC address or <ENTER> to end:
Would you like to edit the MAC address list? [no]

...3 compute nodes defined...

...Building the disk table...

Your new disktable:

/dev/sda1 ext2 30 m n /boot
/dev/sda2 swap 1000 m
/dev/sda3 ext2 * m n /

Would you like to edit this file? [no]

Where is the MPICH source tar file? [/mnt/cdrom/mpich-nogm-1.2.2.2.tar.gz]

+++ The next steps should run for 45-60 minutes
+++ before we need your help again. You can watch
+++ the "cub log" window to confirm that
+++ something is happening.


...installing bootpd...

...configuring SCMS...

...modifying system control files...

...Testing rsh...

...Building LUI...

...Setting up NFS...

...Installing mknbi...

...Starting kernel configuration for NFS booting...

...Starting build of kernel 2.4.16-nfs...

...Starting kernel configuration for the compute nodes...

...Starting build of kernel 2.4.16-cnode...

...Building ram disk...

...Copying files to 12MB ram disk...

...Skipping GM...

...Building MPICH...

...Building Perl-Tk...

...Now setting up LUI...

...Final preparation and setup...

...cub finished successfully...
[root@mgtnode /root]#