Using cub to Install a Beowulf Cluster
The cub CD image includes all the software you need to add to the
RedHat Linux distribution and to set up and install an IA32 or Alpha
Beowulf cluster from scratch. Running the cub installation script
configures your Beowulf cluster management node and installs the
LUI and SCMS utilities. When the script completes, these open
source tools have access to the initial cluster configuration
information.
Before Installing Cluster Software
Before installing the cluster software, you need to
- Read the Configuration Notes
- Make sure pathnames are set up correctly
- If applicable, ensure that the Myricom Myrinet sources are available,
as described below.
- Create boot floppies for the compute nodes if the cluster is
IA32-based
- Collect compute node MAC addresses
- Install RedHat 7.1 or 7.2 on the management node
Configuration Notes
Disks
- By default, cub configures exactly one disk on a compute
node; additional disks are ignored.
- You can change the configuration (for all compute nodes)
by editing the disk table when cub offers you that option.
See the mkdskp(1) man page for information on the format
of the disk table.
- By default, cub assumes that the compute node disks reside
on the same controller and disk as the root partition on
the management node. For example, if / is mounted on /dev/hda1,
cub will set up the disk table so the compute nodes will
all use /dev/hda. Beware! On two-disk systems, it is common
to find /boot mounted on, say, sda1, while / is mounted
on sdb1. If your management node is set up this way, and
your compute nodes have only one disk, you MUST edit your
disk table to reflect the correct controller.
- To cub, the world has three families of disks: IDE, SCSI,
and RAID. cub assumes that all of the compute nodes have
disks from the same family, but does not require that they
be the same size, or be made by the same vendor. Thus, you
might have a set of compute nodes that use various SCSI
disks from 9 to 80 GB, using various SCSI controllers from
QLogic, NCR, and DEC.
- 4GB is the smallest practical disk for a compute node.
If your application will use or store data locally on the
compute node disks, you will need appropriately larger disks.
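To see which controller cub will assume, you can check where / is mounted on the management node. This is a rough sketch of the check (my own, not part of cub); the digit-stripping works for IDE/SCSI device names like hda1 or sdb1:

```shell
# Find the device holding / on the management node; cub assumes the
# compute-node disks use the same controller and device name.
# (Sketch only: the suffix-stripping below handles names like
# /dev/hda1 or /dev/sdb1, not more exotic device naming schemes.)
root_dev=$(df / | awk 'NR == 2 { print $1 }')   # e.g. /dev/sdb1
base_dev=${root_dev%%[0-9]*}                    # e.g. /dev/sdb
echo "cub will assume compute-node disks on: $base_dev"
```

If the result is not the device your compute nodes actually use, that is your cue to edit the disk table when cub offers the chance.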
Networks
- At the absolute minimum, each node must have one Ethernet
interface. If the management node has only one interface,
you must ensure that no other system on your local network
is using an address in the blocks 10.1.2.* or 10.1.3.*.
- It is highly desirable, and the cub documentation recommends,
that the management node have two Ethernet interfaces: eth1
is connected to your site's LAN, and eth0 connects to a
cluster-private network between your compute and management
nodes.
- If your compute nodes all have two Ethernet controllers,
you can configure a second private network reserved for MPI
message-passing traffic between the compute nodes. This second
network should not be connected to any other network, nor to
the management node!
- You can also use a Myrinet network for the private MPI
network used by the compute nodes. (Note that there is no
benefit to connecting the Myrinet network to the management
node at this time; you'll just waste an expensive interface
card.)
CPUs
- For x86-based systems, cub assumes at least a Pentium II
processor.
- For Alpha, cub accepts any Alpha processor.
KVM (keyboard, video, mouse)
- The management node must support X11 (almost assuredly
XFree86). Without X, the gui tools installed by cub would
be pointless!
- The compute nodes do NOT need graphics interfaces, keyboards,
or mice, so long as some acceptable surrogate is available:
- For Alpha systems, cub will be happy to use a terminal
server connected to the compute nodes' serial ports
to start and stop the systems.
- On x86 clusters, or Alpha clusters without terminal
servers, you'll probably want to use a KVM switch to
access the compute nodes when they are being configured,
when they are offline, and so on.
Deciding on a Kernel
By default, the script builds a kernel for the compute nodes from
the same sources you use to build the management node's kernel.
If you want the compute nodes' kernel to be built from different
sources, install those sources under /usr/src and name them linux,
or create a symbolic link so that /usr/src/linux points to them.
The script looks for the kernel sources in the following order:
- /usr/src/linux
- /usr/src/linux-a.b.c-d
- /usr/src/linux-a.b.c
- /usr/src/linux-a.b
where linux-a.b.c-d
is the kernel version as found in
/proc/version.
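The lookup above can be sketched in shell; the parsing of /proc/version here is my own approximation of what the script does:

```shell
# Derive a.b.c-d, a.b.c, and a.b from the running kernel's version
# string, then probe the source directories in cub's search order.
ver=$(awk '{ print $3 }' /proc/version)   # a.b.c-d, e.g. 2.4.16-1
abc=${ver%%-*}                            # a.b.c
ab=${abc%.*}                              # a.b
for dir in /usr/src/linux "/usr/src/linux-$ver" \
           "/usr/src/linux-$abc" "/usr/src/linux-$ab"; do
    if [ -d "$dir" ]; then
        echo "kernel sources found in $dir"
        break
    fi
done
```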
Myricom Myrinet Sources
If your cluster will communicate via Myrinet and you want to install
a version of Myricom's GM or MPICH other than the version included
on the cub CD, make sure the appropriate tarballs reside in a currently
mounted directory that the script will be able to access.
Creating Boot Floppies for Compute Nodes (IA32 Clusters Only)
If your cluster is IA32-based, create a separate boot floppy for
every compute node in the cluster. After installing the cluster
software on the management node, you will use these floppies to
boot the individual compute nodes. (Alpha compute nodes can be booted
over the network, so boot floppies are unnecessary.)
To create the floppies, go to the web site http://rom-o-matic.net/,
which dynamically generates Etherboot ROM images.
- To create a boot floppy on a UNIX or Linux system, as root,
use the following command line:
# dd of=/dev/fd0 if=etherboot.floppy-image
- To create a boot floppy on an MS-DOS-based system, find the
program called rawrite.exe
on your Linux distribution, copy it to your C: drive, mount the
cub CD, run rawrite.exe,
and follow the prompts.
On a RedHat kit, look for d:\dosutils\rawrite.exe
on the first CD.
On a SuSE kit, look for d:\dosutils\rawrite\rawrite.exe
on the first CD.
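Whichever method you use, it is worth reading the floppy back to confirm the write succeeded. A sketch of such a check (write_and_verify is my own helper name, not a cub or RedHat command):

```shell
# Write an Etherboot image to a device, then compare the first
# image-sized chunk of the device against the image file.
write_and_verify() {
    img=$1 dev=$2
    dd if="$img" of="$dev" bs=512 2>/dev/null &&
        cmp -n "$(stat -c %s "$img")" "$img" "$dev" &&
        echo "floppy verified"
}
# Typical use: write_and_verify etherboot.floppy-image /dev/fd0
```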
Collecting MAC Addresses
To configure the cluster, the cub script requires the Ethernet
MAC address of every compute node in your cluster. You must have
these MAC addresses ready to enter before you run the script.
On an IA32-based cluster, follow these steps:
- Power off all of the compute nodes.
- Boot cnode0 with
an etherboot floppy.
- Watch the cnode0
display for a message like
Probing...[EEPRO100] Ethernet addr: 00:50:8B:72:A5:91
Write down the MAC address (in this example, the string starting
with 00:50). When you eventually run cub, it will ask you to enter
the MAC address of every compute node in the cluster.
- Power down cnode0.
- Repeat steps 2 through 4 for each compute node.
On an Alpha EV67-based cluster, follow these steps:
- Power up a compute node.
- At the SRM >>>
prompt, enter show device
ewa0.
The MAC address is the colon-separated string in the output; write
it down and save it for later.
- Power down the node.
- Repeat steps 1 through 3 for each compute node.
Installing RedHat
Install RedHat on the system you intend to be the management node
in your Beowulf cluster. As you run this installation, make the
following selections:
- For Install type,
select Custom System.
- Set up the cluster-private network on eth0 and the world
network on eth1, as follows:
- For eth0:
IP address: 10.1.2.1
Network: 10.1.2.0
Netmask: 255.255.255.0
Nodename: mgtnode
- For eth1, set it up as appropriate for your local network.
You should also set the gateway and name servers as appropriate
for your local network.
Note: You must set your local system name as mgtnode
on this system, even though it probably has been assigned
a different name on your local (eth1) network.
- Firewall setting: If you select something other than No
firewall, you must also choose:
- customize
- trusted devices:
eth0
- If you will be accessing your cluster from elsewhere on the
net, you will probably want to select
- allow incoming:
SSH
- allow incoming:
WWW(HTTP)
- For Package Group Selection,
choose at least
- X Window System
- KDE
- Mail
- Networked Workstation
- NFS Server
- Web server
- Emacs
- Development
- Kernel development
- Utilities
- Check the select individual
packages box, then add
- Applications/Internet/tcpdump
- Applications/Internet/tftp
- System environment/Daemons/dhcp
- System environment/Daemons/tftp-server
- Build and install the system.
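For reference, the eth0 values above correspond to an interface file along these lines (a sketch of RedHat's /etc/sysconfig/network-scripts/ifcfg-eth0 layout; the installer normally writes this file for you):

```shell
# Sketch of /etc/sysconfig/network-scripts/ifcfg-eth0 for the
# cluster-private interface (values from the list above).
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.1.2.1
NETMASK=255.255.255.0
NETWORK=10.1.2.0
ONBOOT=yes
```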
You're now ready to run the cub script.
Running the cub Installation Script
Mount the cub CD and execute the installation script:
# mount /mnt/cdrom
# /mnt/cdrom/cub
- When prompted to do so, enter the compute node MAC addresses
you have collected. Note that each of the colon-separated octets
must be specified with 2 digits. Thus, the MAC address a:0:ca:49:3:5:d3:cc
must be entered as 0a:00:ca:49:03:05:d3:cc. Also note that the
MAC address response is not case-sensitive.
- When prompted, edit the disktable file, if necessary.
- When prompted, specify whether to install the Myrinet software.
- If yes, specify a directory mounted on the local system
that cub can access to find the gm and mpich tarballs. The
default is the current cub CD.
The installation procedure will now proceed unattended for 45 to
60 minutes, depending on the speed of your system. See the Sample
cub Script Dialogue in Figure 1.
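Since every octet of a MAC address must be entered with two digits, a small helper like this (my own sketch, not part of cub) can pad an address before you type it in:

```shell
# Zero-pad each colon-separated octet of a MAC address to two digits.
pad_mac() {
    echo "$1" | awk -F: '{
        for (i = 1; i <= NF; i++) {
            o = (length($i) == 1 ? "0" $i : $i)
            printf "%s%s", o, (i < NF ? ":" : "\n")
        }
    }'
}
pad_mac a:0:ca:49:3:5:d3:cc    # prints 0a:00:ca:49:03:05:d3:cc
```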
Booting the Compute Nodes
After the cub script has completed, the compute nodes are ready
to be booted:
- For an IA32-based cluster, use the etherboot
floppies you created earlier to boot each of the compute nodes.
- For an Alpha-based cluster, at the SRM prompt for each compute
node, type
boot ewa0 -fl "root=/dev/nfs ip=bootp"
When each of the compute nodes displays the message Installation
complete - reboot or press [ENTER] to log in, power down
the node, remove the floppy (on an IA32 node), and restart the system.
Your cluster installation is now complete!
Running the mpich-start Script
The cub script itself creates another script on the server node
called mpich-start
to start up the mpich environment. The environment that is set up
depends on
- Whether the cluster uses Myrinet and
- Which nodes are running at the time you run mpich-start.
To run the script, enter the following command on the server node:
/tftpboot/mpich-start
The command takes no arguments.
You must run this script after you've configured and installed
the server and compute nodes and after you've rebooted the
compute nodes to run the newly installed systems. You must also
run it any time you drop nodes from or add nodes to the cluster.
When you run mpich-start,
it creates the mpich conf
file, which includes a list of the available nodes and the
number of CPUs on each node. On a Myrinet cluster, mpich-start
also initializes the Myrinet devices and runs the mapper.
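As a sketch of what such an entry looks like, one machines-file line per node pairs the hostname with its CPU count; this is my own illustration of the idea, not cub's actual code:

```shell
# Build one hostname:cpu-count entry, with the CPU count taken
# from /proc/cpuinfo (the format mpich-start presumably records).
cpus=$(grep -c '^processor' /proc/cpuinfo)
entry="$(uname -n):$cpus"
echo "$entry"
```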
Sample cub Script Dialogue
In Figure 1, the sample dialogue installs an IA32 cluster made
up of a management node and three compute nodes, which use Myrinet
for passing high-speed MPICH or PVM application messages.
Figure 1 Sample cub Script Dialogue
[root@mgtnode /root]#
/mnt/cdrom/cub
cub 1.3
Copyright (C) 2002 by Compaq Computer Corporation
cub comes with ABSOLUTELY
NO WARRANTY. This is
free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING in the
cub kit.
Check /tmp/cub-log
for errors if this procedure fails.
Found 2 Ethernet
devices: eth0 eth1
The cnode kernel will be built from the 2.4.16-1 sources in
/usr/src/linux
What additional network
will you use for passing high-speed
application messages (MPICH or PVM) between compute nodes?
1) None
2) Myrinet
3) Second Ethernet network
Enter 1, 2, or 3:
[1]
...Setting up the
private cluster network...
The time has come
to enter MAC addresses
You will have a chance
to edit the MAC addresses later
if you make a mistake entering them here.
Enter a MAC address
or <ENTER> to end: 00:50:8b:72:a5:91
Enter a MAC address or <ENTER> to end: 00:50:8b:35:b6:c8
Enter a MAC address or <ENTER> to end: 00:50:8b:b3:82:21
Enter a MAC address or <ENTER> to end:
Would you like to edit the MAC address list? [no]
...3 compute nodes
defined...
...Building the
disk table...
Your new disktable:
/dev/sda1 ext2 30 m n /boot
/dev/sda2 swap 1000 m
/dev/sda3 ext2 * m n /
Would you like to
edit this file? [no]
Where is the MPICH
source tar file? [/mnt/cdrom/mpich-nogm-1.2.2.2.tar.gz]
+++ The next steps
should run for 45-60 minutes
+++ before we need your help again. You can watch
+++ the "cub log" window to confirm that
+++ something is happening.
...installing bootpd...
...configuring SCMS...
...modifying system
control files...
...Testing rsh...
...Building LUI...
...Setting up NFS...
...Installing mknbi...
...Starting kernel configuration for NFS booting...
...Starting build of kernel 2.4.16-nfs...
...Starting kernel configuration for the compute nodes...
...Starting build of kernel 2.4.16-cnode...
...Building ram disk...
...Copying files to 12MB ram disk...
...Skipping GM...
...Building MPICH...
...Building Perl-Tk...
...Now setting up LUI...
...Final preparation and setup...
...cub finished successfully...
[root@mgtnode /root]#