Jeff
02-07-2006, 05:18 PM
If you have a VDS and are not familiar with basic Linux systems administration, this tutorial is for you. It is very important to understand that Linux systems administration really does start here. The panel is secondary to the command line, and is only of marginal use for troubleshooting compared to knowing your way around the shell. If you are not familiar with shell access, don't wait until you have a problem to do the steps listed in this thread. Do them now so that you are comfortable and know how to troubleshoot should a problem occur. You should review this tutorial at least once a day for 1 week. If you do that, and ask questions in this thread, you will be on your way to successfully managing your VDS in terms of troubleshooting issues that occur, as well as handling preventive maintenance.
First, we cover how to get shell access:
How to use PuTTY
One of the most popular utilities for remotely connecting to SSH servers is PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/). PuTTY is a very small, lightweight remote access application. After downloading and running PuTTY, simply enter the hostname or IP address you'd like to connect to, and hit "Open". You will be prompted for your username and password, so be sure to have these ready. Keep in mind that it's generally not a good idea to log in as root. It is recommended that you log in with your unprivileged username, then "su" to root once logged in.
Second, we cover a few basic troubleshooting commands.
- w
Once you are logged into your VDS, simply type "w" (without the quotes). Familiarize yourself with every aspect of the output, but pay close attention to the load average:
# w
14:09:43 up 18 days, 1:58, 4 users, load average: 0.01, 0.00, 0.00
Your load average is a measure of how busy your VDS is. Roughly speaking, it is the average number of processes that are running, waiting for CPU time, or waiting on disk I/O (input/output). Memory usage is not counted directly, but a VDS that is running short of memory will often show a higher load because its processes are left waiting. The numbers from left to right are the 1, 5, and 15 minute load averages. Only through experience with this command will you be able to recognize when your load average is low and acceptable, and when it is high and unacceptable. When it is high and unacceptable, it will be up to you to find the cause, which brings us to our next command, "ps".
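If you want to pull the 1 minute figure out of the "w" (or "uptime") output so you can compare it against a number of your own choosing, something like the following should work. This is only a sketch: the function names load1 and load_is_high are made up for this example, and the threshold of 2.0 in the usage comment is an arbitrary assumption, not a recommendation for your VDS.

```shell
# Print the 1 minute load average found in a line of "w" or "uptime" output.
# The line is read from standard input.
load1() {
    awk -F'load average: ' 'NF > 1 { split($2, parts, ", "); print parts[1]; exit }'
}

# Succeed (exit status 0) if the load in $1 is above the limit in $2.
# awk does the comparison because the shell cannot compare decimal numbers.
load_is_high() {
    awk -v cur="$1" -v max="$2" 'BEGIN { exit !(cur > max) }'
}

# Example usage (2.0 is a hypothetical threshold, pick your own):
#   uptime | load1
#   if load_is_high "$(uptime | load1)" 2.0; then echo "load is high"; fi
```

Run it a few times during quiet and busy periods and you will start to see what "normal" looks like for your own VDS.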
- ps aux
Now, type "ps aux" in the shell. The output that just went flying across your screen is a list of all the processes running on your VDS, along with some other extremely useful information. Let's talk about what each column of output means. The very first line, before any processes are displayed, is this:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
Note: to view your process list 1 screen at a time, pipe the output to "less". For example:
# ps aux | less
The "|" character is the pipe, which sends the output of one command (ps aux) to another command (less) as its input. The "less" command allows you to view output 1 screen at a time. You only need to know 2 things to work with "less". They are:
1. Hitting the spacebar will allow the next screen to be displayed, and
2. Hitting "q" will exit from the output
Note that the terms "process" and "command" will be used interchangeably below. They refer to the same thing.
USER
- This is the username that the process is being run as
PID
- This is the process ID
%CPU
- This is the percentage of CPU that is being used by the process
%MEM
- This is the percentage of memory that is being used by the process
VSZ
- This is the amount of virtual memory being used by the process, in kilobytes
RSS
- This is the amount of physical memory being used by the process, in kilobytes
TTY
- This is the terminal used by the process, if any
STAT
- This is the state that the process is in
START
- This is the time that the process started running
TIME
- This is the total amount of CPU time the process has used so far (not how long ago it was started)
COMMAND
- This is the command that is running, including any arguments it was started with
More information on each of these fields can be found in the ps man page.
- Q. What is a man page?
- A. A man page is an instructional guide for a topic (typically a command) that you want to learn more about. For example, type: man ps
Note: You can work with a man page in the same way you would work with "less" - hitting the spacebar will take you to the next screen of output, and "q" will exit.
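Once you are comfortable with the "ps aux" columns described above, a small helper can pick out the processes using the most memory instead of making you scroll through the whole list. The following is a minimal sketch, not the only way to do it; the function name top_mem is made up for this example, and it assumes the Linux column order shown in the header line earlier (USER first, %MEM fourth, COMMAND eleventh).

```shell
# Read "ps aux" output on standard input and print the processes using the
# most memory, one per line, as: %MEM PID USER COMMAND
# $1 is how many processes to show (default 5).
top_mem() {
    # NR > 1 skips the header line; sort -nr puts the largest %MEM first.
    awk 'NR > 1 { print $4, $2, $1, $11 }' | sort -nr | head -n "${1:-5}"
}

# Example usage:
#   ps aux | top_mem 10
```

Swapping $4 for $3 in the awk command would rank by %CPU instead.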
The next command we are going to talk about is the "service" command. First, type this: ls -al /etc/rc.d/init.d
All the files you see in the output are startup scripts, which are used to start and stop services, and which run at boot time for the services that are enabled. A few important services to note in the output are:
exim
cpanel
httpd
mysql
named
Let's say you are having a problem reaching your websites, but you can ping your VDS, so you know it's running. Log into your VDS, su to root, and run the following command:
/sbin/service httpd status
Is your webserver running? If not, try to start it:
/sbin/service httpd start
If your webserver is running, but you still cannot connect to it, it may be a DNS issue with your VDS. Is named running? Type this:
/sbin/service named status
If named is not running, and you run your own nameservers from your VDS in order to handle DNS requests for your websites, no one will be able to reach your websites, and you must start named:
/sbin/service named start
The /sbin/service servicename status command can be used for most services in /etc/rc.d/init.d.
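Rather than typing the status command once per service, you can loop over the ones you care about. This is a rough sketch under two assumptions: that the startup script returns an exit status of 0 when the service is running (most do, but not every script follows that convention), and that you are running it as root. The function name check_services and the SERVICE_CMD variable are made up for this example.

```shell
# Command used to query a service; defaults to /sbin/service as used above.
SERVICE_CMD="${SERVICE_CMD:-/sbin/service}"

# Print whether each named service reports itself as running.
check_services() {
    for name in "$@"; do
        # Assumption: exit status 0 from "status" means the service is running.
        if "$SERVICE_CMD" "$name" status >/dev/null 2>&1; then
            echo "$name: running"
        else
            echo "$name: NOT running"
        fi
    done
}

# Example usage:
#   check_services httpd named exim mysql
```

If a service shows up as NOT running, check it by hand with the status command before starting it, since a script that does not follow the exit status convention can give a misleading answer here.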
Let's say you are experiencing a high load average on your VDS, and you notice many Exim processes in the ps aux output. You will need to take into consideration how many emails you are sending per minute, and to how many users overall. For each email that is sent, an Exim process is spawned (run). Each of these processes takes resources, such as CPU, memory, disk I/O (input/output), etc. You may need to look into throttling the mailing list to send fewer emails per minute.
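Counting processes by eye in a long "ps aux" listing gets tedious, so here is a small sketch that does the counting for you. The function name count_by_command is made up for this example, and it assumes the Linux "ps aux" layout where the command is the eleventh column. It only looks at the first word of the command, so arguments are ignored.

```shell
# Read "ps aux" output on standard input and print how many processes are
# running for each command, largest count first, as: COUNT COMMAND
count_by_command() {
    # NR > 1 skips the header line; $11 is the first word of COMMAND.
    awk 'NR > 1 { count[$11]++ } END { for (cmd in count) print count[cmd], cmd }' | sort -nr
}

# Example usage:
#   ps aux | count_by_command | head
```

If one command sits at the top of that list with a count in the hundreds while your load average is high, that is a good place to start looking.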
Let's say you are experiencing issues with failing services. The most common cause for this is that you are exceeding your allowable resource threshold (typically memory). If you have 10,000+ people on your mailing list, it is important to understand that the resources you use for running 10,000 processes as fast as possible will be taken away from your existing processes. Your VDS will happily run as many processes as it can, and use as much memory as it can, right up to the maximum allowable limit. Once you have spawned as many processes as are allowed for your VDS, you cannot simply expect to be able to spawn more. 1 process must die before another can run. If you have 1,000 processes trying to run after you've hit your limit, you can't expect to still be able to send tons of email, or serve numerous web requests all at once in a timely fashion.
Remember, CPU and memory are finite resources. The more you use, the less is left of the amount allotted to your VDS. This is not just the case with VDS servers. It is the case with a computer of any type, be it a dedicated server, a shared webhosting server, a Windows server, a file server, and so on. It is very easy to consume all your resources very quickly if you're not familiar with basic systems administration.
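To get a rough feel for how much physical memory your processes are using in total, you can add up the RSS column from "ps aux". Treat this as an estimate only: memory shared between processes is counted once per process, so the total tends to read high, and the way your VDS accounts for memory against its limit may differ. The function name rss_total_mb is made up for this example, and it assumes RSS is the sixth column and is reported in kilobytes.

```shell
# Read "ps aux" output on standard input and print the sum of the RSS
# column in megabytes, to one decimal place.
rss_total_mb() {
    # NR > 1 skips the header line; $6 is RSS in kilobytes.
    awk 'NR > 1 { kb += $6 } END { printf "%.1f\n", kb / 1024 }'
}

# Example usage:
#   ps aux | rss_total_mb
```

Watching how this number moves during a mailing list send, compared with a quiet period, gives you an idea of how close you are getting to your limit.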
Run the commands in this tutorial several times a day for a week, study the output, and you will be very familiar with how your VDS behaves, and how it reacts to different situations (e.g. during peak web activity, during mailing list sends, and so on).
This is a very basic guide to very basic Linux systems administration. The goal is to get you acquainted with the command line, and to learn a few very basic yet necessary commands to familiarize yourself with a Linux environment.
To recap, the commands covered here are:
1. w
The important thing to note here is your current load average. The load average is a measure of how busy your VDS is: roughly, the average number of processes that are running or waiting on CPU or disk activity. The first number on the left (of the three total numbers) is your 1 minute load average.
2. ps aux (or ps aux | less)
Examine and become familiar with the processes your VDS is running. Note how much CPU and memory they are using. Are there 155 MySQL processes running? Are there 200 Exim processes running? If so, and you are experiencing high loads, slow web page load times, failing services, or any other type of undesired behavior on your VDS, this could very well be the cause, and you will need to take corrective action.
3. man <command>
Typing "man" followed by a command name will display information about a specific command, if a man page is available for that command. man pages are very detailed, typically covering every possible option for a command with a detailed explanation for each option. As an exercise, type: man ps and learn what the a, u, and x options are for.
4. service <servicename> status
The service command can be used to check the status of a service located in /etc/rc.d/init.d, such as httpd, mysql, named, exim, cpanel, and so on. To start a service, type:
/sbin/service <servicename> start
As always, please feel free to post any questions in this thread. Please don't be afraid to ask; we will be more than happy to answer any questions you may have regarding Linux systems administration. Please understand that the commands covered in this thread will help you troubleshoot a great many of the issues we have heard about over a long period of time, which usually come down to one very simple thing: using an excessive amount of resources.