CIS 24: CGI and Perl Programming for the Web

Class 7 (10/23) Lecture Notes

Topics

  1. HTTP Transactions
  2. How to Decode Form Data
  3. Receiving, Decoding, and Using Form Data in a CGI Script
  4. Form Input Validation
  5. Using and Debugging CGI Applications with the Apache Web Server
  6. Installing and Configuring the Apache Web Server at Home
  7. Lab: Testing Apache & Work for Final Projects



  1. HTTP Transactions
    1. Having a basic understanding of the Hypertext Transfer Protocol (HTTP) is a key element in learning how to create CGI applications. HTTP is the language of communication between web servers and web browsers. When you visit a web page, you initiate an HTTP request, and the web server then responds by sending a copy of the file you requested (typically a web page for display in your browser). In using CGI applications to dynamically create web pages based on user inputs (e.g. a web page that shows the status of a person's product order), you need to know how the user's data is sent to your CGI script so that you can decode it and use it, and you need to know how to structure your dynamically generated page so that it can be sent via HTTP.

    2. The specifics of HTTP: client requests (you going to a web site) and web server responses (what you see at the web site) are HTTP transactions that both have the same general format. First is the request line or response line, second is the header section, and third is the body of the request or response. It's important to note that as a web surfer, the only portion of all this that you completely see is the body of the server's response, which is generally the web page you requested. The rest happens behind the scenes.

    3. Let's begin with the client request:

      1. First your browser locates the web server at the specified URL (specified by the hyperlink you clicked, the URL you typed into your browser window, or the form you submitted). After that it sends textual data to the web server which consists of the following:

      2. The request line: the first thing your browser sends is a single line stating what you want and how you want to communicate. A typical request line might look like this:
        GET /cis24/class7_notes.html HTTP/1.1

        The first part of the line is the request method. When you want to view an HTML document, your browser automatically performs a GET. The other method you'll use regularly is POST, which is often used for form submissions. We'll talk more about GET vs. POST as we go on.

        The second part of the line is the path of the resource you want. In the above example, it is a request for an HTML file in the cis24 directory. If the request was a form submission using the GET method, all of the form data would be appended to the URL as a query string that is URL encoded. For example, if I submitted a form to a script located at /cgi-bin/myscript.pl and I provided inputs to fields that asked for my first name, last name, and birthday, the request line would look like this:

        GET /cgi-bin/myscript.pl?firstname=Mike&lastname=Toppa&birthday=6%2F21%2F70 HTTP/1.1

        The query string is all of the data following the question mark. URL encoding is how the data is formatted: an equals sign separates the input field's name from the value I entered for it, and an ampersand separates the name/value pairs. Also, special characters that might be misinterpreted by the web server (such as a slash) are converted to a percent sign followed by their hexadecimal character code (for the slash, this is %2F).

        Although the POST method does not use a query string to send form data, it still URL encodes it. So, regardless of the method used for the form submission, your script will have to decode the form data before you can use it. More on this in a few minutes.

        The third part of the request line indicates that you are using the HTTP 1.1 standard to make the request.

      3. The header: the header starts on the line immediately following the request line. The header informs the web server of the browser's configuration and the document formats it will accept. Here are a couple of sample lines:
        User-Agent: Mozilla/4.05(WinNT; I)
        Accept: image/gif, image/jpeg

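        After your browser has sent all its header fields, it will send a blank line (i.e. two line breaks in a row) to indicate the end of the header. Strictly speaking, HTTP ends each line with a carriage return plus a linefeed rather than a bare linefeed.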

      4. The body: with a GET request, the body is empty. If the request consists of a form submission made with the POST method, the form data is sent in the body. As with GET, the data is URL encoded. However, instead of parsing the data from a query string, your Perl script will read the data from STDIN. When you ran your scripts in an MS-DOS window, STDIN was anything you typed in response to a prompt from your script. In a CGI environment, STDIN is the body of a client request.
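
        To make the three parts of a client request concrete, here is a minimal sketch that splits the raw text of a request into its request line, header fields, and body. The sub name parse_request and the sample request are made up for illustration. In practice the web server does this work for you, so treat this as a picture of the format rather than code you would need in a CGI script.

```perl
#!perl

# Hypothetical helper (not part of any library): split the raw text of an
# HTTP request into its three parts.
sub parse_request {
    my ($raw) = @_;

    # The header section ends at the first blank line; the rest is the body.
    my ($head, $body) = split(/\r?\n\r?\n/, $raw, 2);
    $body = "" unless defined $body;

    # The first line of the head is the request line; the rest are headers.
    my ($request_line, @header_lines) = split(/\r?\n/, $head);
    my ($method, $resource, $protocol) = split(/ /, $request_line);

    # For a GET form submission, the query string follows the question mark.
    my ($path, $query) = split(/\?/, $resource, 2);
    $query = "" unless defined $query;

    my %headers;
    foreach my $line (@header_lines) {
        my ($field, $content) = split(/:\s*/, $line, 2);
        $headers{$field} = $content;
    }

    return (
        method   => $method,
        path     => $path,
        query    => $query,
        protocol => $protocol,
        headers  => \%headers,
        body     => $body,
    );
}

# Made-up sample request, based on the example above.
my $sample = "GET /cgi-bin/myscript.pl?firstname=Mike&lastname=Toppa HTTP/1.1\n"
           . "User-Agent: Mozilla/4.05(WinNT; I)\n"
           . "Accept: image/gif, image/jpeg\n"
           . "\n";

my %request = parse_request($sample);
print "Method: $request{method}\n";
print "Path: $request{path}\n";
print "Query string: $request{query}\n";
print "Protocol: $request{protocol}\n";
```

        With a POST request the query string would be empty and the form data would show up in the body instead.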

    4. The server's response to the request has the same basic format:

      1. The response line: like the request line, it also contains three fields:
        HTTP/1.1 200 OK

        The first indicates that HTTP 1.1 is the protocol being used to communicate. The second is a code indicating the status of the request. "200" means that the request was successful and that the requested data will be sent. The third field is a brief description of the meaning of the status code. In this case, "OK".

      2. The header: the server provides information about itself and the requested document. For example:
        Date: Thu, 23 Mar 2000 08:20:33 GMT
        Server: NCSA/1.5.2
        Last-modified: Tue, 21 Mar 2000 12:15:22 GMT
        Content-type: text/html
        Content-length: 2482

        A blank line ends the header.

      3. The body: assuming the request is successful, the requested data is sent. The data may be a copy of an HTML file or a response from a CGI program.

      4. Previously I introduced you to environment variables, which are variables associated with the environment in which your application is running. Environment variables are available to your scripts in the special associative array %ENV. With the MS-DOS windows we've been using to run our scripts, %ENV contained information on how the MS-DOS window was configured, the computer's hardware, etc. When you run a Perl script as a CGI application, the web server sets up %ENV for your script. Many of its entries are taken from the request line and request headers we just discussed (for example, the request method and the query string), along with details about the server itself. We'll rely on some of these environment variables to help us process form submissions (more below).
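
        As a rough illustration, the sketch below lists a few of the standard CGI environment variables. The sub name describe_request is made up for this example, and which variables are actually set depends on the request and on your web server.

```perl
#!perl

# Hypothetical helper: report the values of a few CGI environment variables.
# It takes the variables as a list of name/value pairs so it can be tried
# with made-up data as well as with the real %ENV.
sub describe_request {
    my (%env) = @_;
    my @names = ("REQUEST_METHOD", "QUERY_STRING", "CONTENT_LENGTH",
                 "HTTP_USER_AGENT", "SERVER_SOFTWARE");
    my $report = "";
    foreach my $name (@names) {
        my $value = defined $env{$name} ? $env{$name} : "(not set)";
        $report .= "$name = $value\n";
    }
    return $report;
}

# Run from the command line, most of these will be "(not set)".
# Run as a CGI script (after printing a Content-type header),
# the web server fills them in.
print describe_request(%ENV);
```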

  2. How to Decode Form Data
    In the above example of a GET form submission, the value entered for "birthday" was 6/21/70. The / character can be problematic in a URL string. It's normally used to separate directory names, but here it does not have that purpose. To avoid the possibility of web servers and web browsers getting confused, form data is URL encoded. Characters such as / ? # & @ and others are converted to codes in hexadecimal format. As we saw above, the hexadecimal code for a / is 2F. The % character is used to indicate that a hexadecimal code is coming, so the complete representation of a / is %2F. For all the details on URL encoding, see this web page.

    One of the first jobs of a script that processes form data is to undo the URL encoding. Fortunately, Perl has some built-in functions that make it fairly easy for you to do this. The first one to look at is called hex. Hex accepts one argument, which is a number in hexadecimal notation. Hex takes this number and converts it to standard decimal format. For example, a number sign (#) has the character code 23 in hexadecimal notation. If you pass 23 to hex, it will convert it to 35, which is the same character code for # written in decimal.

    So now you're halfway done. To turn the number 35 to a #, you use the pack function. pack accepts two arguments. Using "C" as the first argument tells pack to convert the second argument (which is your decimal number) to its character equivalent. So, pack will take the number 35 and turn it into a #. Here's some example code:

    $hexNum = "23";    # quoted, since hexadecimal codes can contain letters (e.g. "2F")
    print "the hex value is: $hexNum\n";
    $decNum = hex($hexNum);
    print "the equivalent decimal value is: $decNum\n";
    $char = pack("C", $decNum);
    print "the equivalent character is: $char\n";
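
    Putting hex and pack together, here is a minimal sketch that decodes every hexadecimal code in a string in one pass. The sub name decode_hex_codes is made up for this example. It handles only the % codes described above, not the other details of URL encoding.

```perl
#!perl

# Hypothetical helper: replace each % followed by two hexadecimal digits
# with the character that has that code.
sub decode_hex_codes {
    my ($text) = @_;

    # $1 holds the two hexadecimal digits that were matched. The "e" option
    # runs the replacement as Perl code, and "g" repeats it for every match.
    $text =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    return $text;
}

print decode_hex_codes("6%2F21%2F70"), "\n";    # prints 6/21/70
print decode_hex_codes("%23"), "\n";            # prints #
```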

  3. Receiving, Decoding, and Using Form Data in a CGI Script
    1. A CGI application is typically used to process form submissions, so we'll start with a simple HTML form:
      <HTML>
      <HEAD>
      <TITLE>A simple form</TITLE>
      </HEAD>
      
      <BODY>
      
      <FORM ACTION="/cgi-bin/birthday.pl" METHOD="GET">
      <P>First Name:
      <BR><INPUT TYPE="Text" NAME="firstname" SIZE="20">
      
      <P>Last Name:
      <BR><INPUT TYPE="Text" NAME="lastname" SIZE="20">
      
      <P>Birthday:
      <BR><INPUT TYPE="Text" NAME="birthday" SIZE="10">
      
      <P>Email Address:
      <BR><INPUT TYPE="Text" NAME="email" SIZE="20">
      
      <P><INPUT TYPE="Submit" NAME="submit" VALUE="Send Information">
      </FORM>
      
      </BODY>
      </HTML>
      

    2. Below is a script that receives the form data that the user submits, translates the URL encoding, and prints the user's inputs back to the screen in a dynamically created web page:
      #!perl
      
      # The shebang line has to be the first line in your script!
      # More on this later in tonight's class.
      
      # If the data was sent via GET, we'll decode it from the query string.
      # If it was sent via POST, we'll read it from the body of the request,
      # which is considered STDIN by your web server.
      
      if ($ENV{'REQUEST_METHOD'} eq "GET") {
      	
      	# Split the name-value pairs
      	
      	@pairs = split(/&/, $ENV{'QUERY_STRING'});
      }
      
      elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
      
      	# The "read" function is used to read data into a scalar variable.
      	# The first argument is the file to read from.
      	# The second argument is the variable to assign the results to.
      	# The third argument is the number of bytes to read from the file,
      	# (the content_length tells us how many bytes were sent)
      	
      	read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
      	
      	# Now split the name-value pairs
      	@pairs = split(/&/, $buffer);
      }
      
      # @pairs contains all the name value pairs that were submitted, but they're
      # still joined as single pieces of information, and they're still URL
      # encoded. That is, the array looks something like this:
      
      # (firstname=Joe, lastname=Smith, birthday=1%2F1%2F85)
      
      # We need to get this data into a usable form by getting the key/value
      # pairs into an associative array, and we need to decode the
      # special characters.
      
      foreach $pair (@pairs) {
      	# For each element of the @pairs array, we'll split it into two
      	# variables: $name and $value
      	
      	($name, $value) = split(/=/, $pair);
      	
      	# The next four lines decode the data. First we convert +
      	# characters into spaces: a space is the one special character
      	# that is sent as a + rather than as a hexadecimal code. For all
      	# other special characters, we do the hexadecimal conversion as
      	# described above.
      	
      	$name =~ tr/+/ /;
      	$name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
      	$value =~ tr/+/ /;
      	$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
      	
      	# Now we'll add each name/value pair to the associative array %in.
      	
      	$in{$name} = $value;
      }
      
      # Now that we have our data in a useful form, we can dynamically create a
      # web page and send it back to the user. The page will display the information
      # he or she entered into the form.
      
      # First we have to create a line for the header that tells the user's browser
      # what kind of data we are sending, so that the browser will know what to do
      # with it. In this case, it's an HTML page. You'll notice that this is an
      # ordinary "print" function. For the web server, standard output (STDOUT) is
      # the header and body of an HTTP response. We indicate that we're done with
      # the header by printing a blank line (two linefeeds)
      
      print "Content-type: text/html\n";
      print "\n";
      
      # Now we can print our HTML document. You can take advantage of Perl's
      # variable interpolation to print variable values directly in your HTML.
      
      print qq^
      <HTML>
      <HEAD>
      <TITLE>Your name and birthday</TITLE>
      </HEAD>
      
      <BODY>
      <P><B>Your first name:</B> $in{'firstname'}
      
      <P><B>Your last name:</B> $in{'lastname'}
      
      <P><B>Your birthday:</B> $in{'birthday'}
      
      <P><B>Your email address:</B> $in{'email'}
      
      </BODY>
      </HTML>
      ^
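
      The decoding steps are the same for GET and POST, so they can be collected into one sub and tried out from the command line without a web server. This is only a sketch. The sub name parse_form_data is made up, and it keeps only the last value if a field name appears more than once.

```perl
#!perl

# Hypothetical helper: turn a URL encoded string of name/value pairs
# into an associative array.
sub parse_form_data {
    my ($raw) = @_;
    my %form;

    foreach my $pair (split(/&/, $raw)) {
        next if $pair eq "";                  # skip stray ampersands

        # The limit of 2 keeps an empty value ("email=") as an empty string.
        my ($name, $value) = split(/=/, $pair, 2);
        $value = "" unless defined $value;    # a pair with no equals sign

        $name  =~ tr/+/ /;
        $name  =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        $form{$name} = $value;
    }
    return %form;
}

my %in = parse_form_data("firstname=Joe&lastname=Smith&birthday=1%2F1%2F85");
foreach my $name (sort keys %in) {
    print "$name: $in{$name}\n";
}
```

      In a CGI script you would pass it $ENV{'QUERY_STRING'} for a GET request, or the data read from STDIN for a POST request.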
      

  4. Form Input Validation
    We can take advantage of the pattern matching techniques we've learned to add input validation to our form processing (i.e. we can make sure the user put valid information into each of the form fields). The script below does this, and also includes some other enhancements which are noted along the way.
    #!perl
    
    # Parse the incoming form data
    
    if ($ENV{'REQUEST_METHOD'} eq "GET") {
            @pairs = split(/&/, $ENV{'QUERY_STRING'});
    }
    
    elsif ($ENV{'REQUEST_METHOD'} eq "POST") {
            read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
            @pairs = split(/&/, $buffer);
    }
    
    # Add a check to make sure they didn't try to use some other
    # method.
    
    else {
            print "Content-type: text/html\n\n";
            print <<"end_tag";
            <HTML>
            <HEAD>
            <TITLE>Form Error</TITLE>
            </HEAD>
            <BODY>
            <P><B>The METHOD of your request for this script must be either GET or POST</B>
            </BODY>
            </HTML>
    end_tag
            exit;
    }
    
    foreach $pair (@pairs) {
            ($name, $value) = split(/=/, $pair);
            $name =~ tr/+/ /;
            $name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
            $value =~ tr/+/ /;
            $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    
            # We should handle the possibility of the form containing INPUT
            # elements with the same name. With the code below, we'll
            # create a comma-separated list of values for any form fields
            # that have the same name. We test with "defined" so that an
            # earlier value of 0 or an empty string still counts.
    
            if (defined $in{$name}) {
                    $in{$name} = $in{$name} . "," . $value;
            }
            
            else {
                    $in{$name} = $value;
            }
    }
    
    # Check to see if the inputs are valid
    
    push (@errors, "your first name") unless ($in{'firstname'} =~ /\w+/);
    push (@errors, "your last name") unless ($in{'lastname'} =~ /\w+/);
    push (@errors, "a valid date for your birthday") unless ($in{'birthday'} =~ m:^\d{1,2}/\d{1,2}/\d{1,2}$:);
    push (@errors, "a valid email address") unless ($in{'email'} =~ /^\S+\@\S+\.\S+$/);
    
    # If there's any invalid data, print a page with the error messages, and exit the script.
    
    if (@errors) {
            print "Content-type: text/html\n\n";
            print <<"end_tag";
            <HTML>
            <HEAD>
            <TITLE>Input Error</TITLE>
            </HEAD>
            <BODY>
    		
            <P>Please click your browser's <I>back</I> button and enter:
            <UL>
    end_tag
    
            foreach $error (@errors) {
                    print "<LI>$error\n";
            }
    
            print <<"end_tag";
            </UL>
            </BODY>
            </HTML>
    end_tag
            exit;
    }
    
    print qq^Content-type: text/html
    
    <HTML>
    <HEAD>
    <TITLE>Your name and birthday</TITLE>
    </HEAD>
    
    <BODY>
    <P><B>Your first name:</B> $in{'firstname'}
    
    <P><B>Your last name:</B> $in{'lastname'}
    
    <P><B>Your birthday:</B> $in{'birthday'}
    
    <P><B>Your email address:</B> $in{'email'}
    
    </BODY>
    </HTML>
    ^
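
    The validation checks can also be tried on their own, without a form or a web server. The sketch below wraps the same four pattern matches in a sub that returns the list of error messages. The sub name check_inputs and the sample inputs are made up for illustration.

```perl
#!perl

# Hypothetical helper: check the form fields and return a list of
# error messages (an empty list means every field passed).
sub check_inputs {
    my (%in) = @_;
    my @errors;

    # Treat a missing field the same as an empty one.
    foreach my $field ("firstname", "lastname", "birthday", "email") {
        $in{$field} = "" unless defined $in{$field};
    }

    push(@errors, "your first name") unless ($in{'firstname'} =~ /\w+/);
    push(@errors, "your last name") unless ($in{'lastname'} =~ /\w+/);
    push(@errors, "a valid date for your birthday")
        unless ($in{'birthday'} =~ m:^\d{1,2}/\d{1,2}/\d{1,2}$:);
    push(@errors, "a valid email address")
        unless ($in{'email'} =~ /^\S+\@\S+\.\S+$/);

    return @errors;
}

# Made-up inputs: the last name is blank and the birthday is not in
# the month/day/year format, so two messages are printed.
my @errors = check_inputs(
    firstname => "Mike",
    lastname  => "",
    birthday  => "June 21",
    email     => "mike\@example.com",
);
foreach my $error (@errors) {
    print "Please enter: $error\n";
}
```

    Note that these patterns only check the shape of the input. For example, 99/99/99 would pass the birthday check even though it is not a real date.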
    

  5. Using and Debugging CGI Applications with the Apache Web Server
    Apache is a free web server, and is the one in most common use across the web. It's popular because it's free and because it is well made. However, it was designed to work in a Unix environment, so the Windows version we'll be installing is not guaranteed to be as stable as the Unix versions. You can learn more about the Apache web server and the Apache project at http://www.apache.org

    Apache should already be configured properly for our use on the Lab computers. After the lecture we'll test them to make sure.

    1. The Basics

      An HTML form is typically the "front-end" of a CGI application. These forms, and any other HTML documents, should be saved in the "htdocs" directory under the "Apache" directory.

      Your Perl/CGI scripts need to be placed in a special directory named "cgi-bin". The Apache servers in the Lab are configured so that scripts will work only if they are stored in this location.

      Your HTML form must have the correct location of the Perl/CGI script indicated in the ACTION attribute of the FORM tag - see the "birthday" form above for an example.

    2. Debugging Your CGI Applications

      So far you've been able to debug your scripts simply by running them in an MS-DOS window. If there was an error in the script, you'd see a detailed error message printed to your screen when you tried to run the script. As our scripts become longer and more complex, you should get out of this habit. For example, you may have a script that opens a text file and makes changes to it. If the script contains an error that causes it to crash partway through executing, it could conceivably ruin your text file. Also, when you use Perl for CGI, you won't see detailed error messages in your browser when there's a problem: all you'll get is a page that says "Server Error." So, what are the options for debugging a CGI Perl script?

      1. A good way to run a quick check on your script is to run it from the command line (i.e. in an MS-DOS window) with the following syntax:
        perl -c your_script.pl

        The "-c" switch tells Perl to read through the script and look for syntax errors, but not to actually run the script. It will report back with the message "your_script.pl syntax OK" if it doesn't find any problems, or it will list the errors it finds.

      2. There's also the Perl debugger. This is an interactive command-line application that comes with Perl. It steps through the execution of your script, and as you go allows you to set "breakpoints", print out the values of variables, and many other things. We won't have time to get into using the debugger in this course, but it's explained in Chapter 12 of Teach Yourself Perl if you want to read up on it.

      3. When you run a Perl script as a CGI application, you just get a terse "Server Error" page if your script fails to execute. However, more detailed information is saved to Apache's error log file. You can access the error log in the "logs" directory under the "Apache" directory (other web servers also keep an error log file, unless the web server administrator has turned off error logging). All of the lines in the error log are time stamped, so before you start looking through the error log for the messages that are relevant to your script, make a note of the time when the error occurred.

      4. Other things to note when debugging a Perl CGI script:
        • When viewing the form that you want to submit to your CGI script, make sure you're viewing it through your web server. If the address in your browser's "Address" window does not start with "http://" then you've opened the file directly from disk rather than through the web server, which means the form can't reach your CGI script.
        • Apache relies on your script's "shebang" line to indicate the correct location of Perl on the computer. Your script won't run if your shebang line is incorrect.
        • Make sure your script is saved to a directory that allows execution of CGI scripts. On most servers this is the "cgi-bin" directory. If you submit a form and then your browser prompts you to "download application/x-perl" - that means your script is not executable, and your browser is by default trying to download it instead.
        • Make sure your form's FORM tag has the correct path to your script for its ACTION attribute.
        • If your web server is on a Unix computer, make sure the script has global read and execute permissions, so that the web server can run it.
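
        One more technique that works well with the error log: anything your script prints to STDERR is written to the web server's error log instead of being sent to the browser, so you can leave yourself debugging messages there. Below is a minimal sketch. The sub name debug_message and the script name birthday.pl in the message are just examples.

```perl
#!perl

# Hypothetical helper: build a time-stamped debugging message.
sub debug_message {
    my ($text) = @_;
    my $when = localtime();    # the current date and time as a string
    return "[$when] birthday.pl: $text\n";
}

# Under a web server, output sent to STDERR goes to the error log.
# From the command line, it simply appears on your screen.
my $method = defined $ENV{'REQUEST_METHOD'} ? $ENV{'REQUEST_METHOD'} : "(not set)";
print STDERR debug_message("script started");
print STDERR debug_message("request method is $method");
```

        Remove or comment out these lines once the script works, so that the error log doesn't fill up with old messages.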

  6. Installing and Configuring the Apache Web Server at Home
    1. You can download a version of Apache designed for Windows here - the current version to download is apache_1_3_14_win32.exe. You can also install Apache directly from the CD-ROM that came with your book. During the installation, you can go with all of the default settings (but you can feel free to change the installation directory if you wish).

    2. When you start your Apache server, it reads three configuration files which control how it operates. In the most recent versions of Apache, only one of these configuration files is used, and the other two are blank (although they are still read by Apache, if you want to use them for any reason). The one that's used is called httpd.conf and it's located in the "conf" directory under your "Apache" directory.

    3. The following modification to httpd.conf is necessary to start Apache if you're using it at home - do not make this change to the computers here in the lab! Without this update, Apache will fail to start up if your computer does not already have a network address (i.e. it will fail to find a hostname to bind to, so you need to assign one):

      • Using Notepad, open the httpd.conf file on your computer (note that you may have installed Apache to a different directory than what you see on the Lab computers).
      • Under Notepad's search menu, do a "find" for:
        #ServerName new.host.name
      • After Notepad finds that line for you, type in the following new line immediately after it:
        ServerName localhost
      • Start Apache
      • Open your browser and enter the address:
        http://localhost

    4. With the default configuration, CGI scripts will only work if you put them in the "cgi-bin" directory. You can change the directory that's used for this, or specify additional directories by altering the ScriptAlias directive in the httpd.conf file:
      ScriptAlias /cgi-bin/ "f:/Apache/cgi-bin/"

      You could add an additional line:

      ScriptAlias /scripts/ "f:/Apache/scripts/"

    5. If you want to allow scripts to be executed from any directory under the web server's document root, remove the comment character (#) from the start of the following line in httpd.conf:
      #AddHandler cgi-script .cgi

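      This will allow any scripts with the file extension .cgi to be executed. You can change this or specify additional file extensions if you wish. Note that the directories involved must also have the ExecCGI option enabled (through an Options directive in httpd.conf) before Apache will run scripts from them.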

    6. There's a lot more you can do with the Apache server configuration that is beyond the scope of this class (e.g. password-protecting directories and serving multiple domains from a single server). Check out http://httpd.apache.org to learn more.

    7. CGI is not the most efficient method for implementing interactive web sites, as a separate process on your computer must be started whenever a Perl script is executed at the request of the web server. ActivePerl (which is what we're using) also comes with a version of Perl called Perl for ISAPI (Internet Server Application Programming Interface). Perl for ISAPI uses Windows dynamic-link library function calls to communicate with the web server, which eliminates the need to start a new process to execute a Perl script. All that basically means is that your CGI scripts will be faster with Perl for ISAPI than with regular Perl. Unfortunately, Apache for Windows does not support Perl for ISAPI.

    8. If you need or want to use a web server other than Apache, and you want to do CGI with Perl, you can do so. The ActiveState web site has excellent documentation on configuring other Windows web servers for using Perl with CGI: http://www.activestate.com/ActivePerl/docs/Perl-Win32/perlwin32faq6.html.

  7. Lab: Testing Apache & Work for Final Projects
    1. Testing the Apache installation

      1. You should be able to find the "Apache Group" under "Programs" on your "Start" menu. Click on "Start Apache" - this will open an MS-DOS window that says Apache is running. You can minimize this window - but do not close it unless you want to shut down the web server!

      2. Now let's test to see if the server is running properly. Open a new browser window and go to the address http://127.0.0.1 (this is a special "loopback" address that you can use on any computer to refer to its own web server). You should see the default home page that comes with the Apache server.

      3. Next, we'll test to see if Apache's CGI functionality is working properly. First, we'll create a simple web page that will contain a link to a CGI script (we'll create the script in a minute). Open EasyHTML and create a file that contains the following line in the BODY:
        <A HREF="/cgi-bin/test.pl">CGI test</A>

        Save the file with the name "test.html" in the "htdocs" directory under the "Apache" directory. To see it, type this address into your browser: http://127.0.0.1/test.html

      4. Now start writing another new file. This will be your first CGI script! First we'll actually make a script that doesn't work, in order to illustrate a couple of important concepts. Type the following:
        print "howdy!\n";

        Save the file with the name "test.pl" in the "cgi-bin" directory under the "Apache" directory.

      5. Now go back to your web browser and click the link on your test.html page. You should get a "server error" message. Let's open the error log file (as described earlier) and see what the message says.

        There are two problems with this script that are preventing it from executing. One is that we need to tell Apache where to find Perl on the computer's hard drive. This is because Apache can't execute your script by itself - it needs to send your script to the Perl interpreter - perl.exe - in order for the script to run. Add the "shebang" line as the first line in your script:

        #!perl

        Save the script, and return to your test.html page and click the link again. You should get another server error. Let's open the error.log file again and see what it says.

        It should say something like "malformed header from script. Bad header." This is happening because anything that your server is going to send to a user must first identify what kind of file it is - this is done with the "header", as described earlier. The server sends this header to your web browser so that the browser knows whether to display the file (as it does with HTML files) or to start a download of the file (as it does with, for example, Zip files).

        To create the header, add the following line after the shebang line:

        print "content-type: text/html\n\n";
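
        With both fixes in place, the finished test script should look roughly like the sketch below. The variables $header and $body are only there to label the two parts of the output; printing the strings directly, as in the steps above, works just as well.

```perl
#!perl

# The header: it must be the first thing the script prints, and it ends
# with a blank line (that's why there are two \n characters).
my $header = "Content-type: text/html\n\n";

# The body: whatever we want the browser to display.
my $body = "howdy!\n";

print $header, $body;
```

        Save the script and click the link on your test.html page again. This time the browser should display the word "howdy!" instead of a server error.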

    2. Further Testing

      Copy-and-paste the "birthday" form from the lecture notes into a file and save it to the htdocs directory. Then do the same with the CGI script that receives the form submission (but save the script to the cgi-bin directory). Test it to see if it works, and try making some changes to the script (whatever you like) so you can get comfortable working in the CGI environment.

    3. Final Project Work

      Last week you worked on the HTML forms for your Project.

      If you're doing the Calendar, tonight you can work on the script that receives the form submissions and performs input validation on the data from the Event Entry page (see Section III of the Project description). For now, you can display the form inputs to a web page, since we haven't yet discussed saving data to files.

      If you're doing the Camera Shopper, tonight you can work on the script that receives the form submission from the Search page and generates the results page (see the Project description). We haven't yet discussed working with files, so for now your script can simply display the search results in a web page.
