September 2, 2010

Perl / CGI Snippets & Regular Expressions

Almost everything I write is in Perl, and sometimes I spend way too long looking for a regex or code snippet to do something, or worse yet, for a piece of code that I’ve written but can’t remember which file it’s in.

I decided to put together this list of Perl / CGI regular expressions and code snippets. These are things that I’ve used, and this list will be as much a resource for me as it will be for anybody else who chooses to use it.

If you want to submit your own code, or you know a source for some of the more complex regexes, please let me know. I’ll cite any sources I know of, and I’m totally open to suggestions if you think that anything I share is insecure, inefficient, etc.

This list is a work in progress.

Last updated: 11.7.2009

How to split a name

$name = "Roy James Jones";
@names = split(/ /, $name);
$first_name = $names[0]; # result = Roy
$middle_name = $names[1]; # result = James
$last_name = $names[2]; # result = Jones
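
The snippet above assumes exactly three parts separated by single spaces. Here is a minimal sketch for names with extra spaces or no middle name. The helper name split_name is hypothetical, and treating everything between the first and last word as the middle name is an assumption that will not suit every name.

```perl
use strict;
use warnings;

# split_name is a hypothetical helper, not part of the original snippet.
sub split_name {
    my ($name) = @_;
    # split ' ' (a single-space string) skips leading whitespace and
    # splits on any run of whitespace.
    my @parts  = split ' ', $name;
    my $first  = @parts ? shift @parts : '';
    my $last   = @parts ? pop @parts   : '';
    my $middle = join ' ', @parts;   # empty string when there is no middle name
    return ($first, $middle, $last);
}

my ($first, $middle, $last) = split_name("  Roy   James Jones ");
print "$first|$middle|$last\n";   # prints Roy|James|Jones
```

With only two words, $middle comes back as an empty string instead of the last name landing in the wrong variable.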


How to match a name (regex)

if ($name =~ m/^[a-zA-Z\s\'\-]+$/) {
# this is good
}
else {
# this is bad
}

— OR —

if ($name !~ m/^[a-zA-Z\s\'\-]+$/) {
# this is bad
}

This will match letters, spaces, apostrophes, and hyphens — fairly common elements in a name.
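
Two things to keep in mind with the pattern above: \s also matches tabs and newlines, and a-zA-Z rejects accented letters. A possible variant, as a sketch only, uses \p{L} for any Unicode letter and a literal space in place of \s. The helper name is_valid_name is hypothetical, and it assumes the string has already been decoded into Perl characters.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # so non-ASCII names print cleanly

# is_valid_name is a hypothetical helper. \p{L} matches any Unicode letter;
# \A and \z anchor the whole string (unlike $, \z does not allow a trailing newline).
sub is_valid_name {
    my ($name) = @_;
    return $name =~ /\A[\p{L} '\-]+\z/ ? 1 : 0;
}

for my $name ("O'Brien-Smith", "\x{141}ukasz", "Roy3", "Roy\n") {
    (my $shown = $name) =~ s/\n/\\n/;   # make the newline visible in the output
    printf "%-15s %s\n", $shown, is_valid_name($name) ? "ok" : "rejected";
}
```

The first two names pass; "Roy3" and the name with a trailing newline are rejected.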

How to round a number to 2 decimal places

$unrounded_number = "34.1256783653";
$rounded_number = sprintf("%.2f", $unrounded_number);
# $rounded_number is now "34.13"

If you need to round a number to 1 decimal place, 3 decimal places, etc., you can just change the “%.2f” after the sprintf to “%.1f” or “%.3f”, etc.
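
If the number of decimal places varies, sprintf can take the precision as an argument via "%.*f". A small sketch; round_to is a hypothetical helper name.

```perl
use strict;
use warnings;

# round_to is a hypothetical helper: the * in "%.*f" is filled in
# from the argument list, so the precision can be a variable.
sub round_to {
    my ($number, $places) = @_;
    return sprintf("%.*f", $places, $number);
}

print round_to("34.1256783653", 1), "\n";   # prints 34.1
print round_to("34.1256783653", 3), "\n";   # prints 34.126
```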

How to format Perl time with months, days, hours, etc.

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
@MaxDays = ('31','28','31','30','31','30','31','31','30','31','30','31');
@Months = ('January','February','March','April','May','June','July','August','September','October','November','December');
@Wdays = ('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday');
$weekday = "$Wdays[$wday]";
$month = "$Months[$mon]";
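$csec = $sec;
# $cmon, $cyear and $cday appear to hold next month (1-12), its year, and tomorrow's day of the month
$cmon = $mon + 2; # $mon is 0-11, so +1 makes it 1-12 and +1 more moves to next month
if ($cmon > 12) { $cmon = $cmon - 12; $cyear = $year + 1; } # December wraps to January of next year
else { $cyear = "$year"; } # note: $year is still "years since 1900" at this point
if ($mday == $MaxDays[$mon]) { $cday = 1; } # @MaxDays does not account for leap years
else { $cday = $mday + 1; }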
$mon++;
$year = $year + 1900;
if ($hour < 12) { $ampm = "am"; }
else { $hour = $hour - 12; $ampm = "pm"; }
if ($min < 10) { $min = "0$min"; }
if ($mday < 10) { $mday = "0$mday"; }
if ($mon < 10) { $mon = "0$mon"; }
if ($hour eq "0") {$hour = 12;}
$tstamp = "$month $mday, $year $hour:$min$ampm";
$datestamp = "$month $mday, $year";
$now = time();
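
If you only need the formatted strings, strftime from the core POSIX module can build them from the same localtime list. This is a sketch, and the format strings only approximate $tstamp and $datestamp above: %I zero-pads the hour, %p is usually upper case, and month names follow the current locale.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @now = localtime(time);

my $datestamp = strftime("%B %d, %Y", @now);           # month name, day, 4-digit year
my $tstamp    = strftime("%B %d, %Y %I:%M%p", @now);   # same, plus 12-hour time
print "$tstamp\n";

# strftime also accepts the fields directly, in localtime order:
# sec, min, hour, mday, mon (0-11), year (since 1900)
print strftime("%Y-%m-%d %H:%M", 0, 5, 15, 2, 8, 110), "\n";   # prints 2010-09-02 15:05
```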


How to escape an apostrophe (to stop MySQL errors)

$mystring =~ s|\'|\\'|g;
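
Escaping by hand only covers the apostrophe. If the query goes through DBI, placeholders let the driver take care of quoting instead. A minimal sketch: insert_comment, the comments table, and its columns are made up for illustration, and $dbh is assumed to be a handle you already got from DBI->connect.

```perl
use strict;
use warnings;

# insert_comment is a hypothetical helper; the table and column names
# are made up. $dbh is an already-connected DBI database handle.
sub insert_comment {
    my ($dbh, $author, $body) = @_;
    # Each ? is a placeholder. The values are handed to execute() and the
    # driver quotes them, so apostrophes need no manual escaping.
    my $sth = $dbh->prepare("INSERT INTO comments (author, body) VALUES (?, ?)");
    $sth->execute($author, $body);
    return;
}

# Usage, assuming $dbh came from DBI->connect:
# insert_comment($dbh, "O'Brien", "It's working");
```

For the rare case where a value has to be built into the SQL string itself, DBI also provides $dbh->quote($value).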


Debug MySQL Query - create a log of actions to identify problems


DBI->trace(2,"somefilename.txt");

You can change the "2" to a higher number and get more details (such as 4, for instance). Just place the DBI->trace before your query that you want to check out, and be sure to comment it out or delete it after you've found your problem... the log file can get very, very large.

Regex to validate email address format

if ($email =~ /^[A-Z0-9][_\-\.A-Z0-9]*\@\[?[\-\.A-Z0-9]+\.([A-Z]{2,4}|[0-9]{1,3})\]?$/i) {
#this is good
}
else {
#this is bad
}

--- OR ---

if ($email !~ /^[A-Z0-9][_\-\.A-Z0-9]*\@\[?[\-\.A-Z0-9]+\.([A-Z]{2,4}|[0-9]{1,3})\]?$/i) {
# this is bad
}

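I did not write this. I found it on some site years (and years) ago, and to my knowledge it has not rejected a validly formatted email address for me. I can't explain everything that it's doing, but it works. Be aware that it only accepts top-level domains of two to four letters and does not allow a plus sign before the @, so addresses such as fred@example.museum or fred+news@example.com are rejected.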
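
For anyone curious about what the pattern is doing, here is the same regex spread out with the /x flag and commented piece by piece. The pattern itself is unchanged; the variable name $email_re and the sample addresses are assumptions for illustration.

```perl
use strict;
use warnings;

my $email_re = qr/
    ^
    [A-Z0-9]            # first character: a letter or digit
    [_\-\.A-Z0-9]*      # rest of the local part: letters, digits, underscore, hyphen, dot
    \@
    \[?                 # optional opening bracket, for an IP address literal
    [\-\.A-Z0-9]+       # host name labels, or the leading octets of an IP address
    \.
    (
        [A-Z]{2,4}      # a top-level domain of two to four letters
      |
        [0-9]{1,3}      # or the last octet of an IP address
    )
    \]?                 # optional closing bracket
    $
/xi;

for my $addr ('fred@example.com', 'fred+news@example.com', 'fred@example.museum') {
    printf "%-25s %s\n", $addr, $addr =~ $email_re ? "accepted" : "rejected";
}
```

Only the first address is accepted; the other two fail on the plus sign and on the six-letter top-level domain.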

Remove / format Microsoft Smart Quotes (Word to plain text conversion)

$article = "your Word/ other MS-based article / content ";

# 0x93 (147) and 0x94 (148) are "smart" quotes
$article =~ s/[\x93\x94]/"/g;
# 0x91 (145) and 0x92 (146) are "smart" singlequotes
$article =~ s/[\x91\x92]/'/g;
# 0x96 (150) is an en dash and 0x97 (151) is an em dash
$article =~ s/[\x96\x97]/--/g;
# 0x85 (133) is an ellipsis
$article =~ s/\x85/.../g;
# 0x95 (149) is a bullet (•), replaced with * for unordered lists
$article =~ s/\x95/*/g;

I found these regular expressions after hours of digging. I basically copied and pasted this info from this Smart Quote Repair Script page. All credit goes to the person who wrote these.

If you need to find more of the Hex 0x codes - try the CP1252 (Windows ANSI) / ISO-8859-1 / UTF-8 Conversion Chart.
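
If the list of codes keeps growing, one option is to keep them in a hash so that adding a character is a one-line change. A sketch only: fix_smart_quotes and %cp1252_to_ascii are hypothetical names, and it assumes $article still holds raw CP1252 bytes, not decoded UTF-8 text.

```perl
use strict;
use warnings;

# CP1252 byte => plain-text replacement. Add more rows as you find them.
my %cp1252_to_ascii = (
    "\x91" => "'",   "\x92" => "'",    # single quotes
    "\x93" => '"',   "\x94" => '"',    # double quotes
    "\x96" => "--",  "\x97" => "--",   # en dash, em dash
    "\x85" => "...",                   # ellipsis
    "\x95" => "*",                     # bullet
);

sub fix_smart_quotes {
    my ($text) = @_;
    # Look at every byte in the 0x80-0x9F range and swap it only if
    # the table has an entry for it; anything else is left alone.
    $text =~ s/([\x80-\x9F])/exists $cp1252_to_ascii{$1} ? $cp1252_to_ascii{$1} : $1/ge;
    return $text;
}

my $article = "\x93It\x92s done\x94 \x97 almost\x85";
print fix_smart_quotes($article), "\n";   # prints "It's done" -- almost...
```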

Also, if you find that there are still Microsoft characters that cause issues, please let me know (leave a comment below, that's fine) and I'll update this section accordingly.


Connect to a MySQL database

$dbname = "mysql_databasename";
$host = "localhost"; # usually localhost
$dbuser = "mysql_databaseUsername";
$dbpass = "mysql_databaseUserpassword";

sub opendb {
  use DBI;
  $dbh = DBI->connect("DBI:mysql:$dbname:$host", $dbuser, $dbpass)
    or die "Could not connect: $DBI::errstr";
  return;
}

sub closedb {
  $dbh->disconnect();
  return;
}

Just store the subroutines and database connection info in a safe place (outside the publicly accessible web directory), and call &opendb; and &closedb; whenever you need to access the database.
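
A variation on the same idea, as a sketch: return the handle instead of using a global, and turn on RaiseError so every later DBI call dies with the driver's message on failure. build_dsn and open_db are hypothetical helper names; the DSN uses the database=...;host=... form that DBD::mysql documents.

```perl
use strict;
use warnings;

# build_dsn and open_db are hypothetical helpers for illustration.
sub build_dsn {
    my ($dbname, $host) = @_;
    return "DBI:mysql:database=$dbname;host=$host";
}

sub open_db {
    my ($dbname, $host, $user, $pass) = @_;
    require DBI;   # loaded the first time open_db is called
    # RaiseError makes DBI die on any failure, so no "or die" is needed
    # after each call; PrintError is off to avoid duplicate warnings.
    return DBI->connect(build_dsn($dbname, $host), $user, $pass,
                        { RaiseError => 1, PrintError => 0 });
}

# Usage:
# my $dbh = open_db("mysql_databasename", "localhost", $dbuser, $dbpass);
# ... run queries ...
# $dbh->disconnect;
```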


Parsing form data - foreach array with checkboxes

-- CGI Script --

#!/usr/bin/perl 

use CGI;
$in = CGI->new;
@buddies = $in->param('buddy');

print "content-type: text/html\n\n";

foreach $friend(@buddies) {
# do something with $friend
} 

-- HTML--

<form method="post" action="somefile.cgi">
<input type="checkbox" name="buddy" value="Fred">Fred
<input type="checkbox" name="buddy" value="Sue">Sue
<input type="checkbox" name="buddy" value="Ralph">Ralph
<input type="checkbox" name="buddy" value="Jim">Jim
<input type="checkbox" name="buddy" value="Karen">Karen
<br>
<input type="submit" value="Who are my friends?">
</form>


Read / work with delimited file contents in Perl

## file's structure - for example reference - tab delimited
# name, email address, phone number, address1, address2, city, state, zip

open (DATAFILE, "<", "somefile.txt") or die "Problem: $!";
while (<DATAFILE>) {
chomp;
@fields = split(/\t/, $_);  # splits tab separated fields; replace \t with , or \^ or \|, whatever you need

### sometimes it's better to make the field names easier to understand

$name = $fields[0];
$email_address = $fields[1];
$phone_number = $fields[2];
$address1 = $fields[3];
$address2 = $fields[4];
$city = $fields[5];
$state = $fields[6];
$zip  = $fields[7];

## if there are quotes around the field data, remove them

$name =~ s|\"||g;
$email_address =~ s|\"||g;

## just follow the same pattern to remove quotes from the rest

## work with your data -- add it to another text file, add it to a database, etc.

print "$name $email_address $phone_number<br />\n";

}

close (DATAFILE);

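This will work with tab separated files, pipe separated files, pretty much any simple delimiter. It will also handle basic CSV (comma separated) files, but a plain split breaks when a quoted field contains the delimiter itself; for those, a CSV module such as Text::CSV from CPAN is the safer choice. I've included a couple of regexes to remove the "quotes" around the outputs. I'm not sure if the quotes show up in all files, but they did when I was parsing my PayPal sales data (tab separated).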
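
When there are many columns, a hash slice saves writing one assignment per field. A sketch under a few assumptions: parse_line is a hypothetical helper, the column names are mine, and the sample line is read from an in-memory string so the example runs without a file on disk.

```perl
use strict;
use warnings;

my @columns = qw(name email_address phone_number address1 address2 city state zip);

# parse_line is a hypothetical helper: one tab-delimited line in, one hash ref out.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    s/"//g for @fields;                   # strip quotes from every field
    my %row;
    @row{@columns} = @fields;             # hash slice: column name => value
    return \%row;
}

# In-memory sample data; swap \$sample for a file name to read a real file.
my $sample = qq{"Roy Jones"\t"roy\@example.com"\t555-0100\t1 Main St\t\tSpringfield\tIL\t62701\n};
open(my $fh, '<', \$sample) or die "Problem: $!";
while (my $line = <$fh>) {
    my $row = parse_line($line);
    print "$row->{name} $row->{email_address} $row->{phone_number}\n";
}
close($fh);
```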


Match a string within an array


@array1 = ("Bnn","Cnn","Dnn");
$string = "Cnn";

if (grep {$_ eq $string} @array1) {
print "Found $string\n";
}
else {
print "Nope, no $string\n";
}
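
If you need to check many strings against the same array, grep rescans the whole array on every check. One common alternative, sketched here, is to build a hash once and look strings up in it. The name %in_array and the second test string are assumptions for illustration.

```perl
use strict;
use warnings;

my @array1 = ("Bnn", "Cnn", "Dnn");

# Build the lookup once: every array element becomes a hash key.
my %in_array = map { $_ => 1 } @array1;

for my $string ("Cnn", "Enn") {
    if ($in_array{$string}) {
        print "Found $string\n";
    }
    else {
        print "Nope, no $string\n";
    }
}
```

This prints "Found Cnn" and then "Nope, no Enn".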

I found this info in a post on Codecall.net by KevinADC. All credit belongs there; I'm just hoping to make it easier to find.

Remove trailing and leading spaces in a string

# remove leading spaces
$variable =~ s/^\s+//; 

#remove trailing spaces
$variable =~ s/\s+$//;

This removes excess whitespace at the beginning and end of a string. Very, very handy for user-generated database input variables.
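
The two substitutions can also be wrapped in a small subroutine so it is easy to apply to every form field. A sketch; trim is a hypothetical helper name.

```perl
use strict;
use warnings;

# trim is a hypothetical helper: returns a copy of the string with
# leading and trailing whitespace removed; the original is untouched.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

print "[", trim("  hello world \n"), "]\n";   # prints [hello world]
```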
