Perl / CGI Snippets & Regular Expressions
Almost everything I write is in Perl, and sometimes I spend way too long looking for a regex or code snippet to do something, or, worse yet, for a piece of code that I’ve written when I can’t remember which file it’s in.
I decided to put together this list of Perl / CGI regular expressions and code snippets. These are things that I’ve used, and this list will be as much a resource for me as it will be for anybody else who chooses to use it.
If you want to submit your own code, or you know a source for some of the more complex regexes, please let me know. I’ll cite any sources I know of and I’m totally open to suggestions if you think that anything I share is insecure, inefficient, etc.
This list is a work in progress.
Last updated: 11.7.2009
- How to split a person’s name
- How to match a name (regex)
- How to round a number to 2 decimal places
- How to format date and time (months, names, days, hours, etc.)
- How to escape an apostrophe in a string (avoid MySQL errors)
- Debug MySQL Query – check for errors
- How to validate an email address format (regex)
- How to remove / convert Microsoft “Smart Quotes” from / in text
- Connect to a MySQL database – subroutines
- Example form data script with checkbox inputs using a foreach array
- Parse delimited file – tab, comma, caret, etc.
- Match a string in an array
- Remove leading and trailing spaces in a string
How to split a person’s name
$name = "Roy James Jones";
@names = split(/ /, $name);
$first_name = $names[0];  # result = Roy
$middle_name = $names[1]; # result = James
$last_name = $names[2];   # result = Jones
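The split above assumes exactly three parts separated by single spaces. As a rough sketch of one way to cope with two-part names or extra whitespace, the hypothetical helper below (split_name is my own name, not part of the original snippet) treats the first word as the first name, the last word as the last name, and anything in between as the middle name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a full name into first, middle, and last parts.
sub split_name {
    my ($name) = @_;
    # split ' ' is the awk-style split: it skips leading and repeated whitespace
    my @parts  = split ' ', $name;
    my $first  = @parts ? shift @parts : '';
    my $last   = @parts ? pop @parts   : '';
    my $middle = join ' ', @parts;   # whatever is left between first and last
    return ($first, $middle, $last);
}

my ($first_name, $middle_name, $last_name) = split_name("Roy James Jones");
print "$first_name | $middle_name | $last_name\n";
```

This is still a guess at structure: multi-word last names such as "Van Dyke" end up partly in the middle name.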
How to match a name (regex)
if ($name =~ m/^[a-zA-Z\s\'\-]+$/) {
    # this is good
}
else {
    # this is bad
}
-- OR --
if ($name !~ m/^[a-zA-Z\s\'\-]+$/) {
    # this is bad
}
This will match unaccented letters (a-z, A-Z), whitespace, apostrophes, and hyphens, which are fairly common elements in a name. It will reject names containing accented or non-Latin letters.
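If names with accented letters need to pass, one possible variation is to swap a-zA-Z for the Unicode property \p{L}, which matches any letter. This is only a sketch: the names $NAME_RE and looks_like_name are made up for illustration, and it assumes the input has already been decoded into Perl characters rather than raw bytes.

```perl
use strict;
use warnings;

# \p{L} matches any Unicode letter; the rest of the character class is unchanged.
my $NAME_RE = qr/^[\p{L}\s'\-]+$/;

# Hypothetical helper: returns 1 if the string looks like a name, 0 otherwise.
sub looks_like_name {
    my ($name) = @_;
    return $name =~ $NAME_RE ? 1 : 0;
}

print looks_like_name("Mary-Ann O'Brien") ? "this is good\n" : "this is bad\n";
```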
How to round a number to 2 decimal places
$unrounded_number = "34.1256783653";
$rounded_number = sprintf("%.2f", $unrounded_number);
# $rounded_number is now "34.13"
If you need to round a number to 1 decimal place, 3 decimal places, etc., you can just change the “%.2f” after the sprintf to “%.1f” or “%.3f”, etc.
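To avoid editing the format string by hand each time, the precision can be passed in as a variable. A minimal sketch, assuming a hypothetical helper named round_to:

```perl
use strict;
use warnings;

# Hypothetical helper: round $number to $places decimal places with sprintf.
sub round_to {
    my ($number, $places) = @_;
    return sprintf("%.${places}f", $number);
}

my $unrounded_number = "34.1256783653";
print round_to($unrounded_number, 1), "\n";   # 34.1
print round_to($unrounded_number, 2), "\n";   # 34.13
print round_to($unrounded_number, 3), "\n";   # 34.126
```

Keep in mind that sprintf rounds the binary floating-point value, so numbers that sit exactly on a half (such as 1.005) may not round the way the decimal digits suggest.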
How to format Perl time with months, days, hours, etc.
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
@MaxDays = ('31','28','31','30','31','30','31','31','30','31','30','31');
@Months = ('January','February','March','April','May','June','July','August','September','October','November','December');
@Wdays = ('Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday');
$weekday = "$Wdays[$wday]";
$month = "$Months[$mon]";
$csec = $sec;
# This looks intended as next month numbered 1-12, rolling into the next year after December
$cmon = $mon + 2;
if ($cmon > 12) { $cmon = $cmon - 12; $cyear = $year + 1; }
else { $cyear = "$year"; }
if ($mday == $MaxDays[$mon]) { $cday = 1; }
else { $cday = $mday + 1; }
$mon++;
$year = $year + 1900;
if ($hour < 12) { $ampm = "am"; }
else { $hour = $hour - 12; $ampm = "pm"; }
if ($min < 10) { $min = "0$min"; }
if ($mday < 10) { $mday = "0$mday"; }
if ($mon < 10) { $mon = "0$mon"; }
if ($hour eq "0") {$hour = 12;}
$tstamp = "$month $mday, $year $hour:$min$ampm";
$datestamp = "$month $mday, $year";
$now = time();
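For comparison, a similar stamp can be built with strftime from the core POSIX module instead of the lookup arrays. This is a sketch rather than a drop-in replacement: format_stamp is a made-up name, the hour comes out zero-padded, and AM/PM is uppercase, so the result differs slightly from $tstamp above. The locale is pinned to "C" so the month names stay in English.

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# Pin the time locale so %B and %p produce English names
setlocale(LC_TIME, "C");

# Hypothetical helper: takes the same list that localtime() returns
sub format_stamp {
    my @t = @_;
    return strftime("%B %d, %Y %I:%M%p", @t);
}

print format_stamp(localtime(time)), "\n";
```

The format letters are %B (full month name), %d (day), %Y (four-digit year), %I (12-hour clock), %M (minutes), and %p (AM or PM).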
How to escape an apostrophe (to stop MySQL errors)
$mystring =~ s|\'|\\'|g;
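Escaping apostrophes by hand covers only one of the characters that can break a query. A commonly recommended alternative is to let DBI do the quoting with a placeholder (?). The sketch below assumes $dbh is an already connected DBI handle; the sub name insert_customer, the table customers, and the column name are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical example: insert a name using a placeholder instead of manual escaping.
# $dbh is assumed to be a connected DBI database handle.
sub insert_customer {
    my ($dbh, $name) = @_;
    # The ? is filled in by the driver, so apostrophes in $name need no escaping
    my $sth = $dbh->prepare('INSERT INTO customers (name) VALUES (?)');
    $sth->execute($name);
    return $sth;
}
```

With this in place, a call such as insert_customer($dbh, "O'Brien") passes the value through untouched and the driver handles the quoting.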
Debug MySQL Query - create a log of actions to identify problems
DBI->trace(2,"somefilename.txt");
You can change the "2" to a higher number and get more details (such as 4, for instance). Just place the DBI->trace before your query that you want to check out, and be sure to comment it out or delete it after you've found your problem... the log file can get very, very large.
Regex to validate email address format
if ($email =~ /^[A-Z0-9][_\-\.A-Z0-9]*\@\[?[\-\.A-Z0-9]+\.([A-Z]{2,4}|[0-9]{1,3})\]?$/i) {
    # this is good
}
else {
    # this is bad
}
--- OR ---
if ($email !~ /^[A-Z0-9][_\-\.A-Z0-9]*\@\[?[\-\.A-Z0-9]+\.([A-Z]{2,4}|[0-9]{1,3})\]?$/i) {
    # this is bad
}
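I did not write this -- I found it on some site years (and years) ago, and to my knowledge it has not rejected a valid email address (format-wise) in my own use. I can't explain everything that it's doing... but it works. It is a format check only, and it is stricter than the real rules: it rejects a plus sign before the @ (roy+perl@example.com) and top-level domains longer than four letters (.museum), both of which are valid.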
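To see what the pattern actually accepts, it helps to run it against a handful of sample addresses. The sketch below wraps the same regex in a hypothetical helper (is_valid_email_format is my own name) and prints the verdict for each made-up address.

```perl
use strict;
use warnings;

# The same pattern as above, compiled once
my $EMAIL_RE = qr/^[A-Z0-9][_\-\.A-Z0-9]*\@\[?[\-\.A-Z0-9]+\.([A-Z]{2,4}|[0-9]{1,3})\]?$/i;

# Hypothetical helper: returns 1 if the address passes the format check, 0 otherwise
sub is_valid_email_format {
    my ($email) = @_;
    return $email =~ $EMAIL_RE ? 1 : 0;
}

# Sample addresses, all invented for this example
for my $email ('roy@example.com', 'roy.jones@mail.example.co.uk',
               'roy+perl@example.com', 'roy@example.museum', 'roy@localhost') {
    printf "%-30s %s\n", $email, is_valid_email_format($email) ? "good" : "bad";
}
```

The first two addresses pass; the last three fail because of the plus sign, the six-letter top-level domain, and the missing dot in the domain.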
Remove / format Microsoft Smart Quotes (Word to plain text conversion)
$article = "your Word/ other MS-based article / content ";
# 0x93 (147) and 0x94 (148) are "smart" quotes
$article =~ s/[\x93\x94]/"/g;
# 0x91 (145) and 0x92 (146) are "smart" singlequotes
$article =~ s/[\x91\x92]/'/g;
# 0x96 (150) is an en dash and 0x97 (151) is an em dash
$article =~ s/[\x96\x97]/--/g;
# 0x85 (133) is an ellipsis
$article =~ s/\x85/.../g;
# 0x95 (149) is a bullet; replace it with * for unordered lists
$article =~ s/\x95/*/g;
I found these regular expressions after hours of digging. I basically copied and pasted this info from this Smart Quote Repair Script page. All credit goes to the person who wrote these.
If you need to find more of the Hex 0x codes - try the CP1252 (Windows ANSI) / ISO-8859-1 / UTF-8 Conversion Chart.
Also, if you find that there are still Microsoft characters that cause issues, please let me know (leave a comment below, that's fine) and I'll update this section accordingly.
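The byte values above apply to text that is still in the Windows CP1252 encoding. If the text has already been decoded into Unicode characters (for example, read from a UTF-8 source), the same characters show up under different code points. A sketch of the equivalent replacements, using a hypothetical helper named plain_punctuation:

```perl
use strict;
use warnings;

# Hypothetical helper: the same replacements, for text already decoded to Unicode
sub plain_punctuation {
    my ($text) = @_;
    $text =~ s/[\x{201C}\x{201D}]/"/g;    # left and right double quotes
    $text =~ s/[\x{2018}\x{2019}]/'/g;    # left and right single quotes
    $text =~ s/[\x{2013}\x{2014}]/--/g;   # en dash and em dash
    $text =~ s/\x{2026}/.../g;            # ellipsis
    $text =~ s/\x{2022}/*/g;              # bullet
    return $text;
}

my $article = "\x{201C}Smart\x{201D} quotes \x{2014} it\x{2019}s a start\x{2026}";
print plain_punctuation($article), "\n";
```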
Connect to a MySQL database – subroutines
use DBI;
$dbname = "mysql_databasename";
$host = "localhost"; # usually localhost
$dbuser = "mysql_databaseUsername";
$dbpass = "mysql_databaseUserpassword";
sub opendb {
    # stop with the driver's error message if the connection fails
    $dbh = DBI->connect("DBI:mysql:$dbname:$host", $dbuser, $dbpass)
        or die "Could not connect: $DBI::errstr";
    return;
}
sub closedb {
    $dbh->disconnect();
    return;
}
Just store the subroutines and database connection info in a safe place (outside the publicly accessible web directory), and call &opendb; and &closedb; whenever you need to access the database.
Parsing form data - foreach array with checkboxes
-- CGI Script --
#!/usr/bin/perl
use CGI;
$in = new CGI;
@buddies = $in->param('buddy');
print "content-type: text/html\n\n";
foreach $friend (@buddies) {
    # do something with $friend
}
-- HTML --
<form method="post" action="somefile.htm">
<input type="checkbox" name="buddy" value="Fred">Fred
<input type="checkbox" name="buddy" value="Sue">Sue
<input type="checkbox" name="buddy" value="Ralph">Ralph
<input type="checkbox" name="buddy" value="Jim">Jim
<input type="checkbox" name="buddy" value="Karen">Karen
<br>
<input type="submit" value="Who are my friends?">
</form>
Read / work with delimited file contents in perl
## file's structure - for example reference - tab delimited
# name email address phone number address1 address2 city state zip
open (DATAFILE, "somefile.txt") or die "Problema: $!";
while (<DATAFILE>) {
    chomp;
    # splits tab separated fields - replace \t with \, \^ \|, whatever you need
    @fields = split(/\t/, $_);
    ### sometimes it's better to make the field names easier to understand
    $name = $fields[0];
    $email_address = $fields[1];
    $phone_number = $fields[2];
    $address1 = $fields[3];
    $address2 = $fields[4];
    $city = $fields[5];
    $state = $fields[6];
    $zip = $fields[7];
    ## if there are quotes around the field data, remove them
    $name =~ s|\"||g;
    $email_address =~ s|\"||g;
    ## just follow the same pattern to remove quotes from the rest
    ## work with your data -- add it to another text file, add it to a database, etc.
    print "$name $email_address $phone_number<br />\n";
}
close (DATAFILE);
This will work with CSV (comma separated) files, tab separated files, pretty much any delimiter, as long as the delimiter never appears inside a field; a plain split cannot tell a comma inside quotes from a separator. I've included a couple of regexes to remove the "quotes" around the outputs -- I'm not sure if the quotes show up in all files, but they did when I was parsing my PayPal sales data (tab separated).
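The split and quote removal can also be pulled into one small routine so the delimiter becomes a parameter. This is a sketch under the same assumption as above (the delimiter never appears inside a field); parse_line is a hypothetical name, and the sample line is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line on a literal delimiter and strip
# the double quotes wrapped around any field.
sub parse_line {
    my ($line, $delimiter) = @_;
    # \Q...\E treats the delimiter literally, so | and ^ need no escaping;
    # the -1 limit keeps empty fields at the end of the line
    my @fields = split /\Q$delimiter\E/, $line, -1;
    s/^"(.*)"$/$1/ for @fields;
    return @fields;
}

my @fields = parse_line(qq{"Roy Jones"\troy\@example.com\t555-1234}, "\t");
print join(" | ", @fields), "\n";
```

For files where fields really can contain the delimiter, a dedicated parser such as the Text::CSV module from CPAN is the usual recommendation.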
Match a string within an array
@array1 = ("Bnn","Cnn","Dnn");
$string = "Cnn";
if (grep {$_ eq $string} @array1) {
print "Found $string\n";
}
else {
print "Nope, no $string\n";
}
I found this info in a post on Codecall.net by KevinADC. All credit belongs there -- I'm just hopefully making it easier to find.
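The grep approach scans the whole array on every check. If the same array is checked many times, one common alternative is to load it into a hash once and look strings up by key. A small sketch, where %in_array1 is a name made up for this example:

```perl
use strict;
use warnings;

my @array1 = ("Bnn", "Cnn", "Dnn");

# Build the lookup once: each array element becomes a hash key
my %in_array1 = map { $_ => 1 } @array1;

for my $string ("Cnn", "Enn") {
    if (exists $in_array1{$string}) {
        print "Found $string\n";
    }
    else {
        print "Nope, no $string\n";
    }
}
```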
Remove trailing and leading spaces in a string
# remove leading spaces
$variable =~ s/^\s+//;
# remove trailing spaces
$variable =~ s/\s+$//;
This removes excess whitespace at the beginning and end of a string. Very, very handy for user-generated database input variables.
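Since this tends to get applied to every form field, the two substitutions can be wrapped in a subroutine. A minimal sketch, with trim as a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without leading or trailing whitespace
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;   # leading whitespace
    $string =~ s/\s+$//;   # trailing whitespace, including a newline
    return $string;
}

my $variable = "   Roy Jones  \n";
print "[", trim($variable), "]\n";
```

Because the sub works on a copy, the original variable is left unchanged; assign the result back if the trimmed value should replace it.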