Perl and optimal programming standards
Written by Shantanu Bhadoria   
Friday, 16 October 2009 14:52

Perl is one of those languages you have to be very careful with when developing large-scale applications. Personally, I use a couple of tools to clean up after me when writing Perl code. PerlTidy (http://perltidy.sourceforge.net/) is my favourite tool for formatting and neatly indenting all my Perl code. pod2html is another popular tool, useful for generating quick HTML documentation from your Perl code.

Before I start with this article, here is another Perl joke :) No offense intended :p

perl -e '$a="etbjxntqrdke";$a=~s/(.)/chr(ord($1)+1)/eg;print $a'

 PS: Type that on your command line ;)

One of the things that we love most about Perl is its flexibility, its similarity to natural language, and the fact that There's More Than One Way To Do It. Of course, when I say "we" I mean Perl hackers; the implicit "them" in this case is management, people who prefer other languages, or people who have to maintain someone else's line noise.

Perl programmers tend to rebel at the idea of coding standards, or at having their creativity limited by arbitrary rules -- otherwise, they'd be coding in Python :). But I think that sometimes a little bit of consistency can be a good thing.

As Larry himself said in one of his State of the Onion talks, three virtues of coding are Diligence, Patience and Humility. Diligence (the opposite of Laziness, if you're paying attention) is necessary when you're working with other programmers. You can't afford to name your variables $foo, $bar, and $stimps_is_a_sex_goddess if someone has to come along after you and figure out what the hell you meant. This is where coding standards come in handy.

 

  1. The verbosity of all names should be proportional to the scope of their use.
  2. The plurality of a variable name should reflect the plurality of the data it contains. In Perl, $name is a single name, while @names is an array of names
  3. In general, follow the language's conventions in variable naming and other things. If the language uses variable_names_like_this, you should too. If it uses ThisKindOfName, follow that.
  4. Failing that, use UPPER_CASE for globals, StudlyCaps for classes, and lower_case for most other things. Note the distinction between words by using either underscores or StudlyCaps.
  5. Function or subroutine names should be verbs or verb clauses. It is unnecessary to start a function name with do_.
  6. Filenames should contain underscores between words, except where they are executables in $PATH. Filenames should be all lower case, except for class files, which may be in StudlyCaps if the language's common usage dictates it.

... and a few more, about one printed page in total. For instance, we have a couple of regexp-related guidelines, a couple of points about references and complex data structures (including when not to use them), and a list of our favourite modules that we recommend developers use (CGI, DBI, Text::Template, etc.).
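
As a rough illustration of naming rules 1, 2 and 5 above, here is a minimal Perl sketch. The data and the sub name capitalise_names are invented for the example; only the naming pattern is the point.

```perl
use strict;
use warnings;

# Hypothetical names throughout; only the naming pattern matters.

# Rule 2: the plurality of the name matches the plurality of the data.
my @names = ('larry', 'tom', 'jon');

# Rules 1 and 5: a sub that is visible to the whole program gets a
# descriptive verb phrase as its name, with no do_ prefix.
sub capitalise_names {
    my @capitalised_names;

    # Rule 1: a loop variable that lives for two lines can be short.
    for my $n (@_) {
        push @capitalised_names, ucfirst $n;
    }
    return @capitalised_names;
}

my @display_names = capitalise_names(@names);
print join(', ', @display_names), "\n";
```

The short $n is acceptable only because its whole life fits on the screen at once; the longer-lived arrays get longer names.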

Our documentation standards say to include at least a README, INSTALL and LICENSE file with each piece of software; that each source code file should include the name, author, description, version and copyright information; that any function that needs more than two lines of comments to explain what it does needs to be written more clearly; and that any more detailed documentation should be handed to professional technical writers.

Coding standards needn't be onerous. Just because there are bad coding standards out there, doesn't mean that all coding standards are bad.

I think the way to a good coding standard is to be as minimalist as possible. Anything more than a couple of pages long, or which deviates too far from common practice, will frustrate developers and won't be followed. And standards that are too detailed may obscure the fact that the code has deeper problems.

Here's a second rule: standardise early! Don't try to impose complex standards on a project or team that's been going for a long time -- the effort to bring existing code up to standard will be too great. If your standards are minimal and based on common sense, there's no reason to wait for the project to take shape or the team's preferences to become known.

If you do set standards late, don't set out on a crusade to bring existing code up to scratch. Either fix things as you come to them, or (better) rewrite from scratch. Chances are that what you had was pretty messy anyway, and could do with reworking.

Third rule? I suppose three rules is a good number. The third rule is to encourage a culture in which standards are followed, not because Standards Must Be Obeyed, but because everyone realises that things work better that way. Imagine what would happen if, for instance, mail transport agents didn't follow RFC822. MTAs don't follow RFC822 because they're forced to, but because Internet email just wouldn't work without it. The thought of writing an MTA which was non-compliant is perverse (or Microsoft policy, one or the other).

If your development team understands that standards do make things easier and result in higher quality, more maintainable code, then the effort of enforcement will be small.

Damn, I seem to have found a fourth rule. Oh well.

Fourth rule: don't expect coders to document. Don't expect coders to do architecture or high-level design. Don't expect coders to have an eye for user interface. If they do, that's great, but no matter how many standards or methodologies you lay down, there's no way to change the fact that coding skill is not necessarily related to, and in fact may be inversely proportional to, those other necessary skills. Don't let a set of standards be your crutch when you really need to hire designers or documentors.

"There's more than one way to do it". So says the byline on the cover of "The Perl Bible", also known as "The Camel Book" - "Programming Perl" by Larry Wall, Tom Christiansen and Jon Orwant. This is both a blessing and a curse - a blessing because you can choose the best way of doing something, and a curse because you can write code that's so obfuscated, so unusual, that it's difficult or impractical to maintain.

Do you want to write code that's cost effective to develop, robust, easy to test, maintain and (later) upgrade, is good for the user, and contains a structure that allows for re-use where appropriate? Good - then it is important (especially with Perl, given the flexibility) to work with some programming standards to help you achieve these ends.

Perl is a language that trusts the programmer to know what he/she is doing - to be used "by consenting adults" if you like. That's as opposed to a "Nanny" language such as Java, where everything must be declared and is checked and double checked by the compiler to ensure it's exactly right. It follows on from this philosophy that a standard should be more a set of guidelines and less a set of rules; with each guideline, ask yourself "why is this being suggested" and if you can find good contrary reason in your particular case, perhaps you've found an instance where the guideline may be broken.

The guidelines will differ from programming team to programming team - a team that's writing major applications thousands of lines long, and whose members are all highly trained in the nuances of Perl, will have a different ideal to a group of people who do some occasional Perl coding as part of a much wider ranging job, and write code that is typically a few lines of "glueware".

Guidelines for Perl
  • Don't re-invent the wheel - if the right wheel isn't available elsewhere, adapt one to make it right. Is there a standard module, a CPAN module, something internal in your organisation or something you yourself have previously written that does the job or gets you off to a flying start?
    And on that basis, these guidelines rely heavily on the perldoc manual page online at:
    http://search.cpan.org/dist/perl/pod/perlstyle.pod
    and also suggestions by others (such as the BBC) on how it should be modified for their particular use at:
    http://www.bbc.co.uk/guidelines/webdev/AppA.Perl_Coding_Standards.htm
    http://www.rescomp.berkeley.edu/about/training/senior/progs/Coding-Standards/x31.html
  • GOLDEN RULE 1 Consider WHY you're writing the code in the first place - design first. At the very least, produce a use-case diagram on a piece of paper. If you're coding a web application, also produce a state diagram showing the flow of the user (and administrator if you have one) through the site.
  • GOLDEN RULE 2 Consider all parties in coding and decisions you make. The user (who will usually be the one who makes the biggest investment in your code), the code maintainer, the code tester, as well as yourself. You may want to adjust your coding standards to suit the testers and maintainers - if they're Perl Geeks, you can use a wider range of slightly more obscure constructs than you would if they're just occasional Perlers.
  • The Guidelines are just that - and each is there for a reason. If you understand the reason and feel that it's honestly not applicable, you should be free not to follow it - "at your own risk".
Things that affect the user
Items in this section affect the user and should be given the greatest attention. The naming of blocks of code (which can be called up by another programmer) and good user documentation are MUCH more important than the internal naming of variables, which will be much more hidden within ("encapsulated") once the code is released.
  • GOLDEN RULE 4 Think about reusability. Why waste brainpower on a one-shot when you might want to do something like it again? Consider generalizing your code. Consider writing a module or object class. Consider placing your code in a central library. Make your code sharable by using h2xs to create the modules.
  • GOLDEN RULE 5 Consider making your code run cleanly with use strict and use warnings (or -w) in effect.
  • For portability, when using features that may not be implemented on every machine, test the construct in an eval to see if it fails. If you know what version or patchlevel a particular feature was implemented in, you can test $] ($PERL_VERSION in English) to see if it will be there. The Config module will also let you interrogate values determined by the Configure program when Perl was installed. And/or use a require statement that states the minimum version your code needs.
  • Package naming. Perl informally reserves lowercase module names for "pragma" modules like integer and strict. Other modules should begin with a capital letter and use mixed case, but probably without underscores due to limitations in primitive file systems' representations of module names as files that must fit into a few sparse bytes.
  • You SHOULD use subroutines, which are no longer than 100 lines, wherever possible. A subroutine longer than that SHOULD be refactored, as this is likely to make it clearer and easier to maintain.
  • One function does one thing. For example, a function in a module should never print to standard out. Instead, it should return text which the calling script can print when it wants.
  • No global variables.
  • Be consistent. Be especially consistent with usability.
  • GOLDEN RULE 6 Provide good comments AND good user documentation.
  • Open source. Consider giving away your code.
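Several of the points in this list fit into one small sketch: strict and warnings, a minimum-version require, an eval probe for an optional feature, and a function that returns text instead of printing it. The 5.006 floor, the choice of Time::HiRes as the probed module and the sub name describe_platform are all assumptions made for the illustration, not recommendations.

```perl
use strict;
use warnings;
use Config;    # core module: values recorded when perl was configured

# State the minimum perl this code expects (5.6.0 here, an arbitrary
# floor chosen for the sketch).
require 5.006;

# Probe for an optional module inside eval so that a missing one is not
# fatal. Time::HiRes merely stands in for "a feature that may be absent".
my $has_hires = eval { require Time::HiRes; 1 } ? 1 : 0;

# One function does one thing: build the report and return it, leaving
# the decision to print to the caller.
sub describe_platform {
    my ($hires_available) = @_;
    my $report = "perl $] on $Config{osname}\n";
    $report .= 'Time::HiRes: ' . ($hires_available ? 'yes' : 'no') . "\n";
    return $report;
}

print describe_platform($has_hires);
```

Because describe_platform only returns a string, a CGI script, a test or a log writer can all reuse it without capturing standard output.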
Within the code
This is the third of three sections of this standard document. You should consider the generalities and how your code affects others before you consider the following code-level issues: You SHOULD write code that is easy to read and understand. Some considerations:
  • Block structure and layout
    - Closing curly brace of a multi-line block should line up with the opening keyword
    - 4-column indent
    - Opening curly on same line as keyword, if possible, otherwise line up
    - Space before the opening curly of a multi-line BLOCK
    - One-line BLOCK may be put on one line, including curlies
    - No space before the semicolon
    - Semicolon omitted in "short" one-line BLOCK
    - Space around most operators
    - Space around a "complex" subscript (inside brackets)
    - Blank lines between chunks that do different things
    - Uncuddled elses
    - No space between function name and its opening parenthesis
    - Space after each comma
    - Long lines broken after an operator (except "and" and "or")
    - Space after last parenthesis matching on current line
    - Omit redundant punctuation as long as clarity doesn't suffer
    - Line up corresponding things vertically, especially if it'd be too long to fit on one line anyway.
While you can choose your own rules about bracing style, tab width, spaces and so forth, for new files, you MUST be consistent in applying the style to your code. When modifying existing code that has a well-ordered layout, you MUST use the same standard in your modification.

In the interest of portability, lines MUST be no more than 80 characters long.
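
To make the layout points above concrete, here is a short sketch written in that style. The sub classify_size is hypothetical; its 100-line threshold simply echoes the subroutine-length guideline earlier in this article.

```perl
use strict;
use warnings;

# classify_size is made up for this illustration.
sub classify_size {
    my ($line_count) = @_;

    # "Short" one-line block: kept on one line, semicolon omitted.
    if ($line_count < 0) { return 'invalid' }

    # Multi-line blocks: opening curly on the keyword's line, 4-column
    # indent, closing curly under the keyword, and an uncuddled else.
    if ($line_count <= 100) {
        return 'ok';
    }
    else {
        return 'refactor';
    }
}

# Space after each comma, none between a function name and its
# opening parenthesis.
for my $count (-1, 40, 250) {
    printf "%4d lines: %s\n", $count, classify_size($count);
}
```

Running the file through PerlTidy should leave code written this way largely unchanged, which is a quick way to check that a team's habits and its tool settings agree.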
  • Write statements in the order that is most readable:
      open(FOO,$foo) || die "Can't open $foo: $!";
    is better than
      die "Can't open $foo: $!" unless open(FOO,$foo);
    because the second form hides the main point of the statement in a modifier.
  • Don't omit brackets and operands and assume the default where it compromises clarity:
      return print reverse sort num values %array;
    could be more readably written as
      return print(reverse(sort num (values(%array))));
  • Use the last, next and redo operators within a loop in order to avoid the obfuscated code that would otherwise be necessary to test the flow of a loop only upon the completion of each iteration. Use loop labels where they help make the code more readable.
  • Avoid using grep() (or map()) or `backticks` in a void context, that is, when you just throw away their return values. Those functions all have return values, so use them. Otherwise use a foreach() loop or the system() function instead.
  • Variable naming. Choose mnemonic identifiers. If you can't remember what mnemonic means, you've got a problem.
While short identifiers like $gotit are probably ok, use underscores to separate words. It is generally easier to read $var_names_like_this than $VarNamesLikeThis, especially for non-native speakers of English. It's also a simple rule that works consistently with VAR_NAMES_LIKE_THIS.

The names of your variables and subroutines are a good way to communicate the meaning of your code to other developers who have to read and maintain it. Therefore all identifiers SHOULD be descriptive.

Use the case of variable names to show their scope:

Constants MUST be written in $ALL_CAPS with underscores to separate words.

Global/package scoped variables MUST begin with an upper-case letter, and use either underscores or studly caps (BiCapitalisation) to separate words (for example, $CustomerDBUser or $Customer_db_user are permitted).

Locally scoped variables (my() or local() variables) MUST begin with a lower case letter, and use either underscores or studly caps to separate words (for example, $indexFileName or $index_file_name are permitted).

Function and method names adhere to the same standard as local variables: they MUST begin with a lower case letter. You SHOULD also separate words with underscores or studly caps. Function names beginning with an underscore are considered private, and SHOULD NOT be called outside of the package in which they are defined.
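
A minimal sketch of these case rules, assuming a made-up customer index; every identifier below is hypothetical and was chosen only to show the letter case.

```perl
use strict;
use warnings;

# All names here are hypothetical; only their letter case matters.

# Constant: all caps, words separated by underscores.
our $MAX_ENTRIES = 3;

# Package-scoped variable: begins with an upper-case letter.
our $Index_file_name = 'customers.idx';

# Private helper: the leading underscore marks it as internal.
sub _trim {
    my ($text) = @_;    # lexical variable: begins with a lower-case letter
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Public function: lower case, words separated by underscores.
sub summarise_index {
    my @entry_names = map { _trim($_) } @_;
    splice @entry_names, $MAX_ENTRIES if @entry_names > $MAX_ENTRIES;
    return "$Index_file_name: " . join('|', @entry_names);
}

print summarise_index(' ann ', 'bob', ' cy', 'dee'), "\n";
```

A reader can tell from the case alone that $MAX_ENTRIES and $Index_file_name come from outside summarise_index, while @entry_names cannot leak out of it.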
  • You may find it helpful to use letter case to indicate the scope or nature of a variable. For example:
       $ALL_CAPS_HERE   constants only (beware clashes with perl vars!)
       $Some_Caps_Here  package-wide global/static
       $no_caps_here    function scope my() or local() variables
  • Regular Expressions. If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make it look a little less like line noise. Don't use slash as a delimiter when your regexp has slashes or backslashes.

    You SHOULD also put line breaks and comments in regular expressions, to make them even more comprehensible.

    Two shorter regular expressions are often more readable than one long one - and faster too.
  • Use the new "and" and "or" operators to avoid having to parenthesize list operators so much, and to reduce the incidence of punctuation operators like && and ||. Call your subroutines as if they were functions or list operators to avoid excessive ampersands and parentheses.
  • Use here documents instead of repeated print() statements. And if you find yourself cutting and pasting code (repeating things) that should be a really big clue that you should be using a sub!
  • Always check the return codes of system calls. Good error messages should go to STDERR, include which program caused the problem, what the failed system call and arguments were, and (VERY IMPORTANT) should contain the standard system error message for what went wrong. Here's a simple but sufficient example:
       opendir(D, $dir) or die "can't opendir $dir: $!";

    Number your error messages (to help on reports to your support service, where a number can be asked for and quoted) and use a consistent format. Error messages that relate to the user's data or environment should include as much information as possible to help the user locate the problem.
      print STDERR "Error 743 - too many fields in line\nError is in file $filename at line $lineno: $line";
    is much better than
      print STDERR "Data Error";
  • Commenting standards:
    - Purpose, date, release, author, support contact, license terms and copyright
    - One comment for every major block of 10 to 30 lines, highlighted by white space
    - Major comment at the start of subs
    - Comment whenever you write something "clever" in the code
    - Provide a sample of the input data as comments
  • English vs. short names and scope of $_. I personally recommend against the use of the English module as it can have a major detrimental effect on the operation speed of regular expressions. Better to use the short form variable names and comment those which aren't used day to day in your organisation.

    $_ (which is the basis of topicalization in Perl 6) should be used within short areas of code, but do
    - Add a comment where you make a rare use of $_ as default
    - Feel free to specify $_ explicitly.
    - Avoid setting $_ and assuming it's still set many lines later; it's best in smaller scopes / areas of code
  • Rare structures like do, unless, until ... don't use these without good cause; no prizes are offered to the programmer who manages to get every possible construct into a hundred lines of code ;-)
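
Several of the guidelines in this list can be seen together in one short sketch: a labelled loop with next and last, a commented /x regular expression, numbered error messages sent to STDERR, a here document, and a named loop variable in place of a long-lived $_. The sub parse_settings and the "key = value" input format are assumptions invented for the example, and error number 743 is borrowed from the sample message above.

```perl
use strict;
use warnings;

# parse_settings and its input format are hypothetical.
sub parse_settings {
    my ($source_name, @lines) = @_;
    my (%value_for, @errors);
    my $line_number = 0;

    LINE:
    for my $line (@lines) {    # named variable rather than a far-reaching $_
        $line_number++;
        next LINE if $line =~ /^\s*(?:\#.*)?$/;    # blank line or comment
        last LINE if $line =~ /^__END__/;          # explicit end marker

        my ($key, $value) = $line =~ m{
            ^ \s* ([A-Za-z_]\w*)    # key: an identifier
            \s* = \s*               # equals sign, optional spaces around it
            (.*?) \s* $             # value, trailing space trimmed
        }x;

        if (defined $key) {
            $value_for{$key} = $value;
        }
        else {
            push @errors, "Error 743 - not a key = value line\n"
                        . "Error is in $source_name at line $line_number: $line\n";
        }
    }
    return (\%value_for, \@errors);
}

my ($settings, $errors) = parse_settings(
    'inline sample',
    '# demo input', 'colour = blue', 'no equals sign here', '__END__', 'size = 9',
);

# Error messages go to STDERR, not mixed into the normal output.
print STDERR @$errors;

# A here document instead of a run of print() statements.
my $key_count   = keys %$settings;
my $error_count = @$errors;
print <<"END_REPORT";
Settings read: $key_count
Lines rejected: $error_count
END_REPORT
```

The sub collects its errors and returns them rather than printing, so the caller decides whether a bad line is fatal; that choice is a design preference of this sketch, not something the guidelines require.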


Last Updated on Friday, 16 October 2009 15:10
 
