Wednesday, July 9, 2008

The Mythical Man-Month

When I first started reading about Linux and Free Software, I kept coming across certain titles of classic programming texts, and one of the most cited is The Mythical Man-Month by Frederick P. Brooks. So I brought this home to read, since in my new position (this is week four) I am living on the periphery of a software programming project and I would like to know as much about the culture of programming as I can. The Mythical Man-Month was apparently the revolutionary programming book of the 1970s and laid out the standard for software development wisdom for a generation (and still seems to be the model). The most important assertion is Brooks' Law, which (against the conventional wisdom of the time) says that adding programmers to a late software project makes it later. I can see that this would be true, even from my slim experience with the developers at Equinox Software. The idea behind the theory is that you lose time (measured in "man-months") having to train and orient the programmers you add, and that this factor increases dramatically the more people you add.

As a new person to this group, they are having to train and orient me, which takes them away from their other duties. This group has been administering the GPLS system since PINES moved to the Evergreen ILS, and they need to have me trained up and in place before they can really concentrate on developing the software program itself. So here I am, adding man-months. I fancy myself to be worth the effort, but for the moment I certainly feel like a drag!

Thursday, June 12, 2008

Working Through the Llama

I've been working my way through the llama book, which has indeed filled in some gaps. The camel is also helping to clarify things, though it is truly an advanced book; if I had not already done so much reading about Perl, Unix, and programming, and had not been sitting around thinking about Perl so much, I think the learning curve would be too steep. Anyway, here is the chapter list for the llama:

  1. Introduction
  2. Scalar Data
  3. Lists and Arrays
  4. Subroutines
  5. Input and Output
  6. Hashes
  7. In the World of Regular Expressions
  8. Matching with Regular Expressions
  9. Processing Text with Regular Expressions
  10. More Control Structures
  11. File Tests
  12. Directory Operations
  13. Strings and Sorting
  14. Process Management
  15. Perl Modules
  16. Some Advanced Perl Techniques
I've made it through Chapter 5 - Input and Output, but I'm not as far behind as it might seem, since my other research has given me some introduction to hashes and regular expressions. I notice that three chapters are devoted to regular expressions alone. Perl was originally intended as a text processing language, and regular expressions (as I've mentioned) are key to those features. As I've also mentioned, I'm a global learner, so seeing the "big picture" is key to my having any sort of grasp on the detailed concepts (which logically fall into place for me once I have the big picture). It seems that the experience of learning Perl is teaching me a lot about how I learn unfamiliar things.

Tuesday, June 10, 2008

Wow! I'm in!

I have been sitting on this for weeks, since I've been waiting for everything to finalize, but I am the new PINES System Administrator for the Georgia Public Library Service! I now feel free to tell the whole story behind this, since I've been holding off for so long:

When I began library school in fall 2004, I had been a library paraprofessional for a little over two years. I had worked in the college library as a student, then in churches and bookstores, then in a public library. I had plans when I started school to become a library manager or an academic reference librarian, or something of that kind. Then Google announced that it would begin digitizing books for the web, which got me thinking about the future of librarianship, my own computer skills, etc., so I changed my Masters concentration from management to information architecture (IA).

The IA concentration had a definite focus on web design and development and usability. To supplement this with some nuts and bolts, I took a data networking class that taught me the history, development, conceptual models, and standards for networks, most of which I could supplement by studying the actual computer network running in the library where I worked at the time, and talking to my colleagues who did our tech support. I also learned basic UNIX commands, and that exposure (and the textbook I kept) helped me find my way around my Linux systems at home. Based on what I learned in that class, I set up an Ethernet network in my apartment, running great lengths of Cat 5 cable through our bedrooms and hallways, stuffing it down so our cats wouldn't eat it.

I also took a class on Information Systems Management, which delved deeply into software concepts, particularly databases, programming concepts (though we did not learn any languages), and XML with its many "ML" applications for data storage. For that class I had to develop software requirements documents, both for actual situations and for a fictional company (though we evaluated actual software products), and to make recommendations for what each organization should do.

I then progressed to an advanced web development class in which we studied web standards, XHTML, and CSS and explored many different resources. I then took usability analysis, information organization (which expanded on data formats) and finally, cataloging and classification, in which we were required to encode and edit MARC records using the OCLC Connexion form based on the AACR2.

As I finished school, the assistant director of the library asked me to apply for a Librarian position that came open at the beginning of that summer. I applied and became a reference librarian in a suburban branch of our system. In late spring of 2007, I saw an opening for PINES Program Manager at GPLS. I applied and, to my pleasant surprise, got an interview, but I didn't get the job. I had admired GPLS for a number of years by then, and they had just moved to their home-grown, open-source Evergreen ILS. I decided that, if they would have me, this was the place for me, and the PINES Program Director encouraged me to reapply.

So when I saw the PINES System Administrator position open, I considered it a stretch, but applied anyway with the idea that I would hit the books and learn Perl, shell scripting, and some of the other requirements listed in the job description. After a second interview and several harried emails, I was offered the job, and I start later this month!

I'm extremely happy about this development, and a little nervous. But since this has become my goal anyway, I know I will do everything I can to learn the other skills I need to become the best system administrator I can be.

Thursday, June 5, 2008

The Camel and the Llama

I received Learning Perl (the "llama") and Programming Perl (the "camel") the other day. My original plan was to sail through the llama first before reading the camel at all, but that has proved to be more tedious than I thought. Tedious because I'm realizing I already understand most of the "basics" of Perl. My hands-on experience with scalars, arrays, and hashes is lacking, but I have a pretty good grasp of what they are and how they work. This means that, so far, my reading of the llama has been a close search for missing concepts - holes in my knowledge of these techniques - and that's not exactly inspiring reading, especially after ordering these books and waiting to get them.

So for no other reason than that I bore easily with repetition, I decided to start the camel anyway, and I'm so glad I did. Just my reading of much of the first chapter last night in bed was both entertaining and very illuminating. All of the books I've used so far kind of dive right in and get into the mechanics of Perl (here's what we mean by the term "scalar literal," here's how you use this or that kind of value/variable, how to control the flow of your program with loops and conditionals, etc.) without talking much about the rationale. The first chapter of the camel (which is co-authored by Perl creator Larry Wall) gets a little into the nuts and bolts of Perl, but is mainly a conceptual overview, and focuses on the logic of why Perl is the way it is.

I'm a fairly "foundational" thinker, which is to say that if you provide me with a solid conceptual framework of any given system, along with some rationale as to why it's built that way, I will understand how the system works. The "how" and the "why" are very much connected for me as a learner, and when I have had the opportunity to train people on things like library information systems, I nearly always teach from a "systems" view. This may be confusing the hell out of my poor trainees, but I've never had someone tell me so. Some people - maybe most, given the approach of most teaching guides - just want (need) to know the "how" and don't really care about the "why" - that's not me.

So I've learned from my other books how to create a scalar variable (a "scalar" value is a single thing, like a number or a text string) by saying something like

$sys_admin = "chris";

You create an array (list) similarly, except, rather than the $ sign you use @:

@pines_staff = ("Ellen", "Bob", "Frank", "Louise");

All of the books I've used (including the llama so far) do this similarly and then turn around and say "Wow! You just wrote your first Perl program!" which leaves me kind of blinking, going "I did what now?"

In a few short paragraphs within the first few pages of the camel, though, I learned:

  1. That a scalar value is a single instance of a "noun" (e.g. "this particular baseball in my hand").
  2. That a scalar variable is a way to contain the "idea" of a baseball without necessarily referring to a particular baseball. It could be the one in my hand, but it could also be the one that won the Braves' game on a grand slam.
  3. That the symbols for Perl variables look like the first letters of what kind of variable it is ($ is for scalar, @ for array).
  4. That the difference between an array (a simple list) and a hash (a list of paired values, usually arranged in a table) is that in an array you can look up the value by number (order in the list) and in a hash you can look up the value by name (the first value of each pair).
This is what I've been needing to truly understand Perl! So now my strategy is still to go through the llama, but to read the camel alongside it.
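To make point 4 above concrete, here is a tiny sketch contrasting array lookup and hash lookup (the staff names are from my earlier example; the roles are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An array: look up values by number (position in the list).
my @pines_staff = ("Ellen", "Bob", "Frank", "Louise");
print $pines_staff[0], "\n";        # prints "Ellen"

# A hash: look up values by name (the first value of each pair).
# The % sigil marks a whole hash; the roles here are made up.
my %pines_roles = (
    "Ellen" => "cataloger",
    "Bob"   => "developer",
);
print $pines_roles{"Ellen"}, "\n";  # prints "cataloger"
```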

Sunday, June 1, 2008

Learning Perl, Part 2

Okay. Here's an attempt to share what I've learned so far in Perl, without writing a book of my own:

Perl is used for all kinds of things in the computer world, including text processing and manipulation (which includes searching and reporting), system administration (tying together and automating smaller tasks), web development, and database management.

Perl is free (under both the GPL and an "artistic license"), and, like Linux and true free/open source programs, it is community-based and community-driven. The fullest expressions of this community (that I've seen) are the Perl Monks website, which provides forum-based support and sharing, and the Comprehensive Perl Archive Network (CPAN), which is a huge repository of Perl modules and other programs that are shared to prevent wheel reinvention. As new Perl users become more experienced, they are encouraged to help others via Perl Monks or by contributing to CPAN when they come up with something new and useful.

There is a culture surrounding the community and the language that seems to center on the offbeat views of the language's inventor Larry Wall, who aside from being the patron saint of the Perl universe, is a committed Christian who sees Perl as a small way in which he has made the world a better place. Among Wall's often-quoted aphorisms is the observation that the three virtues of a good programmer are laziness, impatience, and hubris, to which he later added the virtues of diligence, patience, and humility (of course, the polar opposites). This sort of wordplay and whimsy permeates the Perl community, and apparently the "Camel" book is as much an introduction to the culture of Perl as it is a reference for Perl the programming language.

Most of the work in Perl is done by the Perl interpreter (referred to as "perl" - lower case), and the programmer's job is to write scripts that make the perl program do what you're trying to accomplish. One of the stated goals of Perl is to make easy things easy and hard things possible. This is something that I'm beginning to understand. For the last few weeks I've been joking (to myself, of course - no one else in my life is interested!) that Perl is "automated algebra." In the past couple of days, though, I'm realizing that wow! yes, Perl is automated algebra!

I'm beginning to see (I'm thick, I know) that Perl just gives you tools to define the logic of your problem, and after you've put it all together, Perl will do the work for you. Perl is automated logic (if . . . then, but not . . ., while . . ., etc.), and whatever programs you write are only limited by your logical or mathematical skill.
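A minimal sketch of that "automated logic" idea, with invented numbers: given a list of how many days items have been checked out, you define the logic once and let perl do the counting:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented data: days each item has been checked out.
my @days_out = (3, 45, 12, 60, 1);

# Define the logic (if ... then, inside a loop) and let perl do the work.
my $overdue = 0;
foreach my $days (@days_out) {
    if ($days > 30) {
        $overdue++;
    }
}

print "Overdue items: $overdue\n";   # prints "Overdue items: 2"
```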

I've also learned specific language skills, which I will detail in a later post, but this "aha" moment should give me the motivation to gain some true proficiency in Perl so I can get to work!

Saturday, May 31, 2008

In Theory . . . .

This is not exactly a complaint. But . . .

I'm learning Perl now, along with sed and awk and regular expressions, and I'm noticing that in nearly all cases, these books are aimed at people who have real-life problems that can be solved by these functions. At this point, I don't. I have not yet worked in an environment in which I need powerful searching and text processing and programming tools. Each of the books is very basic and approaches the subject from the view of a non- or new programmer, which is helpful. But the examples are things like "$apple = 'red'" and replacing the variable "$foo" with the text "bar." While simple and illustrative, I'm having trouble at this point imagining real life situations where I would apply these skills.
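For what it's worth, even the $foo/bar exercise starts to feel more real when you dress it in library clothes. Here the same substitution idea cleans up an invented catalog-style line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An invented catalog-style title line:
my $title = "The Mythical Man-Month / Frederick P. Brooks";

# s/// is Perl's substitution operator: find " / " and replace it with " by ".
$title =~ s{ / }{ by };

print "$title\n";   # prints "The Mythical Man-Month by Frederick P. Brooks"
```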

I guess what I need is to talk to people who use these skills and have them show me how they work, but for now I can only learn these in my theory-bubble.

Regular Expressions

In a recent job interview (ahem), I was asked what I knew about regular expressions. I had to say, honestly, that I had probably heard the term before but that I didn't know what they were. So post-interview, I did some investigating in my Unix books and on the web to discover what was meant by the term. This turned out to be more difficult than I expected, because what I was reading was not making much sense to me, and the grep command in the Unix shell seemed more mysterious than ever. Finally, I came across a 10+-year-old edition of Mastering Regular Expressions in netLibrary, and several things clicked.

Having been a professional librarian for nearly two years, a graduate student for the previous two, and a paraprofessional who did reference for four years, I've been asked to do computer-based searching on a daily basis for over six years. Fortunately, thanks to algorithmic search engines (like Google), this task has become very simple. But I'm spoiled. I came to libraries after the miserable days of search engines that only matched text strings. In those medieval days, there was no forgiveness for spelling errors or misplaced spaces, no helpful "Did you mean . . . ?" features or "related searches" that got thrown up for your convenience if you typed "freinds" instead of "friends." When a library patron comes in and asks for a book with "snow flower" in the title, a useful shortcut (given the fact that our library catalog is still quite unforgiving) is to search those words in Amazon.com or Google, which nearly always works.

The way "traditional" computer-based library catalogs work is with "wildcard" characters. This way, if you're unsure of the spelling of "friends" or "weird," you can substitute a nonalphabetic character in place of one or more letters in the text string. Hence, "friends" can be rendered "fr??nds" or "fr*nds," and the computer will find:
  • for "fr??nds," all seven-character text strings that begin with "fr" and end with "nds"
  • for "fr*nds," all text strings (of any length) that begin with "fr" and end with "nds"
In this case, you use an "expression" of alpha and non-alpha characters to search for actual text strings inside a group of files.
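Those wildcards map almost directly onto regular expression notation, where ? becomes . (any one character) and * becomes .* (any run of characters). A sketch with invented test words:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "fr??nds" -> /^fr..nds$/  (exactly two characters between fr and nds)
# "fr*nds"  -> /^fr.*nds$/  (any number of characters between fr and nds)
my @words = ("friends", "frnds", "frontends");

foreach my $word (@words) {
    print "fr??nds matches $word\n" if $word =~ /^fr..nds$/;
    print "fr*nds matches $word\n"  if $word =~ /^fr.*nds$/;
}
```

Only "friends" satisfies the seven-character pattern; all three words satisfy the open-ended one.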

In regular expressions proper, this concept is taken to extremes, and you are required to know many different symbols for extremely precise searches. I am just learning these, so I don't yet have them down, but I know that you place your expression between forward slashes like this:

/expression/

What goes between the slashes would be a group of symbols like ^\d.n.*g$ that would let you search for something as specific as "all of the records that start with a number, have 'n' as the third character, and end with a g." This is a very powerful way to do targeted searching through, say, thousands of lines of open source library information system computer code, or multiple databases of patron and MARC records. For instance. :-)
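As a quick sanity check on that description, a pattern matching "starts with a number, 'n' as the third character, ends with a g" can be written with \d for a digit, . for any single character, and ^ and $ anchoring the start and end. Trying it on some invented strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = qr/^\d.n.*g$/;

# Invented test strings:
foreach my $string ("1anning", "24nothing", "ending", "3xnote") {
    if ($string =~ $pattern) {
        print "$string matches\n";
    }
}
# prints "1anning matches" and "24nothing matches"
```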

Of course for all of our other searching needs, we'll stick to Google!