Saturday, May 31, 2008

In Theory . . . .

This is not exactly a complaint. But . . .

I'm learning Perl now, along with sed and awk and regular expressions, and I'm noticing that in nearly all cases, these books are aimed at people who have real-life problems that these tools can solve. At this point, I don't. I have not yet worked in an environment where I needed powerful searching, text-processing, and programming tools. Each of the books is very basic and approaches the subject from the point of view of a non-programmer or new programmer, which is helpful. But the examples are things like "$apple = 'red'" and replacing the variable "$foo" with the text "bar." While these are simple and illustrative, at this point I'm having trouble imagining real-life situations where I would apply these skills.

I guess what I need is to talk to people who use these skills and have them show me how they work, but for now I can only learn these in my theory-bubble.

Regular Expressions

In a recent job interview (ahem), I was asked what I knew about regular expressions. I had to say, honestly, that I had probably heard the term before but didn't know what regular expressions were. So after the interview, I did some investigating in my Unix books and on the web to find out what the term meant. This turned out to be more difficult than I expected: what I was reading didn't make much sense to me, and the grep command in the Unix shell seemed more mysterious than ever. Finally, I came across an edition of Mastering Regular Expressions in netLibrary that is more than ten years old, and several things clicked.

Having been a professional librarian for nearly two years, a graduate student for the previous two, and a paraprofessional who did reference for four years, I've been asked to do computer-based searching on a daily basis for over six years. Fortunately, thanks to algorithmic search engines (like Google), this task has become very simple. But I'm spoiled. I came to libraries after the miserable days of search engines that only matched text strings. In those medieval days, there was no forgiveness for spelling errors or misplaced spaces, no helpful "Did you mean . . . ?" features or "related searches" that got thrown up for your convenience if you typed "freinds" instead of "friends." When a library patron comes in and asks for a book with "snow flower" in the title, a useful shortcut (given the fact that our library catalog is still quite unforgiving) is to search those words in Amazon.com or Google, which nearly always works.

"Traditional" computer-based library catalogs rely on "wildcard" characters. If you're unsure of the spelling of "friends" or "weird," you can substitute a nonalphabetic character for one or more letters in the text string. Hence, "friends" can be rendered "fr??nds" or "fr*nds," and the computer will find:
  • for "fr??nds," all seven-character text strings that begin with "fr" and end with "nds"
  • for "fr*nds," all text strings (of any length) that begin with "fr" and end with "nds"
In this case, you use an "expression" of alpha and non-alpha characters to search for actual text strings inside a group of files.
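As a rough illustration of those two wildcard searches, here is a minimal Python sketch. It assumes "?" means exactly one character and "*" means any run of characters (including none), as described above; real catalogs differ in which symbols they use. The word list is made up, and Python's standard fnmatch module stands in for the catalog's search engine. The last two searches show the same patterns written as regular expressions.

```python
import fnmatch
import re

# Made-up title words to search, including the misspelling "freinds".
words = ["friends", "freinds", "frends", "fronds", "fiends"]

# Wildcard searches, assuming "?" = exactly one character
# and "*" = any run of characters (possibly none).
two_letters = [w for w in words if fnmatch.fnmatchcase(w, "fr??nds")]
any_length = [w for w in words if fnmatch.fnmatchcase(w, "fr*nds")]

# The same searches as regular expressions:
# "." matches one character, ".*" matches any run of characters.
regex_two = [w for w in words if re.fullmatch(r"fr..nds", w)]
regex_any = [w for w in words if re.fullmatch(r"fr.*nds", w)]

print(two_letters)  # ['friends', 'freinds']
print(any_length)   # ['friends', 'freinds', 'frends', 'fronds']
```

Note how "fr??nds" catches the misspelling "freinds" but misses the six-letter "frends," while "fr*nds" catches both (along with the unrelated "fronds").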

In regular expressions proper, this concept is taken to extremes, and you need to know many different symbols to build extremely precise searches. I am just learning these, so I don't have them down yet, but I know that in Perl, sed, and awk you place your expression between forward slashes like this:

/expression/

What goes between the slashes is a group of symbols like ^[0-9].n.*g$ that lets you search for something as specific as "all of the records that start with a number, have 'n' as the third character, and end with a 'g.'" This is a very powerful way to do targeted searching through, say, thousands of lines of code for an open source library information system, or multiple databases of patron and MARC records. For instance. :-)
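The record search described above (starts with a number, has "n" as its third character, ends with "g") can be written as the pattern ^[0-9].n.*g$: "^" and "$" anchor the start and end of the line, "[0-9]" matches one digit, "." matches any single character, and ".*" matches any run of characters. In Perl it would sit between slashes as /^[0-9].n.*g$/. The sketch below tries the same pattern with Python's re module; the records are invented for illustration only.

```python
import re

# A digit, then any one character, then "n" (the third character),
# then anything at all, ending with "g".
pattern = re.compile(r"^[0-9].n.*g$")

# Hypothetical sample records.
records = [
    "9 national catalog",  # match: digit, space, "n", ends in "g"
    "1 new listing",       # match: digit, space, "n", ends in "g"
    "1 old listing",       # no match: third character is "o"
    "A national catalog",  # no match: starts with a letter
    "2 new list",          # no match: ends with "t"
]

matches = [r for r in records if pattern.search(r)]
for record in matches:
    print(record)
```

On the Unix command line, the same pattern works with grep, for example grep '^[0-9].n.*g$' records.txt, where records.txt is a hypothetical file with one record per line.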

Of course for all of our other searching needs, we'll stick to Google!

Friday, May 30, 2008

Learning Perl

I've learned that one of the tools I'm expected to have under my belt as a system administrator is Perl, which is used for managing all kinds of things. I'm not a programmer, and many programming-language concepts are new to me. I had always assumed that knowing HTML and its variants gave me enough background to understand computer languages like Perl, Java, or C++, and it does, to a small degree. For instance, when I first found the "view source" option in Internet Explorer a number of years ago and it opened a text file full of code, I really had no idea what it was or what it did. Graduate school changed this, and by the end of my third technology/web-oriented class, I was writing papers that analyzed the HTML of major corporate web sites. Those skills gave me the confidence (hubris?) to believe that I actually knew what makes computers behave the way they do.

Then I found Linux . . .

Aside from the obvious cosmetic and ideological differences between Linux and Microsoft (really the only other milieu I'm comfortable in at this point), Linux (a Unix-like system) requires using the command line in a way that Microsoft does not. So even though I do most of my work and play in my (very pretty) GUI, I'm using the command line more and more, and I'm much less afraid of it than I was. The most extensive shell work I've done so far was installing the Evergreen ILS server, but I'm still learning about that (more in a later post).

Okay, back to Perl. I've so far been using two books to get the basics down:
I used the second book for a few days, but I found the 24 Hours book clearer in tone and a bit less scattered in its subject matter. As I investigate Perl further, I can see that there are two books that more or less everybody uses and talks about (even if they don't exactly love them):
I have ordered copies from Amazon that should be here very soon. There are actually previous editions available through the GALILEO subscription to netLibrary (if you're not from Georgia, investigate your library's online options), but I've decided that these are worth owning since I'm now pretty serious about learning Perl.

There are also, as you might expect, many online resources for learning Perl. The most extensive and comprehensive is the actual "perldoc" documentation, which is included in every distribution of Perl but which I find much more readable on the web (the site also lets you export pages as PDF documents for easy printing). The other advantage of this approach is that, like Perl itself, the documentation is modular, so I can do a close reading of any given section in a short amount of time.

I have also joined the PerlMonks website, which is forum-based and seems to carry a level of fanaticism that a newbie like me can't quite comprehend yet. But as I progress, I intend to use and refer to those forums.

Monday, May 5, 2008

My Journey to Here

I'm a reference librarian in a public library and a recent graduate of Florida State University's College of Information, with a Master of Science in Library and Information Studies (M.S.L.I.S.). I specialized in Information Architecture, which is all well and good, but since I haven't yet really applied any of my IA knowledge and skills, it's kind of like having been an English major as an undergraduate, which is to say not worth much. My ALA-accredited master's degree has definitely improved my job situation within libraries, but since I'm quickly approaching the end of my rope as far as working with the public goes, I'm looking for something else.

I have recently become a Linux devotee, which has completely altered my perspective on both computing and the technical goals of librarianship. I do believe that computer access should be part of library service, since there is so much bad information and misinformation out there. Furthermore, I believe that if there is an appropriate place for people to learn about free software, it is the library. Unfortunately, libraries have taken the corporate path to software provision, and, as in the rest of the business world, Microsoft rules. The library software world, a microcosm of the larger software world, has its proprietary Goliath and its open source David, and my current employer is completely glued to the Goliath camp and scoffs at alternatives.

So I've decided that library technology will no longer be a hobby for me, and I would like to become a certified digital librarian. Since I've finished grad school with no plans to return, and there is no set course of study for this, I will devise my own. Then I will shop myself around to prospective employers who want a grounded, visionary, thoughtful, well-informed, highly skilled, library-savvy professional at the helm of their tech decisions. So here I am, and I'm ready to get started!