Roll-your-own or Not-built-here syndrome is a mentality that rightly takes a fair amount of abuse. After all, with more than half a century of software development under our belt as a species, do we really need yet another XYZ? Chances are it's been built before, and with a little bit of research you could find an existing workable alternative.
For a now-fairly-large ecosystem like Ruby/Rails, there's generally a plugin or a gem that gets close to what you need it to do, and with Ruby's infinite flexibility, it's easy to make a couple of modifications and get what you need. Contrast `gem install XYZ` and `ri XYZ` with taking the time to really learn the problem domain of what you're trying to solve, come up with a workable architecture and interface, and implement it.
It should be a no-brainer - but often we take the latter path because we fear what we don't understand, and integrating a whole bunch of code, the pedigree of which isn't known, is an understandably scary task. So we roll-our-own, throwing together some half-assed implementation that'll get the job done but will probably end up being more trouble than it's worth.
That's not the time to RYO. If there's an available library that looks like it does the job, your time would be much better spent going through the code, online docs, tutorials and blog posts about it than jumping the gun and trying to write your own. However, don't just willy-nilly throw the library into your project without giving it a solid once-over, both at the API and at the code level.
We spend hours and hours coding, but when's the last time you took a couple of hours to read through the actual code of an auxiliary library that will perform an important function in your app? I'm getting better, but oftentimes the first time I look at the code is when something doesn't work right. I think that's a problem that should be fixed.
So, let's say you've looked at the library and it looks like a winner that's going to meet most of your needs. There's still one more question that you need to ask yourself:
"Am I completely, overly and 100% happy with the interface that this library provides?"
You might be, but there are a number of circumstances where you could have a reason for not being completely thrilled. If the library provides more features than you need, you might not want your code to have to worry about overly complicated method calls. If the library provides too few features, and you need to call some other library's methods to get the output you need, you also have a reason for trepidation. Even something as basic as unhappiness with the naming convention could cause some doubt - take fpdf for example - a 100% Ruby port of PHP's FPDF that keeps the same not-Ruby-friendly naming convention and a PHP method style.
Whatever it is, if you're not 100% happy with the interface or have some doubts about the library itself (is it being maintained, is it well written?), it's a great time to layer something between your code and the library. Technically it's called the Adapter pattern, but a layer of interfacing code is something that developers write all the time, so you don't have to feel like a pattern-snob.
Doing so is actually a great way to figure out exactly what parts of the XYZ library you actually need and how much effort it is to make them work. If you're unit testing, you could even stub out calls to the adapter for the time being to make sure the interface you think you want to use makes sense.
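As a rough sketch of what such a layer can look like in Ruby: the `ThirdPartyPdf` class below is a made-up stand-in for a library with a PHP-style interface (it is not fpdf's actual API), and `InvoiceDocument` is a hypothetical adapter that exposes only the two operations the application cares about.

```ruby
# Hypothetical stand-in for a third-party library with a non-Ruby-style API.
# It only records the calls it receives so the example stays self-contained.
class ThirdPartyPdf
  attr_reader :ops

  def initialize
    @ops = []
  end

  def AddPage(orientation = 'P')
    @ops << [:page, orientation]
  end

  def SetFont(family, style, size)
    @ops << [:font, family, style, size]
  end

  def Cell(width, height, text)
    @ops << [:cell, width, height, text]
  end
end

# The adapter: the only class in the application that knows about ThirdPartyPdf.
class InvoiceDocument
  def initialize(backend = ThirdPartyPdf.new)
    @backend = backend
    @backend.AddPage
  end

  def heading(text)
    @backend.SetFont('Arial', 'B', 16)
    @backend.Cell(0, 10, text)
    self
  end

  def line(text)
    @backend.SetFont('Arial', '', 10)
    @backend.Cell(0, 6, text)
    self
  end

  # Exposed here only so the recorded calls can be inspected.
  def operations
    @backend.ops
  end
end

doc = InvoiceDocument.new
doc.heading('Invoice #42').line('1 x Widget')
```

The rest of the code only ever calls `heading` and `line`, so swapping the backend later (or stubbing it in a unit test) means touching one class.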
(Brief interlude: I know what you're thinking - "Who the $#@%@ is this guy, he titles his blog post 'When it's ok to RYO' and now is talking about the complete opposite - integrating libraries and the adapter pattern. What a jerk." That's ok, I've been called worse.)
Just bear with me for one more second. Let's say you've got the XYZ library integrated behind a nice clean interface and your project is chugging along for a couple of months. As we all know, change happens. Invariably something will pop up that makes the XYZ library less of a perfect match than it used to be. Maybe the project is getting more advanced and you need to do more than the library was intended for. Maybe you realize that you don't need full XYZ support, just X+Z, and the library is bloating your codebase for no reason.
Whatever the reason. Now's the time to RYO.
What's changed from way back when you originally needed the functionality implemented in the XYZ library? Well, three things:
For me, all three of these are big wins. We've now done this a number of times inside of Webiva. The Rails file_column plugin is a great example - it was easy to get installed and working, but a couple of years down the road we needed both less and more functionality, and it wasn't being maintained. Luckily, since the Webiva codebase didn't have file_columns everywhere but instead had abstracted them in the DomainFile class, changing out that plugin for some custom code ended up not being too big of a deal. Authorization is another example - we used a great authorization plugin that we determined did a lot more than we needed, and a couple of years down the line were able to easily extract the core functions we needed and roll-our-own very simply with a nice clean syntax (the main class SiteAuthorizationEngine clocks in under 200 loc).
Now sometimes the right library just isn't out there and you don't have a choice, but the other 95% of the time, it's worth taking a look at the library that fulfills 96% of your requirements and writing the other 4% rather than the other way around. After all, you can always come back and write that 96% later on, and you'll have a whole lot better idea of what you're doing.
One of my favorite things about working in a higher level language is what I like to call "Low Ceremony Objects" (if there's another more popular term for them please let me know and excuse my ignorance) - a.k.a. arbitrary data structures built off of combinations of general purpose containers like arrays and hashes. They are an effective way to quickly create and manipulate data that has a short lifespan, but they can be counterproductive in terms of both code readability and maintainability when overused, and sometimes more structured data or traditional objects are much more effective.
When used correctly, LCO's (yup, tired of typing Low Ceremony Object already) are data structures that generally exist for only brief chunks of time and are only defined insomuch as they are used. The existence of general purpose containers in higher level languages and the minimal amount of code needed to create and access them means that data that would otherwise sit in a predefined data structure now often ends up sitting in a combination of Hashes and Arrays (or Lists and Dictionaries if you swing that way, or PHP's bastard stepchild of both).
As for the name - why low ceremony? Well, like a Vegas shotgun wedding, these generally don't come with a lot of planning - no design documents or even a set structure - so there's generally not a lot of set guidelines involved. Now why Low Ceremony Objects and not Low Ceremony Data or Low Ceremony Structures? Because a large part of the value of using these objects built out of general purpose containers is the large and easy-to-use toolkit of methods either in the objects themselves (ruby, python, java) or the standard accessor functions (php, lisp) which aid greatly in manipulation of the data - adding, removing, searching, sorting etc.
Reading a book on Clojure got me thinking about this again (after struggling with it a couple of years ago - see the footnote), as in the Lisp variant I first learned - Scheme - pretty much every piece of data is a Low Ceremony Object that you can car, cdr, or caaaaadr to your heart's content, but without some additional structure or abstraction added on top of the language, complex data quickly becomes difficult to work with.
When used correctly, Low Ceremony Objects are a great boon to development both in programmer productivity and in code cohesion and DRY philosophy - since they are defined instead of declared, their definition is always close in the code base to their usage. If you make a change to the creation of the LCO you have effectively changed its declaration. You don't need to dig up a header file or separate class source file to make a modification. Want a quick data structure to hold a menu? Two lines and it's done:
[ { :title => 'Item 1', :url => '/item1' },
{ :title => 'Item 2', :url => '/item2', :selected => true } ]
If that menu is going to be created and digested during a portion of one web request, then you don't really want to go through the effort of creating a class, especially if that class is just going to be used as a data structure and isn't going to have any of its own methods. What does the following actually get you:
# A hand-rolled value class: just two read-only fields
class MenuItem
  attr_reader :title, :url
  def initialize(title, url)
    @title = title
    @url = url
  end
end
# A thin wrapper around an Array of MenuItems
class MenuItemList
  def addItem(item)
    @items ||= []
    @items << item
  end
  def item(idx)
    @items[idx]
  end
end
lst = MenuItemList.new
lst.addItem(MenuItem.new('item1', '/item1'))
lst.addItem(MenuItem.new('item2', '/item2'))
Not a whole lot (ignoring that anyone in their right mind would just use an Array for MenuItemList unless some more functionality was added). Ruby provides the Struct construct for just this reason - but I'm not sure that using Struct gets you all that much more than just using a Hash. In particular, I'm not a fan of passing a huge parameter list to the constructor, as you need to remember the exact order of your properties every time you read code using the initializer, otherwise you'll have problems. For my money:
menu_item = { :title => 'Item 1', :url => '/item1',
:selected => true, :dropdown => false, :green => true }
Is more readable than:
menu_item = Struct::MenuItem.new("Item 1","/item1",true,false,true)
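For what it's worth, newer Rubies soften the parameter-order complaint: since Ruby 2.5, `Struct.new` accepts `keyword_init: true`, which makes the generated constructor take keywords instead of positional arguments. A small sketch, assuming a struct named `MenuItemStruct` with the same fields as the hash above (the name is made up here to avoid clashing with the MenuItem class):

```ruby
# keyword_init: true (Ruby 2.5+) makes the generated constructor take keywords
MenuItemStruct = Struct.new(:title, :url, :selected, :dropdown, :green, keyword_init: true)

menu_item = MenuItemStruct.new(title: 'Item 1', url: '/item1',
                               selected: true, dropdown: false, green: true)

menu_item.title   # reader methods instead of hash lookups
menu_item.to_h    # converts back to a plain Hash when needed

# Unlike a Hash, a misspelled or unknown field is rejected immediately.
begin
  MenuItemStruct.new(title: 'Item 2', colour: 'green')
rescue ArgumentError => e
  typo_error = e.message
end
```

That gets the call-site readability of the Hash while still failing loudly on a key that isn't part of the structure.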
There are a lot of situations where LCO's are great, but there are two guidelines I now try to follow:
The reason for the first rule is that as soon as you are moved away from the definition of the object errors are going to creep in since there's no help from the interpreter or compiler in properly generating and consuming the LCO.
The second guideline should be pretty self evident - since you only have a definition of the data and not a declaration of the type, once the data gets too difficult to understand you are going to make mistakes using it because you don't have a declaration to fall back on.
LCO's can also be limiting because data can be hard to extend when a small piece of custom code could achieve the same effect. Let's go back to our menu item - what if we made the menu item responsible for displaying itself? Suddenly the whole menu system could be a lot more powerful by subclassing the base class and overriding its methods (or just duck-typing some other type in there):
class MenuItem
  # ...Previous Definition...
  def display; "<li><a href='#{@url}'>#{@title}</a></li>"; end
end
class BlinkingMenuItem < MenuItem
  def display; "<li><a href='#{@url}' style='text-decoration:blink;'>#{@title}</a></li>"; end
end
class MenuItemList
  # ...Previous Definition...
  def display; "<ul>" + @items.map { |itm| itm.display }.join + "</ul>"; end
end
menu = MenuItemList.new
menu.addItem(MenuItem.new('Item1','/item1'))
menu.addItem(BlinkingMenuItem.new('Item2','/item2'))
print menu.display
Because who doesn't like blinking menu items? Achieving the same sort of functionality with just a data structure would mean adding in a conditional branch for each added option - to the point where your code can degenerate into if/elsif/else spaghetti.
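To make that concrete, here is a sketch of what rendering the same menu from plain hashes tends to turn into. The `render_menu` helper and the `:blink` and `:selected` keys are invented for illustration.

```ruby
# Hypothetical helper: renders an Array of Hashes, branching on option keys.
def render_menu(items)
  rows = items.map do |item|
    if item[:blink]
      "<li><a href='#{item[:url]}' style='text-decoration:blink;'>#{item[:title]}</a></li>"
    elsif item[:selected]
      "<li class='selected'><a href='#{item[:url]}'>#{item[:title]}</a></li>"
    else
      "<li><a href='#{item[:url]}'>#{item[:title]}</a></li>"
    end
  end
  "<ul>#{rows.join}</ul>"
end

menu = [
  { :title => 'Item1', :url => '/item1' },
  { :title => 'Item2', :url => '/item2', :blink => true }
]
html = render_menu(menu)
```

Every new kind of item means another branch inside `render_menu`, whereas the class-based version only needs a new subclass and leaves the existing code alone.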
Because of how easy LCO's are to create, they may tend to get overused when some additional design level decisions take more effort than just throwing together an Array of Hashes - don't lose all the benefits of years of work in OOP design just because high level languages make LCO's so easy to create and consume.
One of my least favorite examples of an LCO is the form system in the Drupal Content Management System - the way to create forms is to generate an enormous associative array where different pound-sign prefixed keys have different meaning and different nested arrays create different functional groups in the form.
This fails both of the LCO tests - most people generating drupal forms never look at the drupal code that actually uses them (I took a couple of looks and while it's nice, modular code, it's also very far away from what we're generating) - and since there's no strong typing it's impossible to know what went wrong when your form doesn't show up correctly (this may have been fixed with additional error checking in newer releases.) Secondly with huge forms with dozens of items, it's hard to look at the data that you're generating and say with any certainty whether or not there's a mistake. Lastly, let's say I want to add a special widget to a form (like a slider for example) - I wouldn't even know where to start since the data that I'm passing into the form just gets magically transformed into HTML output on the other end, I don't have any control over it (other than just putting HTML directly into the form).
Because of the lack of meta-programming at the class level, PHP code in general can suffer from the downsides of lots of complicated LCO's. Let's take the CakePHP framework. From their tutorial, here's an example of a model:
class User extends AppModel {
var $name = 'User';
var $validate = array(
'login' => 'alphaNumeric',
'email' => 'email',
'born' => 'date'
);
}
Compare this to an example in Rails:
class User < DomainModel
table_name 'users'
validates_as_alphanum :login
validates_as_email :email
validates_date :born
end
Wait you say - validates_as_alphanum, validates_as_email and validates_date don't exist in Rails - except that they do in the super class:
class DomainModel < ActiveRecord::Base
def self.validates_as_alphanum(field,options={})
validates_format_of field, options.merge({ :with => ..REG_EXP..})
end
...
end
The same thing is doable in CakePHP, but you end up adding instance methods instead of being able to meta-program at the class level, since the use of a data structure instead of a method forces the implementation to rely on conditional branching and dispatching instead of letting the language handle that part itself. The advantage of using code instead of an LCO in this case is that there's a lot more help from the language compiler/interpreter than when you try to create a Domain Specific Language solely out of data. With data, you effectively need to write an interpreter for the DSL (in which case you'll probably end up fulfilling some variation on Greenspun's Tenth Rule [via Proggit]), while building it out of meta-constructs allows you to use the development language itself as the interpreter.
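A plain-Ruby sketch of the two styles side by side, with no Rails or CakePHP involved. Everything here is an assumption made for illustration: the `RULES` hash, the `validate_with_rules` helper, the `BaseModel` class and the two (deliberately simplistic) regular expressions.

```ruby
# Style 1: rules as data. Something has to walk the hash and dispatch on it.
RULES = { :login => 'alphaNumeric', :email => 'email' }

def validate_with_rules(record, rules)
  rules.each_with_object([]) do |(field, rule), errors|
    value = record[field].to_s
    ok = case rule
         when 'alphaNumeric' then value.match?(/\A[a-zA-Z0-9]+\z/)
         when 'email'        then value.match?(/\A[^@\s]+@[^@\s]+\z/)
         else raise ArgumentError, "unknown rule #{rule}"
         end
    errors << field unless ok
  end
end

# Style 2: rules as class-level method calls. Ruby does the dispatching.
class BaseModel
  def self.validators
    @validators ||= []
  end

  def self.validates_format_of(field, with:)
    validators << [field, with]
  end

  def self.validates_as_alphanum(field)
    validates_format_of field, with: /\A[a-zA-Z0-9]+\z/
  end

  def self.validates_as_email(field)
    validates_format_of field, with: /\A[^@\s]+@[^@\s]+\z/
  end

  def initialize(attrs)
    @attrs = attrs
  end

  # Names of the fields whose values do not match their pattern
  def errors
    failed = self.class.validators.reject { |field, regexp| @attrs[field].to_s.match?(regexp) }
    failed.map(&:first)
  end
end

class User < BaseModel
  validates_as_alphanum :login
  validates_as_email :email
end

record = { :login => 'bad login!', :email => 'me@example.com' }
data_errors = validate_with_rules(record, RULES)
code_errors = User.new(record).errors
```

Both report the same failing field, but a misspelled macro such as `validates_as_emial` blows up with a NoMethodError as soon as the class is loaded, while a misspelled rule string sits quietly in the hash until the hand-written `case` statement happens to reach it.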
So, in conclusion: LCO's are great where you need to quickly create data types to be consumed just as quickly, but they can become a drag on a project when they get too complicated or are used as complicated interfaces, as they don't self-document and can make it difficult to track down bugs. Of course, given the apparent rise of schema-free databases (like CouchDB) and the NoSQL movement, I might soon be in the minority arguing against the limiting overuse of LCO's.
---------
Footnote:
As an aside, in the development of Webiva (our newly-released, newly-open-sourced Rails CMS) we came across the problem that CMS's require a boatload of customizable features so that modules can be developed effectively and generally. This customization must be easy to add into the system from a developer perspective and easier to extend later on as more options are needed. The former cried out for the support and validation offered by ActiveRecord, while the latter made more sense using an LCO - after all, who wants to update a bunch of models or the database every time a new option is added to the thousands of existing ones?
I started out using just Hashes, as that's what comes in via the params object, with some custom inline validation. But that quickly became painful and repetitious, so finally what we ended up with was the HashModel - a hybrid between using an LCO and using a full model that allows easy standard ActiveRecord usage in forms but is easy to create, update, use and store in generic DB text fields. Here's a made-up example usage:
class BlogDisplayOptions < HashModel
attributes :blog_id => nil, :per_page => 20, :category => '', :active=>true
integer_options :per_page
boolean_options :active
def blog
@blog ||= Blog.find_by_id(self.blog_id)
end
def validate
# we need a valid blog
self.errors.add(:blog_id,:invalid) unless self.blog
end
end
Usage - storing values as a hash in a serialized column:
@options = BlogDisplayOptions.new(params[:blog])
if @options.valid?
paragraph.data = @options.to_hash
paragraph.save
end
Or:
@options = BlogDisplayOptions.new(paragraph.data)
if(@options.blog)
...Do something..
end
Switching from Hashes to HashModels was a huge win in reusability and simplicity. Now I just need to fix all the other places in the system where I ignored the two rules from above.
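To give a feel for the idea without any Rails dependencies, here is a minimal plain-Ruby sketch. It is not Webiva's actual HashModel: the `TinyHashModel` class is hypothetical, and it only covers defaults, integer coercion and conversion back to a hash (no validation or form support).

```ruby
# Hypothetical, stripped-down take on the idea (not Webiva's HashModel).
class TinyHashModel
  # Declare the known options along with their default values
  def self.attributes(defaults)
    @defaults = defaults
    defaults.each_key { |name| attr_accessor name }
  end

  def self.defaults
    @defaults || {}
  end

  # Options listed here are coerced with to_i when they come in as strings
  def self.integer_options(*names)
    @integer_names = names
  end

  def self.integer_names
    @integer_names || []
  end

  def initialize(values = {})
    values = (values || {}).map { |key, value| [key.to_sym, value] }.to_h
    self.class.defaults.each do |name, default|
      value = values.fetch(name, default)
      value = value.to_i if self.class.integer_names.include?(name) && !value.nil?
      public_send("#{name}=", value)
    end
  end

  # Only declared options make it back out, so unknown keys are dropped
  def to_hash
    self.class.defaults.keys.map { |name| [name, public_send(name)] }.to_h
  end
end

class BlogDisplayOptions < TinyHashModel
  attributes :blog_id => nil, :per_page => 20, :category => ''
  integer_options :per_page
end

options = BlogDisplayOptions.new('per_page' => '50', 'ignored' => 'x')
stored = options.to_hash
```

The declaration at the top of the subclass is what the bare Hash was missing: one place that says which keys exist, what their defaults are and what type they should have.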
Nothing. Done, shortest blog post ever.
Ok - but let's step back. I like BDD (Behavior Driven Development) a lot, and the benefits at the end of the day are definitely there to see, but I have noticed a shift in how I develop now that I'm focusing on BDD: while I used to take a very high level attack on problems, the use of BDD is somewhat subversively shifting me to a bottom-up approach instead of top-down. Instead of adding functionality horizontally on a whole bunch of different parts of a project, I end up working vertically and locally on individual classes because it's easier to move forward from a testing perspective.
While that might seem normal or desired in larger shops (where developers are assigned smaller pieces) and on fully spec'd out projects, we usually count pretty heavily on the feedback loop that develops with clients by getting usable prototypes out as early as possible. Working vertically on individual pieces of a project instead of horizontally across the breadth of the project makes it harder to have a working prototype to show at any given time, so the shift that's happening because of BDD isn't necessarily beneficial.
Occasionally I need to remind myself to see the forest for the trees instead of overworking certain interfaces. I end up adding more functionality than needed at a certain stage of the project because BDD both makes it incredibly easy to add additional functionality to interfaces that you are already testing and gives you a warm fuzzy feeling with each additional test that passes and will now proceed to sit in a state of permanent watchfulness over the correctness of your code.
Testing of new Controllers, Models, etc. (in Rails' case) that require a fair amount of bootstrapping and teardown state around them - whether it be with mocks or filters or whatever - ends up getting put off, while classes that are easier to test effectively get further along earlier in the process, even if the classes themselves aren't that complicated.
There's no easy solution around this - tests are always going to add some additional friction in creating new classes that need to be unit tested separately. I've found that making note of the issue and stubbing out a bunch of tests across a couple of different pieces of the project before writing any code (don't even write the actual test right away; in RSpec, the it "should .." without the actual test block acts as a nice stub) makes it easier from a mental standpoint to jump around to different classes and unit tests as the code gets developed.
Further, I think there is real value in writing some test abstractions in the form of helper methods, even if you end up pulling your tests to a slightly higher level, as long as what you're testing is still clear from the test code. RSpec helps with this by making it easy to write additional matchers that you can then use very naturally throughout the rest of your code without a lot of ceremony.
In the process of updating Webiva to the newest version of Rails, one of the obstacles we had to overcome was adding support for CSRF protection throughout the system. This is an essential protection for any system but an absolute must for a system like an open-source CMS where anyone can study the code and make an educated guess about what users and content to attack (user id #1 for example).
Working with standard forms in the Webiva code base proved to be not that difficult, as the form_tag function automatically attaches the required CSRF token. However, dealing with hand-coded Ajax calls made through the Prototype library seemed like it was going to be a major pain, with the worst case scenario being manually attaching an authenticity_token= parameter to each request (there were probably a couple hundred of them). Luckily there was an easy workaround. We ended up just adding the following code to the top of each layout:
<script> var AUTH_TOKEN = "<%= form_authenticity_token.to_s %>";</script>
And then just added the following to the bottom of application.js:
try { if (!AUTH_TOKEN) AUTH_TOKEN = 'DummyToken'; }
catch (e) { AUTH_TOKEN = 'DummyToken'; }
Object.toQueryString = function (object) {
var result = $H(object).toQueryString();
if (!result.include ("authenticity_token"))
{
result += "&authenticity_token=" + encodeURIComponent (AUTH_TOKEN);
}
return result;
};
Since Prototype calls Object.toQueryString on any parameters passed to Ajax.Request or Ajax.Updater, the authenticity token should be added in automatically. For testing purposes I added the try / catch block just to make sure something is set, so that if AUTH_TOKEN isn't set we get a server error that can be tracked more easily than just a JavaScript error on the client side.
Comments in your code are bad. There - I went ahead and said it. If you need to comment the majority of your code you're doing something wrong.
Now, before I get attacked by an angry lynch mob of pitchfork-wielding programmers who have just spent the last week digging through an uncommented pile of junk code, let me put in a couple of caveats and go over a little history. First of all, I'm talking about code along the lines of something written for Web development - higher level stuff that's usually relatively straightforward (or should be). I'm not talking about device driver code in the Linux Kernel.
Now jump back 15 years - when CGI on the Web was just starting to hit its stride, and there was a good chance that you were writing in C or C++, and going through the occasional loop construct:
for(k=0;k<len;k++) *ptr++ = (int) (*ptr2++ * 100);
I've seen a lot of code written that way, and short variable names, pointer arithmetic and i,j,k iterators were pretty much par for the course.
These days (and in a different language), I'd write the above something like the following:
prices_in_cents = prices.map { |price| (price * 100).to_i }
Both of these snippets (one in C, one in Ruby) do more or less the same thing. I would probably want a comment before the first one, but an extra comment in front of the second one just takes up space.
The newest round of languages have become much more expressive and thus can be much more compact, and so you can do a lot more in fewer lines of code and still do it more clearly. This and a better set of tools (e.g. autocomplete) has sort of made it ok for developers to start using longer and more expressive variable and method names - if you only need to type a variable a couple of times and your IDE types most of the word for you anyway, a 40 character method name is much_easier_to_understand (vs. _metu).
All this has gotten me to the point where most of the time, I would really rather the developer spend whatever time they would have spent on the comments on improving the readability of or refactoring the actual code.
When written with an eye towards correctly using the expressiveness of the language and keeping methods small and to the point, I like code better than comments, especially as many programmers don't write expressive high level comments but rather just end up writing some pseudo-code to comment their own code. For the C snippet above, I'd love to see a comment like:
/* Create a list of prices in cents (rather than dollars) */
but chances are, the comment will be something like:
/* Go over each element of ptr2, multiply it by 100 and put it in ptr */
That second one doesn't tell me anything the code doesn't - assuming you understand the programming language you're reading - so there's not a huge advantage to having that comment there anyway.
A couple more caveats - this is only for well-written code in a newer language that an average programmer could understand at a glance. If it's anything more complicated than that, or you're writing closer to the metal in a lower level language, comment away, but ask yourself first if the code could be made clearer with some extra work instead. The next person in is going to primarily be changing the code, not the comments.
Clarifying the code directly circumvents the other major issue with comments, that they can be out of date while the code never can be (or you have bigger issues). This doesn't happen that much, but comments about the specific blocks or lines of functionality are the ones that are most likely to be out of date as bug fixes and modifications are likely to be verified before a developer comes back and spends time to update the comment, and let's be honest, they (and you) will sometimes forget.
Last caveat - I'm talking about code comments here, not comments used to generate documentation (like RDoc or pydoc). Those are fantastic ways to generate useful documentation for a project, and they are much more likely to stay up to date given their proximity to the code they are discussing. Also, as these comments are generally at the method and class level, they force the developer to think about the abstraction created by the code as opposed to the code itself, and so will tend to be more useful. That being said, I have also seen:
# send an order given and order id and a cart
def send_order(order_id,cart)
So your mileage may vary.
So there you have it - I'd rather see good, compact, refactored code without comments than one-and-done code and some quickly written comments. Not that controversial after all, right?
Update 9/14: Before I get skewered for saying something idiotically obvious (i.e. "Bad comments are bad; film at 11.") - this post was written in response to an angry email I got after delivering a project a couple of months back with full RDoc but, I was told, a lack of comments in the code... So yes, many people subscribe to the philosophy of limiting code comments and focusing on refactoring, but at many levels (especially the corporate level), there is still an expectation of a large number of comments in your code.
Update 9/23: Of course, "self-documenting" code can be done wrong as well
Below are ten of the higher-level (not language specific) programming techniques or actions that I've been guilty of and now actively strive to avoid:
The Rails community loves its DRY (also known as Don't Repeat Yourself ...oops) - anytime you have the opportunity to make it so that 1 piece of code controls 1 piece of functionality, it's generally considered a best practice to do it that way.
In most instances this is great, and by following that principle your code will be clearer and more maintainable if every time you have to foo_bar something you can just use the FooBar class or call the foobarize class method. It's often referred to as re-factoring or extracting a piece of code into a separate file or module and then calling that code from the place you extracted it from.
In my zeal to not repeat myself, I occasionally end up trying to find patterns and similarities where they don't exist and end up making things worse rather than better when, for example, I need to add functionality to just one of the two instances and start adding options and branches to the extracted code (see #10 - Ugly abstraction).
Thus I internally use the term - DODRY (Don't Over Don't Repeat Yourself). The occasional cut-and-paste of a section of code (don't cringe, it'll be ok) is often not the worst thing in the world, especially as functionality in an application is still evolving. I generally try to follow this pattern:
First, don't try to extract functionality because you think you **might** need it somewhere else; do it only once you or a team-member actually needs it. Second, the first time you think you see a pattern between two segments of code that doesn't jump right out as a perfect place to extract functionality, copy and paste the code into the new file and add a comment referencing the other piece of code in both files (see: Rule of Three). Down the line, if that code needs to be reused again, there will now be three examples of its usage, and the correct extraction pattern for that code should emerge. If it still doesn't, I am perfectly willing to cut-and-paste and add a circular reference comment to all three files - but most of the time the Rule of Three seems like a good guideline.
Often the piece of code that I originally wanted to extract to another file or module isn't the code that I end up extracting once I have a better overview of the whole system.
With any codebase there is a set of conventions that should be followed. For example, CONSTANTS_SHOULD_BE_UPPER_CASE, functionsShouldBeCamelCase, _private_variables_should_be_underscored, etc. Making the decision early on as to what conventions to follow will give you a consistent codebase. The big benefit, though, is more than just aesthetic appeal. There are real, solid benefits to having consistent conventions in a project.
Reading, as a mental process, is really hard on your eyes - the human eye can only focus on a very small portion of its entire range to read text, yet we are able to pick up a lot more information than just the small piece of text we're focusing on. By giving your entire codebase a consistent structure, you're giving your brain a distinct advantage in understanding what's going on in the code in front of you. Similar to syntax highlighting (find me one programmer who isn't a fan), giving your overworked brain additional cues can lead to significant advantages in scanning over code to find bugs and make changes. Two of my favorite languages take this one step further. Ruby (and Rails specifically) pretty much defines one standard set of conventions to use throughout your code - classes are UpperCase, methods_are_underscored, etc. Python makes indentation (one of the oldest visual cues programmers use to structure code) a feature of the language - ensuring a consistent usage across all Python code.
Programming is hard, that's why they (should) pay us the big bucks. One of the reasons why it's hard is that in order to write any one line of code you need to have a lot in your head at one time. At any given time you need to know:
Technically, you don't need to remember all of the above in your head at any given time, as any of them can be looked up. But every time you need to look something up you are going to slow yourself down dramatically, I'd guess by an order of magnitude or more.
Every time you switch to a different project, language or whatever you'll need to swap all those pieces of knowledge out for a different set, and that puts a significant strain on your mental abilities.
When working on a project, your short term memory will slowly, over the course of hours or days, add to your stock of knowledge about what you're working on. As you get more comfortable with these local facts, you will start to program faster and better, getting to the point where everything else melts away and you're in the "zone".
If you are constantly switching between projects, checking your email, or visiting slashdot every 5 minutes, you have no chance to get there, as every context switch is going to take away some of that built-up short term memory.
Tools like Auto Complete, while helpful, are no substitute. They might help you get up to speed quicker but they also mandate a bottom up development approach to be useful and limiting yourself to a specific programming style just to fit your tools is generally not a great idea.
Following up on #3 above, if every time you run in to a barrier with a task you immediately go to a reference to try to find the answer, chances are you will never build up your internal knowledge to a level where you can develop seamlessly.
Instead of immediately searching for function names in a reference, give yourself a couple of moments to try to think of the answer before going to the book (or online reference). Once you're looking at the reference, think about what you're looking at for a couple of moments to try to put it in perspective and give yourself a crutch to remember it next time. For example, if you're looking up a function in a library - is there some convention to the name you can use in the future, is there a consistent parameter order across the entire library, is there a quick every-good-boy-does-fine memory trick you can use to remember it?
This doesn't mean, of course, that you should be trying to commit to memory every function in every library that you'll ever use. Don't clog your brain with useless info if you're never going to call that method again. But make sure you aren't slowing yourself down by having to look up common stuff.
A companion to reference diving is answer diving - which in my mind means immediately looking for the answer from someone (usually on the web) to every programming situation you come across.
Significantly worse than just answer diving - To Bleg (a term I came across on the Freakonomics blog) is to write a blog post begging for people to solve your problems. I think it's also an apt term for people who post to forums or mailing lists demanding others take the time to solve their problems.
Why is answer diving bad? (The same thing applies to Blegging; you are just annoying other people on top of the other issues.) Programming is the skill of being able to create something that solves problems. It's like in high school - for me the hardest math problems were the ones where you had to come up with a general equation to solve a problem, not just come up with the answer. Developing programs that solve problems is a skill, and if every time you are faced with a problem you look up the answer, you aren't going to be advancing as a programmer, you're just going to be advancing as a parrot, repeating other people's solutions.
That doesn't mean you should never look up a solution - it just means that you should give it a little bit of time to solve yourself first. One thing that may come out of trying it yourself is understanding where the difficulties in the solution lie. By doing that you will be better prepared to evaluate others' solutions from a performance and correctness point of view. Oftentimes it's impossible to understand why something was written the way it was without being faced with the same constraints that the original developer was faced with in the first place. If you spend your life cutting and pasting code snippets that you don't understand, you'll never get to the point where you're the one posting the code.
Building it to be done means not taking any shortcuts while building a feature for your project. One of the universal truths of programming is that the closer you are to when you wrote a piece of code, the easier it is to make changes to that code. This means that you shouldn't put off parts of the project you're working on just because you don't feel like working on them. Stuff like options dialogs, configuration variables, etc., should really be done when you are writing the code to begin with, not down the line when you have time.
Not subscribing to this mentality means that your 2 month project will be 99% done for the last two weeks as you remember one thing after another that you forgot to do before you actually deploy.
Of course there are a number of perfectly good reasons not to write a certain piece of code - you might not have the specs, you might not know what needs to be configurable. All of those are valid reasons, just make sure you're not just making excuses because you don't feel like doing the boring parts.
While people can be mixed on whether or not code is self documenting (I tend to side on the "usually" side of the argument), stuff that you do that doesn't manifest itself in code is definitely NOT self documenting. Stuff like how to set up your build environment (whether it be installed utilities, environment variables that need to be set, or file system permissions that need to change) is extremely important to both the next developer down the line and even you a couple months later when you try to set up your laptop.
A basic rule of thumb is, if whatever you're doing is going to need to be done again, it's probably worth documenting. Even just a cut and paste of the "history" command showing how you just deployed to a new server will come in mighty handy a couple of months down the road when you have to do it again.
These definitions are probably not universal, but here's my explanation of two different ways of fixing programming problems:
Debug (v) - To trace a program's path of execution back to a known state and then forward to find a divergence from expected behavior.
Flail (v) - To keep trying different sh*t until something works.
Both Debugging and Flailing can use the same tools. You can debug with print statements and you can flail with Visual Studio's line-debugger. The difference is more a question of how you approach the problem rather than the tools you use to do it.
A classic example of flailing is getting an off-by-one error in a loop and just changing your conditions until it seems like the solution is right. It's a lot easier to just make a couple of code changes and let the computer do the thinking than to stop and actually understand exactly what's going on.
In general, debugging involves a much larger context switch, as you have to wrap your mind around a much larger piece of code and step back and understand the bigger picture of what you are doing. If you are fixing a smaller self-contained problem you just introduced, a full-stop debugging effort may not be worth it. That said, for the 96%** of the other problems you come across, Flailing isn't the answer.
If you are fixing a problem that does not relate to code you wrote recently, going the flailing route will invariably introduce bugs that weren't there before, as you might miss boundary conditions or get something to work for only a subset of your possible inputs.
The other major problem with flailing (and remnants of this are apparent in almost every CSS file that I look at) is that even if you solve the problem, you end up with additional cruft that may not have been needed to reach a solution, and you'll never know what it was that you did exactly to make things work.
** - Made up Number
One of the mistakes that I was (and probably still am - but being aware of the problem I think has helped) guilty of was the crime of trying to make one piece of code do too many things. Like a vacuum cleaner with too many useless attachments that don't work correctly and keep getting lost, expanding on one function or piece of code by adding too many different conditional branches leads to a quick and ugly code death. At first glance, adding options might seem like vintage DRY - if you already have code that "almost" performs a certain function, why not add a couple of lines and a parameter or two to make it do a little bit more? In reality though, a hundred and one conditional branches are the quickest way to take an ugly-stick to your code. The next guy forced to look at the spaghetti that you've written will probably just end up quitting so that he doesn't have to maintain it. What I've found to be a better option is to extract the shared functionality separately and then invoke both your old function and a new function to get the job done:
What's better:
function generate_table(&$data, $wrap_in_a_div = false) {
  $table_info = ... // Generate your table
  // The extra flag forces every caller to know about the optional wrapper
  if ($wrap_in_a_div) {
    return "<div>" . $table_info . "</div>";
  }
  else {
    return $table_info;
  }
}
Or:
function generate_table(&$data) {
  $table_info = ... // Generate your table
  return $table_info;
}
function generate_table_in_div(&$data) {
  return "<div>" . generate_table($data) . "</div>";
}
For my money, the second one is liquid gold compared to the first one.
We all know that today's incarnation of you the programmer never makes typos or off-by-one errors. We can't, however, speak for the you of last month or last week. We all live and learn and we all make mistakes. Simply because you know how to develop something correctly now doesn't mean you did it the right way last week, so never take for granted that you didn't screw something up in the past, and definitely don't play the "I would never do that!" jerk programmer card when someone points out a potential problem in your code. Take a look at the code, figure out if it was you or not, and then only after you've done your due diligence take the holier-than-thou-who-screwed-up or must-be-a-hardware-problem route (or, if you find that people seem to not like you very much, you could just calmly point out what and who caused the problem in a gently worded email and everyone will start to be nicer to you).
Large projects furthermore all suffer from one issue whether you acknowledge it or not: incomplete specs.
No matter how good you are at developing specifications and software, if you're building something you've never built before, 99% of the time you're not going to have a perfect design at the moment when development commences.
The Agile methodology is built on this concept, but even if you don't necessarily subscribe to "the whole agile thing" - experience in software development should give some truth to the above paragraph. If not, well you are either lying to yourself, were recently hit over the head with a large object, or you are just that good in which case you can stop reading as the rest doesn't apply to you.
For the rest of us, the reasons we have incomplete specs are pretty varied. Here are some examples:
- You never wrote complete specs to begin with
- The client doesn't know what they want
- Feedback from early prototypes has led to changes
- The client modifies the requirements
- Different 3rd party libraries have different features and you don't know which to use
And there's plenty more (e.g. the US economy crashes in the worst recession in a century and the client slices the budget in half)
Regardless, should any of these happen, the specs you have are effectively incomplete (even if you don't know it). What this means is that even if you could throw all of Google's crack developer staff at your project, you couldn't finish it right away because there are still design decisions to be made.
What this means is that there are certain parts of the project whose development you are going to need to put off for the time being. This can be a problem if other parts of the project need to use that component or if you need to get a quick prototype up that uses features of that component.
You have two options: One, even knowing that the specs for that component might change, go full steam ahead and get something in there. Or two, figure out a way to defer making a hard decision on that component while still doing what needs to be done to keep the project going forward.
What's wrong with 1? If you don't have all the information necessary to make the decision, you really shouldn't be making it if there's another way.
Let's say your project needs to incorporate video but you don't know what types yet. If you go ahead and integrate a 3rd party library that only supports QuickTime, then you are going to be in for a painful extraction process down the road when you need to support AVI and OGG. If you went ahead and tightly coupled your code to one specific library, that extraction process is going to be even more painful.
You'd like to put off deciding on a video library but if video is a big part of your project, your boss will want to see some video in there sooner rather than later.
Here's where the Adapter Pattern comes into great use. Instead of coupling to a specific video library, ask yourself what you need a video library to do. Maybe all you need is:
Video#new(filename)
Video#stop()
Video#start()
Video#seek(offset_from_start)
Now let's say the video library you are looking at offers the following functions:
XYZVideo#new(read_type,base_path,file_type,filename,option_A,option_B)
XYZVideo#play(offset)
XYZVideo#pause(remember_location=true)
XYZVideo#reset
XYZVideo#fastForward(time)
XYZVideo#rewind(time)
...And 50 other methods you don't actually need
Creating a wrapper class around the XYZVideo library will give you the simple interface that you need, and you'll be able to quickly integrate video into your project - without your project being peppered with implementation-specific details about accessing your video.
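As a rough sketch of what such a wrapper might look like in Ruby: XYZVideo here is a made-up stand-in that only records the calls it receives, and the constructor values ("file", "/videos") as well as the way start, stop and seek map onto play and pause are assumptions for illustration, not the behavior of any real library.

```ruby
# Hypothetical stand-in for the third-party library. It only records the
# calls it receives so the example runs without a real video backend.
class XYZVideo
  attr_reader :calls

  def initialize(read_type, base_path, file_type, filename, option_a = nil, option_b = nil)
    @calls = [[:new, read_type, base_path, file_type, filename]]
  end

  def play(offset)
    @calls << [:play, offset]
  end

  def pause(remember_location = true)
    @calls << [:pause, remember_location]
  end
end

# The adapter: the only video class the rest of the project talks to.
class Video
  attr_reader :backend

  def initialize(filename)
    # Library-specific arguments live here and nowhere else (values are assumed).
    file_type = File.extname(filename).delete(".")
    @backend = XYZVideo.new("file", "/videos", file_type, filename)
    @position = 0
  end

  # Start playback from the last position requested through seek.
  def start
    @backend.play(@position)
  end

  def stop
    @backend.pause(true)
  end

  # Assumed mapping: seeking means playing from the new offset.
  def seek(offset_from_start)
    @position = offset_from_start
    @backend.play(@position)
  end
end

video = Video.new("intro.mov")
video.start
video.seek(30)
video.stop
```

The rest of the code only ever sees new, start, stop and seek, so swapping XYZVideo for something else later means rewriting the four small methods in Video and nothing more.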
Even better, write some unit tests on your Video class, and then in two months when you invariably switch libraries, you'll already have tests written that will let you determine if your new video library is working the same way as the old one.
Going back to the idea of deferring - if you really don't have the specs for video nailed down, you could even skip the library for now and just output a static frame instead of actually linking up to a real library. This will give the rest of the project time to fill out around the video requirements, and any changes to your imagined video interface will be extremely easy to implement.
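A stub along those lines could be as small as the following sketch. The class name StubVideo, the placeholder image name, and the extra playing? and current_frame helpers are all invented for this example; only the four-method interface (new, start, stop, seek) comes from the list above.

```ruby
# Placeholder exposing the same four methods the project expects from Video.
# No library is involved: it tracks a little state and reports a fixed frame.
class StubVideo
  STATIC_FRAME = "placeholder.png" # hypothetical image shown instead of real video

  attr_reader :filename, :position

  def initialize(filename)
    @filename = filename
    @position = 0
    @playing = false
  end

  def start
    @playing = true
  end

  def stop
    @playing = false
  end

  def seek(offset_from_start)
    @position = offset_from_start
  end

  def playing?
    @playing
  end

  # Whatever a real adapter would render, the stub always shows the same image.
  def current_frame
    STATIC_FRAME
  end
end

video = StubVideo.new("intro.mov")
video.start
video.seek(10)
puts video.current_frame
```

Because the stub answers to the same calls as the eventual adapter, the code that uses it should not need to change when a real library is dropped in behind the interface.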
A couple months down the line - the other code in the project will have defined exactly the interface that you'll need for your video library, and you'll be able to drop in the perfect 3rd party library that fits your need and budget.
Two caveats to this:
First, using an adapter pattern has some downsides. It's more project-specific code to maintain, and it's more code that gets run during execution, so it can slow your project down some. Written correctly, the maintenance cost is not a big deal, as the adapter reduces coupling between your components and third party libraries, so the effect of a 3rd party library with different conventions and coding styles will be minimized and your code base should look more uniform for the most part. The performance issue is also not usually a big deal, except when you're using an adapter pattern at too fine-grained a level - if a co-worker comes in and says "What if we want a different for(;;) loop implementation later in the project?", the easiest solution is to throw something (light) at them. Used correctly, most of the time the total performance hit is a couple of extra stack pushes and jumps, but performance issues are something to keep in mind.
Second caveat: make sure you aren't overusing the Adapter pattern for the wrong reasons. If your project is called KSuperApplication, it's probably ok to use the KDE libraries directly, as that's part of the point. Also, you shouldn't really be using Adapters to adapt your own code base to your own code base. If that's the case, you probably have some bigger architectural issues you need to work out (though this is sometimes necessary with legacy code).
Bottom line, if some parts of a project are unclear, put them off. It's better to put off writing code if you can when you don't know what you want to write. This isn't always possible, but sometimes you can get away with deferring decisions by focusing on the features you do know that you need and generating a stub or an adapter class to expose just those features and hide the rest of the details from the rest of the code.
This has led me (and I imagine many other small business owners) to become somewhat obsessed with personal efficiency - figuring out how to extract the most out of each day, while keeping ourselves mostly sane, fit and fed.
Developing a complicated piece of software makes the issue even more severe, as I've found that my level of productivity is directly related to the amount of mental inertia I have on the current project:
Hour 2 working on a complicated project is considerably more efficient than hour 1. Day 2 of straight uninterrupted work on a complicated project is a significant multiple more efficient. Day 3 approaches that magical level of efficiency that I've heard described as "the zone" - that period where everything else melts away and your hands magically emit gorgeous code haikus - perfectly sized, efficiently abstracted pieces of DRY goodness.
Now the time it takes to get in the zone (3 days in the example above), can vary widely depending on the level of mental focus, familiarity with the development environment and the size and scope of the project. Small projects that are easier to fit your head around are likewise easier on your mental processes.
What does this have to do with small web consultants?
If your experience is anything like our own, then the graph below is probably in line with your experience:

The longer you do this, the more complicated the projects you will be taking on will be. This isn't necessarily a bad thing, as the more complicated projects are also generally more interesting both intellectually and financially.
However, unless you are an abject failure at your work and none of your previous clients ever call, the graph below will also be relevant:

Taken together, what do those two graphs mean? Firstly, that as your new work gets continually more complicated, you'll have less and less time without new interruptions. That means less and less time in the zone, which means that the complicated web app you could have built in 1 month of solid 'zoned-in' work will end up taking you half a year due to the constant interruptions and bug fixes for your previous work. The client won't be happy with the adjusted timeline and neither will your pocketbook.
This isn't meant to be a "Magic Bullet" blog post - I'm not sure what the ideal solution for this problem is - we've done a number of different things to try to mitigate this problem.
First - for significant-sized projects, standardize on one platform you are happy with for the majority of your client work as early on as you can [ We picked Rails, and then Webiva ]. By building off of a standard platform you minimize the cost of the zone-destroying context switches as you move from PHP to .NET to Ruby to Java. Keep the experimentation with different languages and frameworks to smaller projects and your personal time.
Secondly - the best code you can use is the code you didn't have to write. There are lots of open source options out there for both libraries and higher-level functionality. Make sure the license fits with what you're doing (and the client's expectations), otherwise be prepared to release your code to your client under a compatible license (or in the case of the AGPL - be prepared to release your code to everyone under a similar license). If it fits your needs and looks like someone else is going to be actively maintaining it - it's a great alternative to writing a bunch of code yourself.
Thirdly - adjust your schedule to maximize the hours where you can work without interruption. We've taken to getting up around 5:00 AM and not responding to client emails until 9:30. With the right morning schedule I can get a number of focused hours in before the first crisis arises.
Lastly - as soon as you are to the point where you have a little bit of cash in savings and don't have to worry about making rent and putting food on the table, get as picky as you can about the projects you take on. Make sure the client seems compatible with your style of work and that the project itself interests you. If it doesn't, you're going to have a hard time reining in the mental focus to keep yourself from scouring reddit and slashdot every 5 minutes.
The above four strategies kept us going and growing to the point where now we're headed into the start up world and are focusing primarily on internal projects. We still do consulting projects, but they are primarily with our existing clients, which keeps the surprises to a minimum. I don't know if there's a better solution other than growing the size of your staff and/or accepting a drop-off in quality or quality of life. If anyone has any ideas I'd love to hear them.
The truth of course is that no one person did invent everything that went into creating an automatic weapon. Mankind's need to kill things as efficiently as possible worked over time as an incredibly strong market force pushing invention and innovation forward in the field. The reason that a fully automatic weapon seemed so amazing to me is that it most likely incorporated a couple dozen or more "Aha!" moments by the greatest thinkers in the field, combined with a significant amount of engineering and trial and error to get everything to work just so (and most likely more than a couple of exploded barrels). But the fact of the matter is that no amount of rote engineering would get you there. You need individuals with the right amount of knowledge and experience applying themselves to difficult problems to get innovative solutions that advance the field.
Because of the term "Software Engineering" - many people often assume that software development is simply the application of a set of principles and code patterns to a problem to come up with a working solution, like doing the calculations to build a road. In a nutshell, a common belief is that the smart, higher paid software architects generate some UML diagrams and then pass those off to the code monkeys to bash their fingers against the keyboard and output the necessary code. Throw in some unit and integration testing at the end and there you have it - freshly engineered software.
I'd love to say that belief and method of operating is a lie, but it's not. It's definitely possible to develop software this way and lots of companies do it. Lots of them also fail horribly. There is a reason that, despite the proliferation of code generation tools and the buzz about off-shoring, there is still a very strong market for us code monkeys.
As it turns out - implementation matters, and it can matter a lot. In cases where software is going to grow, change hands and live a life of its own after the first deployment, there are significant benefits to having well written code: it's easier to understand what's happening, it's easier to change, it's easier not to completely bork some other system down the line. Something designed by the world's greatest "Software Architect", providing UML diagrams that look like works of art, can still quite easily end up being a dud when it's implemented.
Two carpenters each building a cabinet based on the same design can end up with products of widely varying quality. While they do the same job at the most basic level (sure, they both hold dishes), it's quite possible that one cabinet feels like a rock-solid heirloom piece of furniture while the other seems like it might be ready to fly apart from a cross-breeze.
Anyone who's been in the software industry for a couple years has most likely seen the same thing. One piece of custom software in the company is rock solid, it's easy to add bits to and take parts away from without worrying about the whole thing falling to pieces. Other custom software is less stable, however, and programmers try to whisper magical incantations over it in the hope that the two line change they just made won't end up frying the mainframe.
To cast a sad aspersion against the software industry - I'd bet that in down times good programmers are more likely to get laid off than bad ones, simply because the bad ones have left such a destructive wake of spaghetti code behind them that no one but them knows what to do with it. Think about that - the worse you are, the better your job security. What other industry has that depressing feature?
To go back to the cabinets from a few paragraphs previous - the reason we call finish carpentry a craft and carpenters craftsmen is because the good ones, those that have the knowledge, experience and commitment to do good work, do it a heck of a lot better than those who don't. Software developers, at least those who live, eat and breathe software development, are the same - by some accounts well over an order of magnitude better than their less skillful and engaged peers.
Of course on the flip side, you don't always need a craftsman - if you run a cheap chair factory the last thing you want is someone who cares too much about what's coming off of the assembly line.
The software your company is developing might be the same thing, and it might just not be worth spending the money to do it right. A couple of off-shore programmers fed with a few pages of UML diagrams might do just the trick, and that's fine. Just like IKEA is making buckets of money selling cabinets that aren't heirlooms, your company can probably do the same with some barely passable software and save a few bucks. A word of warning though - when that project is months behind schedule and looks like it might not get done because things that worked last week are no longer working, that savings might not look quite as good.
The temptation from the MBA set is to believe that all it takes is their great idea and a quick-and-dirty implementation by the lowest bidder and the $$$ will start rolling in. The reality is that no one gets it right the first time, and if, when it's time to implement all those small tweaks that take the great idea and turn it into a great product, your codebase is falling apart - the value of having paid for or not having paid for a solid implementation will be pretty clear.
(however, for full disclosure, my mother tells me that she and my father were once called to school and I was threatened with suspension for engaging in hazing. I believe I was 6 years old and in the second grade. I don't actually remember what I did, but apparently it constituted hazing)
Now the liberal-elite, gotcha mainstream media are quick to discount hazing as nothing but a boys (or girls)-will-be-boys ritual directed at humiliating individuals simply for the sake of feeding the ego of the perpetrators. While it can certainly sometimes fall to that level, as Dr. Cialdini discusses in his book Influence, hazing's raison d'etre is much deeper than simple cruelty. In the chapter "commitment is the key", he explains:
The Thonga tribesman with tears in his eyes, watching his 10-year-old son tremble through a night on the cold ground of the "yard of mysteries" ... these are not acts of sadism. They are acts of group survival. They function, oddly enough, to spur future society members to find the group more attractive and worthwhile. As long as it is the case that people like and believe in what they have struggled to get, these groups will continue to arrange effortful and troublesome initiation rites. The loyalty and dedication of those who emerge will increase to a great degree the chances of group cohesiveness and survival (Influence, Robert B. Cialdini, 78)
One of the quirks of human nature is that we value things based on our level of commitment. The more effort and energy we expend to achieve something, the more we will value the result. The same goes for membership in organizations - groups that are difficult to become a member of will invariably be valued by their members more than groups that allow anyone in, even if they provide the same or similar services.
There is a corresponding desire, having fought tooth and nail to gain entrance into a group or community, to play "gatekeeper" and ensure that anyone seeking admittance to the group has to go through the same trials that you did, otherwise your own sacrifice will have been worthless.
Software developers, I believe, can sometimes put barriers around their software for those two reasons. The question is whether we are doing it for "good reasons" (because it generates a stronger commitment and devotion) or "bad reasons" (because if we had to go through it, you darn well do too), or a combination of both?
For an example of the former, take the VI editor - there is a lot that could be done to make the editor more user friendly - but something would definitely be lost regarding the exuberance of its zealous fan base if it could be mastered in a day of puttering around (for those of you so inclined, take the word VI and replace it with Emacs and you get the same result). I don't think that sort of exclusiveness is necessarily a bad thing, as by creating a more excited user base, everyone benefits. A motivated user base that actually pushes the product and adoption of the product forward is a great thing.
For the second example, any time you ask a developer a question they know the answer to and they respond with "look at the code" - or, pulling from previous personal experience, walk over to your desk with a 500 page ISO specification document and tell you to look it up - I don't think anyone is benefiting. As individuals, a little bit of a helping hand getting started goes a long way towards making progress on just about any task.
A couple of years ago, having used a small open source utility I sent a note to the author asking if they needed help fixing a bug I had discovered. Their response - "sure, just send us a patch" - while perfectly valid, seemed to have some of the gatekeeper mentality. "Prove yourself, and then we'll talk." Having no knowledge of the codebase and no one to help me get started, I got discouraged after a couple of hours of debugging and used something else. No one ended up benefiting. A two line response of - "Sure, check fizzbuzz() in foobar.c" would probably have been enough.
So, one of the goals we have with launching Webiva is to try to make it a friendly piece of software to get into - whether you're a seasoned programming veteran or just a user who wants help. I'm not saying it's going to be easy, as there is some significant complication innate to the software, but we'll be there to offer a helping hand for anyone interested in getting into it. And, we promise, no livestock.