Bash Scripting Tips


One of the things I've been excited to play with since I changed my home OS to Ubuntu is Bash scripting.

The lack of any developer-friendly terminal built into a fresh install of Windows has been a painful experience. Luckily, I've heard that PowerShell 5.0 is an included application in Windows 10 (finally). So while this post centers around Bash, do expect some future tutorials on PowerShell. I see it becoming much more popular now that it's there by default. And rightfully so: scripting an OS is a pretty powerful feature for super-users.

Why is Scripting Important?

In any technical field, our main job is to automate as many tedious tasks as possible. Consider Excel or QuickBooks: they don't do anything we can't do ourselves, but they automate a lot of the work and theoretically make it less error prone. As far as an OS is concerned, there are several tasks that fall under this category. Basic manipulation of files, running programs, scanning over output; these are all tedious tasks, and as such they can stand to be automated. But as with anything, you must be careful not to overdo it.

The purpose of this article isn't to debate the importance of scripting an OS, or when you should use a particular tool. Originally I had intended to, but it'd be a more constructive use of our time if I just went over best practices and cool snippets of code. If you really want an answer, I'll give you a basic rule of thumb that pretty much covers most computer engineering questions if you abstract it enough.

Q: When should I write a Bash script?

A: When it is absolutely necessary.

Best Practices

Know the Basics of Bash

I know, I know – seems pretty obvious, doesn’t it?

I can’t tell you how many people come into a new problem without any understanding of the tools they’re working with. This generally leads to confusion, confusion leads to annoyance, annoyance leads to hatred of the technology. I can recall several instances of this at DigiPen; people getting used to one way of doing something and ostracizing any deviants from that structure. A good rule of thumb here: if you hate something (and if you have the time), give it a shot for a few weeks and see if it grows on you.

My reason for saying all this is because Bash has a pretty odd syntax. Especially if you come from a programming background rooted in more modern languages, you can get confused really quickly. For example, a C programmer might look at the following snippet and get very confused:
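(A sketch to that effect; two styles of checking whether the input $path exists.)

    #!/bin/bash
    # Assume $path is handed to us by the caller.
    path="$1"

    # Style one: an if-statement around the test builtin.
    if [ -e "$path" ]; then
      echo "path exists: $path"
    else
      echo "path does not exist: $path"
    fi

    # Style two: short-circuit operators around the [[ keyword.
    [[ -e "$path" ]] && echo "path exists: $path" || echo "path does not exist: $path"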

There are slight differences between the styles presented, but they have generally the same output given the above input $path (only two lines should ever print, either both saying the path exists, or both saying the path doesn’t exist). If you don’t understand something in the above script, you need to learn at least the basics. You’re not doing yourself any favors if you only half understand the above code.

Another example of something you should understand very well is how variables work in Bash. Namely the difference between $* and $@, and what putting them in quotes will produce.

Generally you will want to quote your argument expansions. But you should know why.
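For example (a quick sketch):

    demo() {
      # "$@" expands to each argument as its own word.
      for arg in "$@"; do echo "\$@ -> $arg"; done

      # "$*" joins all arguments into one word, separated by the first character of IFS.
      for arg in "$*"; do echo "\$* -> $arg"; done
    }

    demo "one two" three
    # "$@" keeps two words: "one two" and "three".
    # "$*" collapses them into the single word "one two three".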

Agnostic Parameter Placement

Agnostic means without knowledge. In general you should subscribe to the notion that, most of the time, a shell function should have parameters passed to it without knowledge of the order in which they were intended to be passed. Though this is usually the case, sometimes it’s okay to assume that arguments should be passed in a particular order; in those cases the function should provide ample validation and fail if parameters are passed incorrectly.

This should be taken into consideration when you go to write a function or an executable file. Don’t think of it as a function in C. Think of it as a function in Bash. Functions in C are strict, they follow a declaration. Functions in Bash are loose (purposely). And you should always try as best as you can to follow this loose-style of use when scripting in Bash.

Consider the difference between foo --bar file.in and foo file.in --bar. If you wrote the function foo, would it handle both cases and produce the same output? The next question you should ask is “Does it make sense for the parameter placement to be agnostic?” If foo’s functionality were to process files in the order in which they are received, with only the flags appearing before each file being in effect, then the difference might make sense. In which case, I would recommend printing a warning that the flag --bar is unused, since no files follow --bar.

My general recommendation is to (almost always) be agnostic to the placement of arguments in your function calls. The easiest way to achieve this is to wrap a case-statement in a for-loop.
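For example (a sketch; foo and --bar are only for illustration):

    foo() {
      local arg bar=0
      local files=()

      # Walk every argument; the case-statement decides what each one means,
      # so --bar can show up before or after any file name.
      for arg in "$@"; do
        case "$arg" in
          --bar) bar=1 ;;
          -*)    echo "foo: unknown flag '$arg'" >&2; return 1 ;;
          *)     files+=("$arg") ;;
        esac
      done

      echo "bar=$bar files=${files[*]}"
    }

    foo --bar file.in   # same result as: foo file.in --bar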

An example of when you may want to consider the placement of arguments important is in command-driven logic (such as git). For example, git add file.in should probably not work the same as git file.in add. So as mentioned, this is case-by-case, but generally you should approach a new function with the idea that it should not care about the placement of arguments.

Sourcing Files

If your Bash script is getting fairly complex, it might be worthwhile to separate it into several Bash scripts, and have one main caller script. Similar to what you might do with other languages, Bash can “include” other scripts – kind of.

There is a builtin called source which will try to run a file, resolved relative to the user’s current working directory, within the current Bash instance – thereby populating the current instance with the variables and functions defined within the file. The problem with this, as you might be aware, is that it attempts to include from the current working directory, which may not be what you intended (it usually isn’t).

One way to work around this is to prefix your source’d file with the directory of the top value in BASH_SOURCE, which is an array of all the files that have been sourced (with the current file on top).
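For example (helpers.sh standing in for whatever file you want to pull in):

    # Source helpers.sh from the directory of the current file,
    # regardless of the caller's working directory.
    source "$(dirname "${BASH_SOURCE[0]}")/helpers.sh"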

This is pretty disgusting. I actively detest this workaround – though I see it as a necessary one.

What I do instead is define a function named include, which attempts to find a file by iterating through all BASH_SOURCE, failing and throwing an error if no file is found. I then export that function to allow all sub-shells and recursive calls to bash to also have this function defined. I have simplified the function for the purpose of demonstration, but you can imagine other checks I might perform or functionality I might add. I have kept in things I consider to be important: Handling files/directories, checking iteratively through BASH_SOURCE, expanding a user-controlled INCLUDE variable, keeping track of which files we’ve already included.
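Roughly, the simplified version looks like this (a sketch; details such as the colon-separated INCLUDE format are illustrative):

    __INCLUDED=""   # space-separated list of files we have already pulled in

    include() {
      local target="$1" src dir candidate
      local dirs=()
      local user_dirs=()

      # Directories to search: the directory of every file on the BASH_SOURCE
      # stack, plus anything in the user-controlled INCLUDE variable.
      for src in "${BASH_SOURCE[@]}"; do
        dirs+=("$(dirname "$src")")
      done
      IFS=':' read -r -a user_dirs <<< "${INCLUDE:-}"
      dirs+=("${user_dirs[@]}")

      for dir in "${dirs[@]}"; do
        candidate="$dir/$target"
        [ -f "$candidate" ] || continue   # must be a regular file, not a directory

        # Only include each file once.
        case " $__INCLUDED " in
          *" $candidate "*) return 0 ;;
        esac
        __INCLUDED+=" $candidate"

        source "$candidate"
        return $?
      done

      echo "include: unable to locate '$target'" >&2
      return 1
    }

    # Export so sub-shells and recursive calls to bash also have it defined.
    export -f include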

Check Errors and Fail Fast

Every call can fail.

Error checking doesn’t have to be ugly.

Since you’re generally manipulating user information (paths, input, etc.), you can easily find yourself in a bad state. If you don’t kill the script, you run to completion with that bad state. This doesn’t sound terrible until we consider what is running in a bad state: A script which manipulates the OS. My general rule of thumb is to bucket the kinds of failures, and consider whether or not you should handle the failures based on the bucket they fall within.

  1. A System Call (rm, mkdir, touch, >, >>)
    You pretty much always want to check system calls. Anything that manipulates directories, files, or the system in general should be error checked as if your life depends on it. Many poorly error-checked programs end up doing something simple like rm -rf "$argument/", only to find that at runtime $argument is never set and the command expands to rm -rf "/", which we all know is pretty terrible. Always, always, always over-error-check these kinds of calls.
  2. Logic and Workflow
    Depending on the type of logic, you may wish to check the information passed into it. This depends on what the logic touches. If it’s user-supplied data like paths or files, it’s probably worthwhile to check that information. If the logic doesn’t have any serious side-effects, and it’s costly to check effectively, maybe don’t do it. Generally this is a “do it as you see fit, but bias towards checking.”
  3. Echoing Information
    You generally don’t error check echoing; it makes sense not to. But you may wish to error check information prior to it being echo’d. If bad information does slip through, it won’t be a huge detriment to the user (no data was manipulated), but it may produce unexpected results for you and those depending on the output. Generally I check prior to output, and since there’s no good way to check the output itself, I just print the information. You should usually check as much as possible prior to printing anything, though.

I will commonly define a function named panic or die, and use this function as a way of printing an error message and exiting when something goes wrong. It helps me know where something is failing, as well as stops the script before it can cause any real damage.
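For example (a sketch; $argument stands in for whatever path the user handed you):

    panic() {
      echo "panic: $*" >&2
      exit 1
    }

    # Guard the dangerous call instead of hoping $argument is sane.
    [ -n "$argument" ]  || panic "argument is empty"
    [ -d "$argument" ]  || panic "'$argument' is not a directory"
    rm -rf "$argument/" || panic "failed to remove '$argument/'"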

Below is an example of some light error checking done on a script which prints remote and branch information for a git repo. I would say this script could definitely stand more checking, but the checking that’s there isn’t terrible. I just want to show that it doesn’t have to be ugly.
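Something along these lines (a sketch; the exact commands are illustrative, and origin/master is assumed to be the branch you push to):

    #!/bin/bash

    die() {
      echo "error: $*" >&2
      exit 1
    }

    # Fail fast if we are not inside a git repository.
    git rev-parse --is-inside-work-tree >/dev/null 2>&1 \
      || die "not inside a git repository"

    commit_count="$(git rev-list --count HEAD)"         || die "unable to count commits"
    tags="$(git tag)"                                   || die "unable to list tags"
    head_hash="$(git rev-parse HEAD)"                   || die "unable to resolve HEAD"
    ahead="$(git rev-list --count origin/master..HEAD)" || die "unable to compare against origin/master"
    origin_url="$(git remote get-url origin)"           || die "no remote named origin"

    echo "commits:  $commit_count"
    echo "tags:     ${tags:-<none>}"
    echo "HEAD:     $head_hash"
    echo "to push:  $ahead (vs origin/master)"
    echo "origin:   $origin_url"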

This is actually a pretty handy script. From its output on one of my repositories I can see that there have been 32 commits, no tags, the commit hash, how many commits I have to push to origin/master (1 commit), and origin’s SSH URL.

Write Pretty Code, Write Sparse Code, Write --help Code

Honestly, I could say this in any article about programming.

It really doesn’t take that much time to format code as you go. What does take time is reformatting after the fact. Some languages are a little easier to be messy about, and unfortunately Bash is one of them. Code should always be easy to follow for someone who is reasonably aware of the language, and comments should be used to describe functions and blocks of code. Whitespace is your friend, and newlines don’t cost anything (maybe an extra tiny bit of power to NOP them while Bash is parsing – worth it). Generally most things in programming are give-and-take.

Can you afford to have your code run an unnoticeable amount longer in order for the code to look prettier? Probably.

Hand-in-hand with this statement is the fact that you should prefer writing no code to writing any code at all. Scripting is a dangerous area to be in because you run the risk of over-automating. You’ll spend more time on your scripts than on your job. I’ve absolutely been on this end of the spectrum, and it’s not worth it. Half the time you forget that you made a script to help you with something, and the work you put into simplifying an already simple task just goes to waste.

Can you sometimes afford to do things manually, and not re-invent the wheel by making a custom helper script? Probably.

Since you will often leave a script sitting around for a long period of time, you should pretty much always expect that you will forget what it does or how to use it. If it’s intended to be called from a terminal, you should provide the mechanism for --help or -h flag parsing, and when --help or -h is provided anywhere (agnostic to placement), the function should print help text and then promptly exit 0.
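For example (a sketch; the tool name and the --bar flag are only for illustration):

    for arg in "$@"; do
      case "$arg" in
        --help|-h)
          echo "usage: mytool [--bar] [file ...]"
          echo "  --bar    do the bar thing to each file"
          exit 0
          ;;
      esac
    done

It’s the same for-loop and case-statement pattern from earlier, just run before anything else.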

I’m not going to give you a one-liner about writing --help documentation; you should just do it, and future you will thank present you. Probably.


Get Scripting!

So this post is based around Bash, but as I mentioned, PowerShell is now installed by default on Windows 10. ;)

Regardless of what OS you’re on, you should know how to create minimal scripts which can take care of a ton of silly, menial tasks. On older Windows installations this was difficult due to the crutch that is Batch scripting, but on any *nix system, or a modern Windows 10 install, you should have ample power with the default scripting environments provided.
