Qt5 OpenGL Part 3a : Efficient Input Managers 1


Whenever I start working with a new framework, I run into problems right away with the framework’s key enumerations. It wouldn’t be a problem if all I needed to track is when a button was pressed and released. But many times in rich media you need a little more information that requires some kind of caching of previous values.

It is very common to over-think this part of the implementation. And in fact it’s something that I’ve just been curious about for a long time. So it’s time to put these thoughts to rest!

Note: This section will NOT teach us anything in terms of Qt5 or QOpenGL*, so if you want to skip it – you wont miss anything except for my ramblings.

Traditionally, input managers can be implemented in many ways. On certain hardware the implementation is obvious. But with computers – it’s not so simple. We have a ton of potential keys and modifiers to support. It boils down to three options in the end:

  1. Support every key using the frameworks default enumerations.
  2. Support every key by translating to our own, normalized enumeration.
  3. Support only the keys we know we will need.

The Qt::Key Normalizer Interface

My initial implementation primarily lead me towards using Qt’s QMetaEnum class to generate code which will normalize keys within a range of [0-N], where N is the total number of possible keys. However, it’s a little tricky to implement if you’ve never done code generation before – and I ended up wondering; did it really saved me anything?

If it didn’t save me anything, that would be awesome. It’d mean I’ve just been over-thinking the problem for years, and a simpler solution would have been the best – this is most likely the case. But I still felt the need to code it up and test it.

Would a switch-case really be slower than a dynamic approach?

Our KeyConverter would have an interface like this:

DO NOT IMPLEMENT THIS!

As mentioned in the comment within the header of this file, this would be a file that one project would create, and another project would use to compile. We will come back to this idea – just understand that convertKey() and revertKey() take a key and convert it to a different, continuous enumeration. (One which can be used for constant-time array access).

The Input Interface

The next class we have to deal with is a class which maintains the state of the keys. The reason we need to hold this state is so that we can tell if a key is being held or not. KeyPressed and KeyReleased events are great for detecting when a key was just pressed, or just released – but most of our logic revolves around whether a key is being held or not.

The Input interface will look something like this:

Again, we’re not implementing this yet – we’re just discussing the implementation details of Input.

My question was never about the interface – the interface isn’t hurting anyone. It’s the implementation that we need to worry about. Is it better to normalize Qt::Key’s value and store it in an array? Can that be done efficiently in a switch-case? Can we store state information in a container of some sort? Which container would be best?

Well, to answer these questions I decided to write an application that would run on several different types of containers, with different implementations, and time the results.

The Tests

In the end, I settled on 10 different configurations – mostly differing in containers.

  • KeyConverted (contiguous array)
    • This test normalizes the Qt::Key that the user passes in and performs a constant-time lookup to see key states.
    • Updating states involves a single memcpy.
  • KeyConvertedCustom (contiguous array)
    • This is the KeyConverted test, with a KeyConverter that doesn’t cover every Qt::Key. Instead Qt::Key’s are hand-picked and reduced from ~400 to ~150.
    • Updating states involves a single memcpy.
  • KeyLimited (contiguous array)
    • This test does not normalize the Qt::Key, but instead keeps a range of acceptable key values. This means all we do is the constant-time accessing for key state.
    • Updating states involve a simple memcpy.
  • QKeyList (QList)
    • A dynamic implementation that stores all “active keys” in a QList container.
    • Updating the states involves iterating the container.
  • QKeyMap (QMap)
    • A dynamic implementation that stores all keys that have ever been activated in a QMap container with their state.
    • Updating the states involves iterating the container.
  • QKeyVector (QVector)
    • A dynamic implementation that stores all “active keys” in a QVector container.
    • Updating the states involves iterating the container.
  • StdKeyList (std::list)
    • A dynamic implementation that stores all “active keys” in a std::list container.
    • Updating the states involves iterating the container.
  • StdKeyMap (std::map)
    • A dynamic implementation that stores all keys that have ever been activated in a std::map container with their state.
    • Updating the states involves iterating the container.
  • StdKeyUnorderedMap (std::unordered_map)
    • A dynamic implementation that stores all keys that have ever been activated in a std::unordered_map container with their state.
    • Updating the states involves iterating the container.
  • StdKeyVector (std::vector)
    • A dynamic implementation that stores all “active keys” in a std::vector container.
    • Updating the states involves iterating the container.

The test that is run is the same for all of them. An inner for-loop runs N executions of the functions we want to isolate. The inner loop is timed, and that loop is executed K times – the time in milliseconds that it took to run the functions N times is aggregated. After running the loop K times, the average is taken by dividing by K.

There is one dependent variable that affects some of the algorithm’s runtime – the number of keys the user is currently pressing. I call these the “Active Keys”, and about the only algorithm they don’t affect is the first two involving the KeyConverters. So we will capture data where KeysActive = { 0, 1, …, 9 }.

I really only care to track two different times; Query-time, and Update-time.

  • Query time is the amount of time it takes on average to query a single key’s state. (Triggered, Pressed, Released)
    • This one is a bit tricky, because I have to request a key. What I decided to do was create a helper class which would return a random key, and seed it the same across all tests, so that each test would receive the same random keys in the same order – but each run had a different initial starting seed of time, so the randomization per execution is different.
  • Update time is the amount of time it takes on average to update the state information of all active keys.

Results

One thing to mention about the results is that there were a few outliers that took much more time to compute. Most all of these were map-types, which usually took at least twice as long as our other functions. Since we have no need for such outliers, I removed them from the benchmark where they skewed results (as you can see, StdKeyUnorderedMap was fine for iterating over small amounts of elements). It seems that std::list and std::vector outperformed everything, and on smaller sets of data std::list pretty consistently outperformed std::vector in both iteration and access. This seems counter-intuitive to what I’d expect, but I imagine it’s just a slight implementation detail, and that the possible cache coherency issues that could arise with a list just wont make enough of a dent until we start getting more modest sets of data (~10+ elements).

Below you can find the benchmarks for querying to check if a key was pressed (lower numbers are better):

Calling keyTriggered(), keyPressed(), and keyReleased()

Calling keyTriggered(), keyPressed(), and keyReleased()

As we can see, when a small amount of keys is pressed, really there are only three contenders; KeyLimitedRange, StdKeyList, and StdKeyVector.

Here are the results filtered only the the three most-efficient (lower numbers are better):

Same results, filtered to the fastest 3 Instances.

Same results, filtered to the fastest 3 Instances.

Shockingly, StdKeyList and StdKeyVector are both faster than KeyLimitedRange on query. I had assumed that the constant-time lookup would be faster – even in the case of 0 Active Keys. So I went back into the code and made a few small changes to try to speed up KeyLimitedRange. I removed a boolean check by putting more work into the update function (which is reasonable since we will have far more calls to query than to update). I produced KeyLimitedRangeSimple, complete with one less boolean check per query, but the results were pretty similar. Not to mention it blows the call to update out of the water – not the best patch to make.

However, it seems that std::list only shows its power with several millions of accesses, because at a lower sample size, std::vector is about equivalent. We can see that in about the same amount of time for KeyLimitedRange to prove more efficient, std::list proves itself less efficient than std::vector. Looking at the data as it trends towards larger sample sizes shows how the data will trend.

How the data trends towards larger sample sizes.

Note: It is possible to build a Vector-based Input Manager that has O(logN) query time, and O(N) update-time. To hint at what will grant us this speed-up, std::vector was unordered in this test, but since we have to iterate std::vector anyways in update(), we can order our container, allowing O(logN) access times. But be careful, the additive constant-time complexity of a binary-search (setup) may add enough overhead to make O(N) faster for smaller data sets.

And then the benchmarks for updating the Input class:

Update data stays O(1) for arrays, but O(N) for most other containers.

Update data stays O(1) for arrays, but O(N) for most other containers.

Nothing particularly interesting here. We’re just updating all of the key states. In a dynamic container, this requires us to iterate over the entire container and update based on what information is currently available. For any contiguous array of keys it’s constant time, O(k) where k depends on the size of the data. Even if a type of Input Manager has a slightly worse update, it’s not to terrible because update() is only called once per frame. So really update-time information is only provided for completeness.

These are very good results – because they mean that not only is the simplest solution (a vector or a list) just as acceptable as constant-time access, but they’re also the best solutions in terms of memory overhead. KeyConverted* classes keep an array of size 2N, where N is the number of keys the user can press, where a dynamic solution will only store N where N is the number of active keys.

To some this information may seem uninteresting, but honestly I’ve always wondered the best way to go about managing user input. It just never makes enough of a dent in performance to warrant investigation.

Review

So what did we learn from this? Well the surprising thing is that std::vector and std::list are better for InputManagers than constant-time array access for small amounts of active keys (For MSVC12, actual times are compiler, machine, and environment dependent). I’d say in general, try for KeyLimitedRange implementation because it’s growth in complexity is O(1), whereas std::vector unsorted grows at a rate of O(N). However, if the framework doesn’t make this easy by giving us key enumerations which are wildly varying, invest in a more dynamic approach using vectors or lists.

Summary

Since I didn’t discuss everything that I learned in this section, I will mention what I learned trying to tackle this problem:

  • I learned it was possible to setup the QMake.pro project file so that it can have build dependencies.
  • The dependent projects can have a Post-Link Command that allows them to perform some logic after building.
  • The output of a Post-Link Command can be SOURCE/HEADER input of another QMake project.
  • For small data sets, the simpler approach was the more efficient approach as well – thankfully!

If you need to solve any of these problems, please take a peek at the sample code provided.

View Code on GitHub

Cheers!


Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

One thought on “Qt5 OpenGL Part 3a : Efficient Input Managers

  • Trent Post author

    In case anyone was wondering, I have run the same tests on *nix, and the results are similar. Notice that running a debug build we will obviously favor the switch-case. There is usually a lot of debug code in the STL that slows us down a bit.