Designing an Infrared Forwarder/Sniffer/Injector Microcontroller Firmware

In the previous post, I explained the broad setup and gave an early out for those only interested in remote controlling a Denon AVR (or similar device) from a Raspberry Pi (or similar device).

In this post, on the other hand, I'll go all in.

As mentioned at the end of the previous post, it is possible to do all of the things I did here with a Raspberry Pi, without more dedicated hardware, with various drawbacks. It is also much less fun. Here is the fun way.

I decided to make a device which can forward infrared signals in real-time, while at the same time providing a "transcript" of those signals via a serial interface. While no signal is being forwarded, it can also take such a transcript via a serial interface and turn it into a real IR signal. I call it an "Infrared Forwarder/Sniffer/Injector Device", or IRFSID.

Because I like a challenge, I picked the smallest microcontroller I had available to me, the Atmel ATtiny2313a 8-bit MCU. It has 2048 bytes of flash program memory, 128 bytes of RAM and 128 bytes of EEPROM. It can safely be driven at 8 MHz of clock speed, with most of the instructions completing in a single clock cycle (with the notable exception of addressed loads/stores, which we'll get back to later). Most instructions take 2 bytes (some take 4 bytes), leaving us with room for about 1000 instructions.

It comes with a USART peripheral, two timers (one 8 bit, one 16 bit, both PWM-capable), the ability to trigger interrupts on pin changes and some irrelevant stuff we're not going to use.

Bird's Eye Perspective

         │RX pin                         │IR in pin
         │                               │
         │               ┌───────────────┤
         │               │               │
         ▼               │               ▼
┌──────────────────┐     │        ┌────────────┐
│                  │     │        │            │
│  UART Receiver   │     │        │  Analyzer  │
│                  │     │        │            │
└────────┬─────────┘     │        └──────┬─────┘
         │               │               │
         ▼               │               ▼
┌────────────────────┐   │         ┌───────────┐
│                    │   │         │           │
│  Signal generator  │   │         │  UART TX  │
│                    │   │         │           │
└────────┬───────────┘   │         └─────┬─────┘
         │               │               │
         └──────────┐    │               │
                    ▼    ▼               │
              ┌────────────────┐         │
              │                │         │
              │  Multiplexer   │         │
              │                │         │
              └───────┬────────┘         │
                      │                  │
                      ▼                  │
              ┌────────────────┐         │
              │                │         │
              │    Modulator   │         │
              │                │         │
              └───────┬────────┘         │
                      │                  │
                      │                  │
                      ▼IR out pin        ▼ TX pin

The above is a simple block diagram of the architecture we're looking at. The IR in pin is connected to an external IR demodulator IC (such as the TSOP1738) and from there, receives the demodulated signal. The RX pin is connected to the UART TX of the Pi and receives instructions for generating an IR signal.

The multiplexer is responsible for selecting which of the two signals is fed to the modulator for forwarding. The modulator generates the 38 kHz carrier according to its input.

The signal from the IR in pin is also fed to an analyzer component which emits the timing values of the signal (duration of spaces and pulses) on the UART TX pin, which is connected to the UART RX of the Pi.

First implementation: Forwarding unit without multiplexing

It is clear that the modulation should be done in hardware. If we were to bit-bang (eww) the output, we would have to adhere to a strict timing where, within every ~211 clock cycles (one 38 kHz period at 8 MHz; that is even fewer instructions if multi-cycle ones are involved), we get around to toggling a pin twice. With that many moving parts in the design, such a strict schedule would be challenging to achieve. Also, there is no need.

The 8-bit timer can easily be used to generate the signal, even when running on an 8 MHz clock. On the ATtiny2313a, when operating a timer in phase-correct PWM mode, the input clock effectively gets divided by four. If we then let the timer count to 52, we get a 38.46 kHz carrier. As the bandpass of one of the IR chips inside the Denon is from 37 kHz to 40 kHz, it seemed safer to err on the high side (otherwise, 53 would've been the value of choice).
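The arithmetic behind the choice of 52 can be checked with a few lines of host-side C. This is only a sketch under the assumptions stated in the text (8 MHz clock, effective division by four); the helper name `carrier_hz` is made up for illustration and is not taken from the firmware.

```c
#include <stdint.h>

/* Assumption from the text: 8 MHz system clock feeding timer 0 directly. */
#define F_CPU_HZ 8000000UL

/* Carrier frequency in Hz, rounded to the nearest integer, when timer 0
 * counts to `top` and the input clock is effectively divided by four.
 * Returns 0 for top == 0, which would not produce a usable carrier. */
uint32_t carrier_hz(uint8_t top)
{
    if (top == 0) {
        return 0;
    }
    uint32_t divisor = 4UL * top;
    return (F_CPU_HZ + divisor / 2) / divisor;
}
```

With these assumptions, `carrier_hz(52)` comes out at 38462 Hz and `carrier_hz(53)` at 37736 Hz, so 53 would land noticeably closer to the lower edge of a 37 kHz to 40 kHz bandpass.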

We can toggle the PWM output using a single bit in the timer register (COM0A0), which means that we only need three clock cycles to set, clear or toggle it.

Using a pin-change interrupt on the IR input pin, we can now easily implement a forwarder: Whenever the pin input changes, we set the COM0A0 bit to the value at the input pin.
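To make the idea concrete, here is a minimal host-runnable sketch of what the body of such a pin-change handler boils down to. It is not the code from the repository: `forward_edge` is a hypothetical name, the register is passed in as a pointer so the logic can be exercised off-target, and the polarity simply follows the description above (bit set to the value seen at the pin). On the real chip one would use the definitions from `<avr/io.h>` instead of the hand-written bit position.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of COM0A0 within TCCR0A on the ATtiny2313A. */
#define COM0A0_BIT 6

/* Copy the level seen on the IR input pin into COM0A0, which connects or
 * disconnects the carrier from the output pin. `tccr0a` stands in for the
 * real timer control register; all other bits are left untouched. */
void forward_edge(volatile uint8_t *tccr0a, bool ir_pin_high)
{
    if (ir_pin_high) {
        *tccr0a |= (uint8_t)(1u << COM0A0_BIT);
    } else {
        *tccr0a &= (uint8_t)~(1u << COM0A0_BIT);
    }
}
```

Because only a single bit is touched, the mode bits that configure the carrier stay as they were, which is what keeps the handler so short.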

The code for this simple thing, plus some scaffolding for the multiplexing we'll need in the future, can be viewed online.

In addition to the things described here, it has two additional features:

It already uses the 16 bit timer to keep track of how long the individual pulses/gaps are.
It prolongs the signal by a few 38 kHz cycles; the idea here is that we know that the TSOP1738 is bound to introduce a "delay" in recognizing a pulse, while it will turn off its output almost immediately once a pulse is over. By extending each pulse by a few cycles, we recover this lost part of the signal, which may help a downstream decoder properly recognize the command.

It compiles to 378 bytes of flash and needs one byte of RAM to hold the multiplexer state (which is mostly unused). It also has a hold timer to keep the "lock" on the multiplexer for some time after the last pulse. We'll get to that in more detail later.

Global Variables Are Expensive

As hinted at earlier, arbitrary memory access is slow on the ATtiny2313a; it takes twice as long as most other instructions and it also takes twice as many bytes. The previous version of the code uses a global variable in order to multiplex the access to the modulator.

However, accessing global registers is fast: just a single-cycle instruction. If we look carefully at the program, we'll find that the global variable is really just mirroring information we already hold implicitly in the global registers. In particular:

In the initial state, timer 1 is disabled and the PWM output is also disabled.
If we are forwarding a pulse, the PWM output is enabled (though timer 1 is disabled).
After forwarding a pulse, we use timer 1 to prolong it, so timer 1 is enabled and the PWM output is also enabled.
After the prolonged output, we are forwarding a space for up to a certain time (after which we release the lock on the non-existent multiplexer), so we still need timer 1, but the PWM output is disabled.

Concisely:

Idle: Timer 1 disabled, PWM disabled.
Forwarding pulse: Timer 1 disabled, PWM enabled.
Prolonged pulse: Timer 1 enabled, PWM enabled.
Space: Timer 1 enabled, PWM disabled.

So we can just use those register values as an indication of which state we are in! This allows us to get rid of the slow global variable access and it reduces the code size, too.
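Spelled out as code, the mapping from the two hardware facts to the four states looks roughly like the following. This is a sketch only: the enum and function names are hypothetical, and the two booleans stand for whatever the firmware reads back from the timer 1 and PWM configuration bits.

```c
#include <stdbool.h>

/* The four states listed above. */
enum fwd_state {
    FWD_IDLE,            /* timer 1 disabled, PWM disabled */
    FWD_PULSE,           /* timer 1 disabled, PWM enabled  */
    FWD_PROLONGED_PULSE, /* timer 1 enabled,  PWM enabled  */
    FWD_SPACE            /* timer 1 enabled,  PWM disabled */
};

/* Derive the state purely from what the hardware registers already say,
 * so no separate state variable in RAM is needed. */
enum fwd_state decode_state(bool timer1_enabled, bool pwm_enabled)
{
    if (!timer1_enabled) {
        return pwm_enabled ? FWD_PULSE : FWD_IDLE;
    }
    return pwm_enabled ? FWD_PROLONGED_PULSE : FWD_SPACE;
}
```

Two bits give exactly four combinations, and all four are used, so there is no invalid state to guard against.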

The result can be seen in commit 38c2ae1. 296 bytes of flash instead of 378, that's a reduction of about 22%. And of course it reduces the latency in the forwarding chain, because we do not need the global loads/stores.

Sending a transcript via UART

To send the transcript of the signal (duration of pulses/pauses, including information about whether it is a pulse or a pause), we need to keep track of how long they are. Timer 1 now has to run while a pulse is being forwarded, too, which means we cannot use the enablement of timer 1 to encode the state anymore. However, we can use another bit: In the case of forwarding a pulse, we do not care about setting an upper limit; hence, we can use the information about whether a comparison with the compare register is enabled or not to know in which of the four states we are.

With that sorted out, forwarding the signal to the UART becomes only a matter of writing the durations into the UART data register. For this, we need to talk about baud rates.

Turns out, 8 MHz is a terrible clock if you want to achieve one of the standard baud rates. The fastest baud rate which can be hit with reasonable accuracy is 38400 (see how similar it is to the 38 kHz PWM we do?). Anything above that will be off by far too large an error.

Luckily, many UART drivers on Linux nowadays support "arbitrary" non-standard baud rates. In normal mode, the UART of the ATtiny2313a uses a clock division by sixteen for the baud rate generator (mostly because of how the receiver works, oversampling the signal significantly). That leaves us with 500 kHz. We divide that further by 5 to achieve the non-standard baud rate of 100000 Baud.
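The baud rate arithmetic can be reproduced on a host with the sketch below. It assumes the normal-mode relation baud = clock / (16 * (UBRR + 1)), where UBRR is the divider register of the AVR USART; the function names are invented for this example, and the error is reported in tenths of a percent, truncated toward zero.

```c
#include <stdint.h>

/* Assumption from the text: 8 MHz system clock. */
#define F_CPU_HZ 8000000UL

/* Baud rate produced in normal (16x oversampling) mode for a given
 * divider register value, truncated to whole Baud. */
uint32_t baud_from_ubrr(uint16_t ubrr)
{
    return F_CPU_HZ / (16UL * ((uint32_t)ubrr + 1UL));
}

/* Divider register value whose baud rate is closest to the wanted one.
 * Rates too high to be reached are clamped to the smallest divider. */
uint16_t ubrr_for_baud(uint32_t baud)
{
    uint32_t div = (F_CPU_HZ + 8UL * baud) / (16UL * baud);
    if (div == 0) {
        return 0;
    }
    return (uint16_t)(div - 1UL);
}

/* Deviation of the achievable rate from the wanted one, in tenths of a
 * percent (positive: too fast, negative: too slow). */
int32_t baud_error_permille(uint32_t wanted)
{
    int32_t actual = (int32_t)baud_from_ubrr(ubrr_for_baud(wanted));
    return (actual - (int32_t)wanted) * 1000 / (int32_t)wanted;
}
```

Under these assumptions, 100000 Baud maps to a divider value of 4 with no error at all, 38400 Baud is off by well under one percent, while 57600 and 115200 Baud miss by several percent.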

Transmitting an 8 bit frame using 100000 Baud takes 100us (ten bit times, counting the start and stop bits). That is about four strobes of the 38 kHz carrier and just half of the assumed lower bound of pulse durations of valid IR signals. That means we cannot assume that we'll be able to transmit two 8 bit frames to relay a duration; that would risk overrunning some buffer.

(Of course, we could put a buffer into the 128 bytes of RAM, however, given the slowness of loads and stores, I'd like to avoid accessing global memory in the fast path of the IR forwarding.)

Instead, we do the following:

Transmit durations in units of 16us
Use a single bit to indicate pauses vs. pulses
When the duration counter overruns (at 2032us), we emit a frame showing just that duration.

As we explicitly encode a pause vs. pulse, the receiver on the other end can accumulate multiple such overflow frames to reconstruct the actual duration of the pulse/pause. This makes our encoding O(t), but this is okay: The longest pulse/pause we expect to transmit is just a few dozen milliseconds.

To detect overrun of the counter, we use the compare/match register A of the timer 1. The compare/match register B is used for timing purposes. The reason for doing it this way (even though we previously used A for timing purposes) is that the interrupt priority of the compare/match with register A is higher than the priority for the pin change, meaning that we're less likely to lose overrun information here.

When transmitting, we use the least significant bit to indicate a pause. Since timer 1 runs in units of 8 us, we can simply mask out the lowest bit of the timer value, put the pause flag in its place and copy the result into the UART data register.
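On the receiving side, reconstructing the durations is a matter of merging consecutive frames which carry the same pause bit. The following is a sketch of such a host-side decoder, not code from the project; `struct ir_span` and `decode_transcript` are names made up for illustration, and the layout assumed is the one described above (lowest bit set for a pause, upper seven bits counting 16us units).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ir_span {
    bool is_pause;        /* true: pause, false: pulse */
    uint32_t duration_us; /* accumulated duration in microseconds */
};

/* Decode transcript bytes into spans. Pulses and pauses alternate on the
 * wire, so consecutive frames with the same pause bit must belong to the
 * same span (overflow frames) and their durations are summed.
 * Returns the number of spans written to `out` (at most `cap`). */
size_t decode_transcript(const uint8_t *bytes, size_t len,
                         struct ir_span *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        bool is_pause = (bytes[i] & 0x01u) != 0;
        uint32_t us = (uint32_t)(bytes[i] >> 1) * 16u;
        if (n > 0 && out[n - 1].is_pause == is_pause) {
            out[n - 1].duration_us += us; /* continuation of the span */
        } else {
            if (n == cap) {
                break; /* output array is full */
            }
            out[n].is_pause = is_pause;
            out[n].duration_us = us;
            n++;
        }
    }
    return n;
}
```

Note that the decoded durations are rounded down to multiples of 16us, which is all the format can express.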

Example

Assuming an IR signal which consists of a 9000us pulse, 4500us pause, 310us pulse, 740us pause, 310us pulse (making something up here), the sniffer will emit the following bytes (in hex, line breaks on each level change for readability):

fe fe fe fe 6c
ff ff 37
26
5d
26
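The bytes above can be reproduced with a small host-side model of the encoding. This is a sketch under the assumptions from the text (timer 1 counting 8us ticks, an overflow frame every 2032us, lowest bit replaced by the pause flag); `encode_span` is a hypothetical helper, and how the real firmware treats a span that ends exactly on an overflow is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_US 8u          /* timer 1 resolution, from the text */
#define OVERFLOW_TICKS 254u /* 2032us expressed in 8us ticks */

/* Encode one pulse or pause into transcript bytes. Every full 2032us
 * yields an overflow frame (fe for a pulse, ff for a pause); the rest is
 * emitted as the tick count with its lowest bit replaced by the pause
 * flag. Returns the number of bytes written to `out` (at most `cap`). */
size_t encode_span(uint32_t duration_us, bool is_pause,
                   uint8_t *out, size_t cap)
{
    uint32_t ticks = duration_us / TICK_US;
    uint8_t flag = is_pause ? 1u : 0u;
    size_t n = 0;
    while (ticks >= OVERFLOW_TICKS && n < cap) {
        out[n++] = (uint8_t)(OVERFLOW_TICKS | flag);
        ticks -= OVERFLOW_TICKS;
    }
    if (n < cap) {
        out[n++] = (uint8_t)((ticks & 0xFEu) | flag);
    }
    return n;
}
```

For instance, the 9000us pulse is 1125 ticks: four overflow frames account for 1016 of them, and the remaining 109 ticks become 6c once the lowest bit is masked out.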

Receiving instructions via UART

Now for the real multiplexing!

Receiving data from the UART is easy. After enabling the receiver and the interrupt, the interrupt handler is called whenever a byte has been received from the UART line. Before we get into the processing, we'll have to talk about the data format.

Data format

There are two options:

Use the UART as command interface ("enable output", "disable output") and let the host device handle the timing.
Use the same format as used for the transcript, i.e. "enable/disable output for X microseconds".

The advantage of the former format is that it is dead-simple and could be directly mapped to the COM0A0 output. A key downside is that this relies on strict timing of UART frames; that is generally not possible with Linux host devices, as there could be arbitrary buffers involved. In addition, it would be impossible to emit signals shorter than 100us (the frame duration on the 8-bit UART) and timing frames which are just barely above 100us would be extra challenging.

The second option has the advantage that we can script the ATtiny2313a to emit a signal by pushing the description of the signal to it via the UART and have it then asynchronously generate the described signal. The downside here is that we'll need a buffer to keep the parts of the script which have not been executed yet in memory. In addition, we'll have to make sure that we do not overrun this buffer while receiving from the UART. That would, in theory, again require intricate timing on the host device, but there's a neat way around that.

For the multiplexing use case, we'll have to "lock" the modulator at any time to either the UART source or the demodulated source; it cannot be driven by both. In case of the first mode, we'd have to introduce further commands in order to manage that locking. If the modulator is currently locked, this would have to be signalled to the sender or we'd have to steal the output. We cannot signal it to the host device, because the code space for UART frames on the transmitter lane is already completely full. We could use some out-of-band signalling, but that would likely break the timing. Stealing the output would obviously cause corruption in whatever was being forwarded there and potentially also corruption of the signal which was about to be generated from the UART input, unless that defensively includes a long pause at the beginning.

The second option seems more compatible with multiplexing: We can simply buffer the script until the modulator is available, and we can lock the transmitter for exactly as long as the described signal is going.

That said, we can just reuse the same format we use for the transcript on the UART input.

Buffering and Flow Control

As the UART will generally receive faster than we execute the script, we'll have a small buffer in RAM which holds the incoming script until it is processed. If the host system were transmitting too fast, however, we'd overrun the buffer.

To avoid this, we'll employ a feature commonly supported by UARTs, which is called Hardware Flow Control. Hardware Flow Control allows the embedded device to signal to the host whether it is currently able to receive data. This is done using an active-low Clear-To-Send (CTS) signal in parallel to the RX line. That means that whenever the buffer is nearing its high water mark or whenever we are currently forwarding an IR signal from the demodulated pin, we can, using the CTS signal, tell the host device that it needs to wait for a moment.

Thanks to the data format which is effectively a transcript of the to-be-sent signal, this does not cause timing issues: The signal will start later, but it will still be fully intact.

In practice, it turns out that we need to have at least four bytes of space before de-asserting the CTS signal; the host UARTs I was testing with continued sending a few frames before they reacted to the CTS signal.
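A ring buffer with such a high water mark could look like the sketch below. It is not the buffer from the firmware: the size of 32 bytes and all names are assumptions made for illustration; only the reserve of four bytes is taken from the observation above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 32u /* hypothetical size; must be a power of two */
#define RX_HEADROOM 4u  /* free bytes kept in reserve for late frames */

struct rx_buf {
    uint8_t data[RX_BUF_SIZE];
    uint8_t head; /* next write position */
    uint8_t tail; /* next read position */
};

/* Number of bytes currently stored. */
uint8_t rx_used(const struct rx_buf *b)
{
    return (uint8_t)((b->head - b->tail) & (RX_BUF_SIZE - 1u));
}

/* One slot stays unused so that head == tail always means "empty". */
uint8_t rx_free(const struct rx_buf *b)
{
    return (uint8_t)(RX_BUF_SIZE - 1u - rx_used(b));
}

/* Store a received byte; returns false if the buffer is full. */
bool rx_push(struct rx_buf *b, uint8_t byte)
{
    if (rx_free(b) == 0) {
        return false;
    }
    b->data[b->head] = byte;
    b->head = (uint8_t)((b->head + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}

/* Fetch the oldest byte; returns false if the buffer is empty. */
bool rx_pop(struct rx_buf *b, uint8_t *byte)
{
    if (rx_used(b) == 0) {
        return false;
    }
    *byte = b->data[b->tail];
    b->tail = (uint8_t)((b->tail + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}

/* True when the host should be told to stop sending: either the IR
 * forwarder owns the modulator, or only the reserve is left. */
bool rx_must_hold_off(const struct rx_buf *b, bool forwarding_active)
{
    return forwarding_active || rx_free(b) <= RX_HEADROOM;
}
```

The result of `rx_must_hold_off` would drive the flow control pin; checking it after every push and every pop keeps the pin in step with the buffer level.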

Signal generation

Whenever the buffer becomes non-empty (and we can safely forward), we enable signal generation mode. In that mode, timer 1 is programmed to the duration as indicated in the next byte in the buffer. The output is set to correspond to the pause bit.

Once the timer elapses, we check whether there are more bytes in the buffer. If there are not, signal generation mode is exited. If there are, the timer is reprogrammed and the output is configured according to the byte's value.

It's that simple.
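In code, the per-byte work amounts to very little. The sketch below models it on a host; the names are hypothetical, and it assumes that timer 1 keeps running in 8us ticks in signal generation mode, so that a script byte with its lowest bit masked out is directly the number of ticks to wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gen_step {
    bool carrier_on;      /* drive the modulator during this step */
    uint16_t timer_ticks; /* step duration in 8us timer ticks */
};

/* Translate one script byte into output and timer settings. */
struct gen_step step_from_byte(uint8_t byte)
{
    struct gen_step s;
    s.carrier_on = (byte & 0x01u) == 0;       /* pause bit clear: pulse */
    s.timer_ticks = (uint16_t)(byte & 0xFEu); /* 16us units as 8us ticks */
    return s;
}

/* Walk a script the way the timer handler would, one step per byte until
 * the buffer is empty, and return the total carrier-on time in us. */
uint32_t carrier_on_time_us(const uint8_t *script, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        struct gen_step s = step_from_byte(script[i]);
        if (s.carrier_on) {
            total += (uint32_t)s.timer_ticks * 8u;
        }
    }
    return total;
}
```

As an example, the script byte 26 turns the carrier on for 38 ticks (304us), and 5d keeps it off for 92 ticks (736us).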

During signal generation mode, the pin-change interrupt for forwarding mode is disabled to keep it from interfering with the signal generation. Vice versa, the receive interrupt is disabled while forwarding is going on (though this may actually cause loss of data due to the fact that the host may react to the de-assertion of CTS only slowly; that's an edge case which could see improvement).

Wrapping things up

The final code can be observed in the current main.c. There are more detailed explanations about the state machine in there and helper functions which hopefully make the code more comprehensible.

We now have a piece of firmware, but no hardware to go with it. The hardware around it is pretty simple, just level shifters and the capacitors you typically need to provide a stable power supply for a microcontroller. The schematics and a layout suitable for soldering in THT on a prototyping board are available in the GitHub repository (as KiCad files).