Bullet Time

The audio appears to be playing back in slow motion, but why?

Take another look at the reader process. It pulls data off the FIFO using the system clock: one L/R pair comes off, followed by a wait in the fourth state (“11”) until the framework signals via i_audio_taken that it has used the two samples.

This seems correct at first. However, i_audio_taken comes from the Replay audio framework, which runs off a different clock, clk_aud.

And just when you thought we’d put all that clock nonsense behind us…

We set the audio clock to be based on the aux clock at 49.152MHz, which is the frequency the core sees for clk_aud. The clock is enabled for only 1 in every 4 ticks, so when i_audio_taken is driven high it effectively remains high for a single 12.288MHz clock cycle.

This is important because the reader process runs off the system clock. That clock is derived from Y0 (clk_gen_a0) at 114.5MHz, comes in on U10 (i_clk_a) and is internally divided by 4 to produce o_clk_sys running at 28.625MHz. This is further divided by 4 using a 1 in 4 i_ena_sys. The reader state is thus checking i_audio_taken at a frequency of 7.156MHz, around 58% of the 12.288MHz audio enable frequency. This makes it possible for an audio taken pulse to be missed completely.
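As a back-of-envelope check of those figures (taking the 114.5MHz and 49.152MHz values above at face value), the taken pulse is narrower than the gap between two reader samples, which is why it can fall between them:

```latex
\begin{aligned}
f_{\text{clk\_sys}} &= \frac{114.5\ \text{MHz}}{4} = 28.625\ \text{MHz}
  & f_{\text{ena\_sys}} &= \frac{28.625\ \text{MHz}}{4} \approx 7.156\ \text{MHz} \\
T_{\text{ena\_sys}} &= \frac{1}{7.156\ \text{MHz}} \approx 139.7\ \text{ns}
  & T_{\text{taken}} &= \frac{1}{12.288\ \text{MHz}} \approx 81.4\ \text{ns} \\
\frac{T_{\text{taken}}}{T_{\text{ena\_sys}}} &= \frac{7.156}{12.288} \approx 0.58
\end{aligned}
```

So the reader only samples once every 139.7ns or so, while the pulse it is looking for lasts roughly 81.4ns.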

What Pulse?

Let’s see what happens if a taken pulse is missed. The reader will continue to wait in the “11” state rather than preparing a new sample pair (L/R) from the FIFO. The audio framework will eventually output another sample and signal with another taken pulse. At this point the same sample will have been output twice. Whilst this might not occur consistently enough to give an exact 50% slowdown in playback, it sounds close.

As I have yet to obtain a logic analyzer, chipscope seems to be the next best thing. I won’t pretend to know how to properly set this up, but copying the cs_debug block from the loader and the icon and ila files from the cs/ directory of the amiga core into a cs/ folder is sufficient for the chipscope to run within the fpga and be picked up by the analyzer software.

The final version of the source includes the chipscope changes should you wish to experiment.

Here’s a chipscope output that shows how an audio taken pulse can be missed.

[Image: chipscope trace, FPGA Replay Framework]

At the top of the trace you can see the ticking of the ena_aud clock at 12.288MHz. Below that is the audio taken pulse coming from the framework as it signals it has taken the audio sample and the Core should prepare the next before it is needed in roughly 20.8µs (48kHz), which is plenty of time. Notice, however, the last trace, the system clock pulse: there’s no system clock pulse occurring whilst the audio taken pulse is high. We’ve missed the pulse. Instead the reader will do nothing for those 20.8µs. Maybe we’ll catch the next pulse.

If that doesn’t quite make sense, a perhaps better explanation of what is going on here can be found on nandland.

So how should this be fixed? Those who read the above link can remain silent.

Multiple Drivers

To solve this issue, the reader cannot use i_audio_taken directly. Instead we need a way for the audio clock based process to set a flag when i_audio_taken goes high and for the reader (system clock based) process to clear this once it has seen that a new sample can be prepared.

As a VHDL newbie, that’s just what I tried. You’ll soon find this causes an error, as two different processes cannot drive the same signal unless all but one are driving it to Z.

That means one cannot set it to ‘1’ and the other later clear it to ‘0’. Coming from a software background this seems strange at first, but it makes sense when you consider things on a hardware level, i.e. what state should the output be in if one wire is driving it high and the other low?

Instead the solution is to stretch out the i_audio_taken pulse long enough that it cannot possibly be missed by the slower running reader process.

The loader does this via i_audio_hold and three additional signals, m1..m3, which stretch the pulse out across 4 system clocks.
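To make the idea of pulse stretching concrete, here is a minimal generic sketch. It is not the loader's code: the entity name, port names and the choice to run it in the clock domain that produces the pulse are all assumptions made for illustration. A three-stage shift register remembers the pulse, and a registered OR keeps the output high for four enabled clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: all names are illustrative, not taken from the loader.
entity pulse_stretch_sketch is
  port (
    i_clk   : in  std_logic;        -- clock of the domain producing the pulse
    i_ena   : in  std_logic;        -- clock enable for that domain
    i_pulse : in  std_logic;        -- pulse one enabled cycle wide
    o_hold  : out std_logic := '0'  -- high for 4 enabled cycles per pulse
  );
end entity;

architecture rtl of pulse_stretch_sketch is
  -- delayed copies of the pulse, one enabled cycle apart
  signal m : std_logic_vector(2 downto 0) := (others => '0');
begin

  p_stretch : process(i_clk)
  begin
    if rising_edge(i_clk) then
      if (i_ena = '1') then
        -- shift the incoming pulse through the delay line
        m <= m(1 downto 0) & i_pulse;
        -- registered OR of the pulse and its delayed copies (glitch free)
        o_hold <= i_pulse or m(0) or m(1) or m(2);
      end if;
    end if;
  end process;

end architecture;
```

The stretched output only helps if it stays high for longer than one sampling period of the slower process, and the receiving side would still normally register it in its own clock domain before using it.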

Before I understood how the loader worked, however, I had implemented a variation on this which stretches the pulse out a bit longer. OK, a lot longer: it keeps the signal high (or low) until the next audio sample is taken 20.8µs later, as opposed to the tiny (yet ample) 140ns the loader stretches.

p_audio_out : process
begin
  wait until rising_edge(i_clk_aud);

  if (i_ena_aud = '1') then
    -- Stretch taken pulse to sync with reader process
    if (i_audio_taken = '1') then
      audio_taken_sync <= not audio_taken_sync;
    end if;
  end if;
end process;

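This is only part of the solution. Because audio_taken_sync only toggles at 48kHz, it stays at the same level for 20.8µs at a time. If the reader simply checked for audio_taken_sync being high on reaching the “11” state, it would instantly loop around again and again for as long as the signal stayed high, causing far too many samples to be lost.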

Two additional signals are added to the reader process to handle this: sys_audio_taken_sync and sys_audio_taken_sync_old.

-- Transfer available sample data from FIFO out to audio subsystem
p_fileio_reader : process(i_clk_sys, i_rst_sys)
begin
    if (i_rst_sys = '1') then
        fileio_sample_cnt <= "00";
        fileio_taken      <= '0';
        sample_audio_l    <= (others => '0');
        sample_audio_r    <= (others => '0');
    elsif rising_edge(i_clk_sys) then
        if (i_ena_sys = '1') then

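            -- Sample the audio domain toggle on the system clock and keep
            -- the previous sample so a change can be detected in state "11"
            sys_audio_taken_sync_old <= sys_audio_taken_sync;
            sys_audio_taken_sync <= audio_taken_sync;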

            fileio_taken <= '0';
            case fileio_sample_cnt is
                when "00" =>
                    if (fileio_valid = '1') then
                        fileio_taken <= '1';
                        sample_audio_l   <= fileio_data( 7 downto 0) & fileio_data(15 downto 8) & x"00";
                        fileio_sample_cnt <= "01";
                    end if;
                when "01" => -- wait for taken to update valid
                    fileio_sample_cnt <= "10";

                when "10" =>
                    if (fileio_valid = '1') then
                        fileio_taken <= '1';
                        sample_audio_r   <= fileio_data( 7 downto 0) & fileio_data(15 downto 8) & x"00";
                        fileio_sample_cnt <= "11";
                    end if;
                when "11" => -- ready
                    if (sys_audio_taken_sync /= sys_audio_taken_sync_old) then
                        fileio_sample_cnt <= "00";
                    end if;
                when others => null;
            end case;
        end if;
    end if;

end process;

One final change is that o_audio_l and o_audio_r are no longer set directly here. Instead the output is buffered in sample_audio_l and sample_audio_r. Only when the audio clock is enabled are these transferred to the actual o_audio_l and o_audio_r.

This is handled by the p_audio_out process which also toggles audio_taken_sync anytime i_audio_taken goes high.

p_audio_out : process
begin
  wait until rising_edge(i_clk_aud);

  if (i_ena_aud = '1') then
    -- Hand the buffered samples over to the audio outputs
    o_audio_l <= sample_audio_l;
    o_audio_r <= sample_audio_r;

    -- Stretch taken pulse to sync with reader process
    if (i_audio_taken = '1') then
      audio_taken_sync <= not audio_taken_sync;
    end if;
  end if;

end process;

With this setup, when in the “11” state, we can detect that audio_taken_sync is high and that it was low the previous time the reader looked at it (last sys clock), and vice versa. Thus even though audio_taken_sync will remain high for longer than it takes the reader to go through an entire “00” to “11” loop, we can tell when it has not changed and that we need to wait.

Put another way, we ensure each time audio_taken_sync toggles only ONE loop of the reader occurs. Here’s a trace showing the now stretched pulse.

[Image: chipscope trace, FPGA Replay Framework]

Note: The solid black lines represent discontinuity in the trace, as only a limited number of samples were captured when i_audio_taken goes high. Notice how audio_taken_sync toggles with each occurrence of i_audio_taken, and how sys_audio_taken_sync follows the toggling but is aligned to the system clock edge. Finally, sys_audio_taken_sync_old tracks sys_audio_taken_sync but one system clock later, to ensure we don’t trigger twice off the same pulse.
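The toggle and compare scheme can also be written as a small standalone block. The sketch below is a generic version of the technique rather than the code from this project: the entity and port names are made up, and it adds one extra register stage in the destination domain, a common precaution so that the compare never looks at a flip-flop that is sampling the asynchronous signal directly. It also assumes source events are spaced much further apart than the destination sampling period, as is the case here (20.8µs against roughly 140ns).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: names are illustrative only.
entity toggle_sync_sketch is
  port (
    i_clk_src : in  std_logic;        -- source domain clock (e.g. audio)
    i_ena_src : in  std_logic;        -- source domain clock enable
    i_pulse   : in  std_logic;        -- single event pulse in source domain
    i_clk_dst : in  std_logic;        -- destination domain clock (e.g. system)
    i_ena_dst : in  std_logic;        -- destination domain clock enable
    o_event   : out std_logic := '0'  -- one enabled dst cycle per source event
  );
end entity;

architecture rtl of toggle_sync_sketch is
  signal toggle_src : std_logic := '0';
  -- sync(0): first sample of the async toggle, sync(1): settled copy,
  -- sync(2): previous settled copy used for change detection
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin

  -- Source domain: each pulse flips the level, so the event can be seen
  -- however slowly the destination samples it
  p_src : process(i_clk_src)
  begin
    if rising_edge(i_clk_src) then
      if (i_ena_src = '1' and i_pulse = '1') then
        toggle_src <= not toggle_src;
      end if;
    end if;
  end process;

  -- Destination domain: resample the level and flag any change once
  p_dst : process(i_clk_dst)
  begin
    if rising_edge(i_clk_dst) then
      if (i_ena_dst = '1') then
        sync    <= sync(1 downto 0) & toggle_src;
        o_event <= sync(2) xor sync(1);
      end if;
    end if;
  end process;

end architecture;
```

Because the change is detected with an xor of two consecutive samples, o_event is high for exactly one enabled destination cycle per toggle, whichever direction the level moved.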

The loader version, on the other hand, only stretches the audio taken pulse out by 4 system clocks. Since it will take more than 4 system clocks after leaving state “11” before the loop back to state “11” is complete, it knows the pulse will not still be high from the previous loop, allowing it to trigger off a high rather than a toggle.

The loader method is likely preferable in the general case although in this instance both methods appear to solve the problem.

NOTE: If you’re looking through the loader code, you will find a few extra processes involved before audio is output than with the current Example_Audio_Top entity. They allow switching between tone generation output or sampled (fileio) output.

All that’s missing now is a bit of PCM data to load via the OSD and optionally monitor the left and right audio output via two aux pins because you can never have enough excuses for using a scope.

Export an mp3 using Audacity as “other uncompressed file” with “RAW (header-less)” and “Signed 16-bit PCM”, little-endian. Also be sure to first re-sample to 48kHz if the mp3 is at the more usual 44.1kHz.

Build, run, select the file via the OSD and admire your work.

So where next? Maybe video…

I’ve included a few additional notes about the Generic IO protocol for Extra Credit.
