[完整的VHDL代码]使用VHDL的矩阵乘法设计

甚高密度脂蛋白 的代码 矩阵乘法 被表达。该VHDL项目旨在开发和实现可综合的矩阵乘法器内核,该内核能够对32x32大小的矩阵执行矩阵计算。

矩阵的每个组成部分 是16位无符号整数。核心是在Xilinx上实现的 现场可编程门阵列 Spartan-6 XC6SLX45-CSG324-3。行为和路由后验证均已完成。仿真结果与Matlab实施结果进行了精确比较。

   
矩阵乘法的VHDL代码

设计核心框图
设计核心基于矩阵加法的参考设计,该矩阵的输入和输出缓冲区由Xilinx核心生成器生成,以保存输入和输出数据。主要工作是要计算的区块 矩阵乘法。根据矩阵乘法的理论,矩阵乘法由以下方程式完成:

矩阵乘法的VHDL代码
为了计算cij,在“stReadBufferAB”,我们必须读出第i行的所有32列和第j行的32行,并像方程式那样进行乘法和累加。此外,我们再添加一个状态“stSaveData”到参考内核的FSM,目的是在进入下一个状态之前保存累积的数据“stWriteBufferC”.
矩阵乘法的VHDL代码
设计核心的块接口

V 高密度脂蛋白 矩阵乘法器的顶层代码:

 -- fpga4student.com  现场可编程门阵列  projects, Verilog projects,  VHDL项目  
 -- VHDL project: VHDL code for matrix multiplcation
 library ieee;   
 use ieee.std_logic_1164.all;   
 use ieee.numeric_std.all;  
 use IEEE.STD_LOGIC_UNSIGNED.ALL;  
 -- Required entity declaration  
 entity IntMatMulCore is  
      port(  
           Reset, Clock,      WriteEnable, BufferSel:      in std_logic;  
           WriteAddress: in std_logic_vector (9 downto 0);  
           WriteData:           in std_logic_vector (15 downto 0);  
           ReadAddress:      in std_logic_vector (9 downto 0);  
           ReadEnable:      in std_logic;  
           ReadData:           out std_logic_vector (63 downto 0);  
           DataReady:           out std_logic  
      );  
 end IntMatMulCore;  
 architecture IntMatMulCore_arch of IntMatMulCore is  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 COMPONENT dpram1024x16  
  PORT (  
   clka : IN STD_LOGIC;  
   wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);  
   addra : IN STD_LOGIC_VECTOR(9 DOWNTO 0);  
   dina : IN STD_LOGIC_VECTOR(15 DOWNTO 0);  
   clkb : IN STD_LOGIC;  
       enb : IN STD_LOGIC;  
   addrb : IN STD_LOGIC_VECTOR(9 DOWNTO 0);  
   doutb : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END COMPONENT;  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 COMPONENT dpram1024x64  
  PORT (  
   clka : IN STD_LOGIC;  
   wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);  
   addra : IN STD_LOGIC_VECTOR(9 DOWNTO 0);  
   dina : IN STD_LOGIC_VECTOR(63 DOWNTO 0);  
   clkb : IN STD_LOGIC;  
   enb : IN STD_LOGIC;  
   addrb : IN STD_LOGIC_VECTOR(9 DOWNTO 0);  
   doutb : OUT STD_LOGIC_VECTOR(63 DOWNTO 0)  
  );  
 END COMPONENT;  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
 type     stateType is (stIdle, stWriteBufferA, stWriteBufferB, stReadBufferAB, stSaveData, stWriteBufferC, stComplete);  
 signal presState: stateType;  
 signal nextState: stateType;  
 signal iReadEnableAB, iCountReset,iCountEnable, iCountEnableAB,iCountResetAB: std_logic;  
 signal iWriteEnableA, iWriteEnableB, iWriteEnableC: std_logic_vector(0 downto 0);  
 signal iReadDataA, iReadDataB: std_logic_vector (15 downto 0);  
 signal iWriteDataC: std_logic_vector (63 downto 0);  
 signal iCount, iReadAddrA, iReadAddrB,iRowA : unsigned(9 downto 0);  
 signal CountAT,CountBT:unsigned(9 downto 0);  
 signal iColB:unsigned(19 downto 0);  
 signal irow,icol,iCountA,iCountB: unsigned(4 downto 0);  
 signal iCountEnableAB_d1,iCountEnableAB_d2,iCountEnableAB_d3: std_logic;  
 begin  
      --Write Enable for RAM A   
      iWriteEnableA(0) <= WriteEnable and BufferSel;  
      --Write Enable for RAM B  
      iWriteEnableB(0) <= WriteEnable and (not BufferSel); 
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
      --Input Buffer A Instance  
           InputBufferA : dpram1024x16  
           PORT MAP (  
                clka => Clock,  
                wea  => iWriteEnableA,  
                addra => WriteAddress,  
                dina => WriteData,  
                clkb      => Clock,  
                enb     => iReadEnableAB,  
                addrb => std_logic_vector(iReadAddrA),  
                doutb => iReadDataA  
           );  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      InputBufferB : dpram1024x16  
           PORT MAP (  
                clka => Clock,  
                wea  => iWriteEnableB,  
                addra => WriteAddress,  
                dina => WriteData,  
                clkb      => Clock,  
                enb     => iReadEnableAB,  
                addrb => std_logic_vector(iReadAddrB),  
                doutb => iReadDataB  
           );  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      OutputBufferC : dpram1024x64  
           PORT MAP (  
                clka      => Clock,  
                wea      => iWriteEnableC,  
                addra => std_logic_vector(iCount),  
                dina      => iWriteDataC,  
                clkb      => Clock,  
                enb      => ReadEnable,  
                addrb => ReadAddress,  
                doutb => ReadData  
           );  
      process(Clock,Reset)  
      begin  
      if(rising_edge(Clock)) then  
      if(Reset='1') then  
      iCountEnableAB_d1 <= '0';  
      iCountEnableAB_d2 <= '0';  
      else  
      iCountEnableAB_d1 <= iCountEnable;  
      iCountEnableAB_d2 <= iCountEnableAB_d1;  
      end if;  
      end if;  
      end process;  
      iCountEnableAB_d3 <= (not iCountEnableAB_d2) AND iCountEnableAB_d1 ;  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      process (Clock)  
      begin  
           if rising_edge (Clock) then  
                if(Reset='1') then  
                     iWriteDataC <= (others => '0');  
                elsif(iWriteEnableC(0)='1') then  
                     iWriteDataC <= (others => '0');  
                elsif(iCountEnableAB_d3='1') then  
                     iWriteDataC <= (others => '0');  
                elsif(iReadEnableAB='1') then   
                     iWriteDataC <= iWriteDataC + std_logic_vector(signed(iReadDataA(15)&iReadDataA)*signed(iReadDataB(15)&iReadDataB));       
                end if;   
           end if;   
      end process;  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
       process (Clock)  
      begin  
           if rising_edge (Clock) then  
                if Reset = '1' then  
                     presState <= stIdle;  
                     iCountA <= (others=>'0');  
                     iCountB <= (others=>'0');  
                else  
                     presState <= nextState;  
                     if iCountResetAB = '1' then  
                          iCountA <= (others=>'0');  
                          iCountB <= (others=>'0');  
                     elsif iCountEnableAB = '1' then  
                          iCountA <= iCountA + 1;  
                          iCountB <= iCountB + 1;  
                     end if;  
                end if;  
                if iCountReset = '1' then  
                     iCount <= (others=>'0');  
                elsif iCountEnable = '1' then  
                     iCount <= iCount + 1;  
                end if;  
           end if;  
      end process;  
      iRowA <= iCount srl 5;  
      iColB <= ("0000000000"&iCount) - iRowA*32;  
      irow <= iRowA(4 downto 0);  
      icol <= iColB(4 downto 0);  
      CountAT <= "00000"&iCountA;  
      CountBT <= "00000"&iCountB;  
      iReadAddrA <= (iRowA sll 5)+CountAT;  
      iReadAddrB <= (CountBT sll 5)+ iColB(9 downto 0);  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      process (presState, WriteEnable, BufferSel, iCount, iCountA, iCountB)  
      begin  
           -- signal defaults  
           iCountResetAB <= '0';  
           iCountReset <= '0';  
           iCountEnable <= '1';  
           iReadEnableAB <= '0';   
           iWriteEnableC(0) <= '0';  
           Dataready <= '0';  
           iCountEnableAB <= '0';  
           case presState is  
                when stIdle =>       
                     if (WriteEnable = '1' and BufferSel = '1') then  
                          nextState <= stWriteBufferA;  
                     else  
                          iCountReset <= '1';  
                          nextState <= stIdle;  
                     end if;  
                when stWriteBufferA =>  
                     if iCount = x"3FF" then  
                          report "Writing A";  
                          iCountReset <= '1';                      
                          nextState <= stWriteBufferB;  
                      else  
                          nextState <= stWriteBufferA;  
                     end if;  
                when stWriteBufferB =>  
                     report "Writing B";  
                     if iCount = x"3FF" then  
                          iCountReset <= '1';                      
                          nextState <= stReadBufferAB;  
                      else  
                          nextState <= stWriteBufferB;  
                     end if;  
                when stReadBufferAB =>  
                     iReadEnableAB <= '1';  
                     iCountEnable <= '0';  
                     report "CalculatingAB";  
                     if iCountA = x"1F" and iCountB = x"1F" then  
                          nextState <= stSaveData;  
                          report "Calculating";  
                          iCountEnableAB <= '0';  
                          iCountResetAB <= '1';  
                     else  
                          nextState <= stReadBufferAB;  
                          iCountEnableAB <= '1';  
                          iCountResetAB <= '0';  
                     end if;  
                when stSaveData =>   
                     iReadEnableAB <= '1';  
                     iCountEnable <= '0';  
                     nextState <= stWriteBufferC;  
                when stWriteBufferC =>  
                     iWriteEnableC(0) <= '1';  
                     report "finish 1 component";  
                     if iCount = x"3FF" then  
                          iCountReset <= '1';                           
                          nextState <= stComplete;  
                      else  
                          nextState <= stReadBufferAB;  
                     end if;                 
                when stComplete =>  
                     DataReady <= '1';  
                     nextState <= stIdle;                      
           end case;  
      end process;  
 end IntMatMulCore_arch;

下面是测试平台代码,用于从文本文件读取矩阵输入并将结果写入输出文本文件:

-- VHDL project: 矩阵乘法的VHDL代码
-- fpga4student.com  现场可编程门阵列  projects, Verilog projects, VHDL projects   
-- Testbench 矩阵乘法的VHDL代码
 library ieee;  
 use ieee.std_logic_1164.all;  
 use ieee.std_logic_textio.all;  
 use ieee.numeric_std.all;  
 use std.textio.all;  
 entity tb_IntMatMultCore is  
 end tb_IntMatMultCore;  
 architecture behavior of tb_IntMatMultCore is   
 component IntMatMulCore  
      port(  
           Reset, Clock,      WriteEnable, BufferSel:      in std_logic;  
           WriteAddress: in std_logic_vector (9 downto 0);  
           WriteData:           in std_logic_vector (15 downto 0);  
           ReadAddress:      in std_logic_vector (9 downto 0);  
           ReadEnable:      in std_logic;  
           ReadData:           out std_logic_vector (63 downto 0);  
           DataReady:           out std_logic  
      );  
 end component;        
 signal tb_Reset : std_logic := '0';  
 signal tb_Clock : std_logic := '0';  
 signal tb_BufferSel : std_logic := '0';  
 signal tb_WriteEnable : std_logic := '0';  
 signal tb_WriteAddress : std_logic_vector(9 downto 0) := (others => '0');  
 signal tb_WriteData : std_logic_vector(15 downto 0) := (others => '0');  
 signal tb_ReadEnable : std_logic := '0';  
 signal tb_ReadAddress : std_logic_vector(9 downto 0) := (others => '0');  
 signal tb_DataReady : std_logic;  
 signal tb_ReadData : std_logic_vector(63 downto 0);  
 -- Clock period definitions  
 constant period : time := 100 ns;    
 begin  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      -- Instantiate the Unit Under Test (UUT)  
      uut: IntMatMulCore   
           PORT MAP (  
                Reset                    => tb_Reset,  
                Clock                    => tb_Clock,  
                WriteEnable          => tb_WriteEnable,  
                BufferSel          => tb_BufferSel,  
                WriteAddress     => tb_WriteAddress,  
                WriteData          => tb_WriteData,            
                ReadEnable          => tb_ReadEnable,  
                ReadAddress          => tb_ReadAddress,  
                ReadData               => tb_ReadData,  
                DataReady          => tb_DataReady  
    );  
  -- Test Bench Statements  
      process is       
      begin  
           --while now <= 10000000000000 * period loop  
                tb_Clock <= '0';  
                wait for period/2;  
                tb_Clock <= '1';  
                wait for period/2;  
           --end loop;  
           --wait;  
      end process;  
      process is       
      begin  
           tb_Reset <= '1';  
           wait for 10*period;  
           tb_Reset <= '0';  
           wait;    
      end process;  
      writingA: process is                                
           file FIA: TEXT open READ_MODE is "InputA.txt";  -- the input file must have 17 rows  
           file FIB: TEXT open READ_MODE is "InputB.txt";  -- the input file must have 17 rows  
           variable L: LINE;  
           variable tb_MatrixData: std_logic_vector(15 downto 0);  
      begin  
           tb_WriteEnable <= '0';  
           tb_BufferSel <= '1';  
           tb_WriteAddress <= "11"&x"FF";  
           wait for 20*period;  
           READLINE(FIA, L);  
           while not ENDFILE(FIA) loop  
                READLINE(FIA, L);            
                HREAD(L, tb_MatrixData);       
                wait until falling_edge(tb_Clock);  
                tb_WriteAddress <= std_logic_vector(unsigned(tb_WriteAddress)+1);  
                tb_BufferSel <= '1';  
                tb_WriteEnable <= '1';  
                tb_WriteData <=tb_MatrixData;  
           end loop;  
           READLINE(FIB, L);  
           while not ENDFILE(FIB) loop  
                READLINE(FIB, L);            
                HREAD(L, tb_MatrixData);       
                wait until falling_edge(tb_Clock);  
                tb_WriteAddress <= std_logic_vector(unsigned(tb_WriteAddress)+1);  
                tb_BufferSel <= '0';  
                tb_WriteEnable <= '1';  
                tb_WriteData <=tb_MatrixData;  
           end loop;  
           wait for period;  
           tb_WriteEnable <= '0';            
           wait;   
      end process;       
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
      reading: process is                                
           file FO: TEXT open WRITE_MODE is "OutputMultC.txt";  
           file FI: TEXT open READ_MODE is "OutputMultC_matlab.txt";  
           variable L, Lm: LINE;  
           variable v_ReadDatam: std_logic_vector(63 downto 0);  
           variable v_OK: boolean;  
      begin  
           tb_ReadEnable <= '0';  
           tb_ReadAddress <=(others =>'0');  
           ---wait for Mul done       
           wait until rising_edge(tb_DataReady);   
           wait until falling_edge(tb_DataReady);   
           READLINE(FI, Lm);  
           Write(L, STRING'("OutputMultC"));  
           WRITELINE(FO, L);  
           tb_ReadEnable<= '1' ;  
           while not ENDFILE(FI) loop  
                wait until rising_edge(tb_Clock);  
                wait for 20 ns;  
                READLINE(FI, Lm);  
                HREAD(Lm, v_ReadDatam);            
                if v_ReadDatam = tb_ReadData then  
                     v_OK :=True;  
                     report "Matched";  
                else  
                     v_OK :=False;  
                end if;  
                HWRITE(L, '0'& tb_ReadData, Left, 10);  
                WRITE(L, v_OK, Right, 10);                 
                WRITELINE(FO, L);  
                tb_ReadAddress <= std_logic_vector(unsigned(tb_ReadAddress)+1);  
           end loop;  
           tb_ReadEnable <= '0';  
           assert false report "Simulation Finished" severity failure; -- to stop simulation  
           wait;   
      end process;  
 end;  

矩阵乘法设计的行为仿真

完成乘法器内核的设计后,我们对内核进行行为仿真。这 测试平台正在读取输入A和B,然后生成输出C,然后与Matlab结果进行比较。如果结果与Matlab相比为100%,则输出文件中的数据“OutputMultC.txt” will be “true”。否则,它将是错误的。
矩阵乘法的VHDL代码
行为模拟波形
在34816个时钟周期后,对尺寸为32x32的矩阵的矩阵乘法完成,并且信号“dataready”被断言为高。这是合理的,因为要花费32个周期来计算输出矩阵C的每个矩阵分量。还有2个周期分别为每个矩阵分量保存数据并将数据写入缓冲区C。因此,有34个时钟周期用于计算矩阵C的一个分量。矩阵C的大小为32x32,则矩阵乘法时间为32x32x34 = 34816个周期。
行为仿真结果与Matlab进行了正确比较。其实输出文件“OutputMultC.txt” is all “true”。我们还会检查内存 缓冲区输出的内容和结果当然类似于Matlab计算。
路由后仿真
为了执行路由后仿真,我们进行合成并获得要翻译的后翻译网表。综合结果表明,用于该设计的时钟的最大频率为120.757MHz。这意味着时钟的最小周期为8.281ns。因此,32x32矩阵的最小乘法时间为34816x8.281ns = 288.311 us。
布线后仿真结果与Matlab结果和行为仿真精确相似。确实,输出文件“OutputMultC.txt” is all “true” as our expectation.

在该项目中,具有32x32 16位无符号整数的矩阵的矩阵乘法在Xilinx的FPGA Spartan6上实现。 32x32矩阵的最小乘法时间为288.311 us。



推荐的 VHDL项目 :
1. 什么是FPGA? VHDL如何在FPGA上工作
2. FIFO存储器的VHDL代码
3. FIR滤波器的VHDL代码
4. 8位微控制器的VHDL代码
5. 矩阵乘法的VHDL代码
6. 开关尾环计数器的VHDL代码
7.  现场可编程门阵列 上数字闹钟的VHDL代码
8. 8位比较器的VHDL代码
9. 如何使用VHDL将文本文件加载到FPGA中
10。  D触发器的VHDL代码
11。  完整加法器的VHDL代码
12  具有可变占空比的VHDL中的PWM发生器
13  ALU的VHDL代码
14。  带测试平台的计数器的VHDL代码
15  16位ALU的VHDL代码
16。   甚高密度脂蛋白 中的移位器设计
17。   甚高密度脂蛋白 中的非线性查找表实现
18岁   甚高密度脂蛋白 中的密码协处理器设计

 现场可编程门阵列  Verilog VHDL课程

7条评论:

  1. 你好!
    我可以在Basys 3上使用Vivado 2015.4实施该项目吗?我曾尝试这样做,但Vivado 2015.4却没有't读取/获取这些文件(InputBufferA-dpram1024x16,InputBufferB-dpram1024x16和OutputBufferC-dpram1024x64)
    还是我'必须创建它们并放入文件夹中"sources".
    你能帮助我吗?我很乐意得到答案。

    回复 删除
  2. 您必须使用Xilinx Generator来生成该内存。然后将其包含到项目中。

    回复 删除
  3. 回覆
    1. 有可能,但是您需要用Quartus Altera替换Xilinx的Block RAM IP。

      删除
  4. 嗨,我收到读取命令错误。从输入A文件。我是否需要导入任何I / O软件包?

    回复 删除
  5. 有人可以提供对InputA,inputB,outputC和OutputC Matlab文件的包含的信息吗

    回复 删除

热门FPGA项目