Fixed-Point Matrix Multiplication in Verilog [Full Code + Tutorial]

 

This Verilog project implements a synthesizable fixed-point matrix multiplication in Verilog HDL. The full Verilog code is presented.

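The two fixed-point matrices A and B are BRAMs created by Xilinx Core Generator. After the two matrices are multiplied, the result is written to another matrix, which is also a BRAM. The testbench code reads the contents of the output matrix and writes them to the "result.dat" file so the result can be checked.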


Fixed point in Verilog

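First, you need to know what fixed point means and how it is represented in binary. This topic is widely covered, so you can refer to existing material to get familiar with fixed-point numbers, how they are represented in binary, and why we use them in digital design.
Fixed-point arithmetic differs from plain binary integer arithmetic, so we need a dedicated Verilog library of fixed-point math functions to handle it on an FPGA. Fortunately, a Verilog math library for fixed-point numbers is available from Opencores, or you can download it directly here if you do not have an account. The library contains basic math functions for fixed-point numbers, such as addition, multiplication, and division. All you need to do is download the library and spend some time understanding the number format and how to use its fixed-point functions in Verilog.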
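To make the idea concrete, here is a minimal simulation-only sketch of a fixed-point multiply with 8 fractional bits. It is not the Opencores library: the module name `q8_mult_demo` is hypothetical, and it assumes unsigned values, whereas the library used later in this post keeps a separate sign bit. A real number x is stored as x * 256, so the raw product of two such numbers carries 16 fractional bits and must be shifted right by 8 to return to the original format.

```verilog
`timescale 1ns / 1ps
// Illustrative only: unsigned fixed-point multiply with 8 fractional bits.
// This is NOT the Opencores qmult module.
module q8_mult_demo;
    reg  [15:0] a, b;
    wire [31:0] full = a * b;       // raw product has 16 fractional bits
    wire [15:0] p    = full[23:8];  // drop 8 of them (truncation, no rounding)

    initial begin
        a = 16'd384;  // 1.5 * 256
        b = 16'd512;  // 2.0 * 256
        #1;
        // 384 * 512 = 196608, and 196608 / 256 = 768, which is 3.0 * 256
        $display("raw=%0d  value=%0d + %0d/256", p, p[15:8], p[7:0]);
        $finish;
    end
endmodule
```

Truncating the low bits is the simplest choice. A production design would also need to decide how to handle rounding and overflow of the integer part.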

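By now, we can use the fixed-point Verilog library to multiply two fixed-point numbers. Next, we need to create two BRAMs to store the two fixed-point input matrices. Xilinx Core Generator can create these input memories for us. We can either use Core Generator to store the initial contents of the two matrices, or write the input data into the memories from Verilog code. This project uses the first method: the contents of the two fixed-point matrices are saved in Matrix_A.coe and Matrix_B.coe, and during synthesis or simulation these contents are loaded into the two input memories. We then only need to access these memories and read the data for the fixed-point matrix multiplication. Following is a sample Xilinx .coe file: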
 memory_initialization_radix=10;  
 memory_initialization_vector=  
 256 256 256 256  
 256 256 256 256  
 256 256 256 256  
 256 256 256 256;
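In this sample every entry is 256, which is 1.0 in the 16-bit format with 8 fractional bits that the multiplication code uses. You can modify these files to change the matrices, but note that after modifying them you must regenerate the cores with Core Generator. Then copy the netlists (Matrix_A.ngc and Matrix_B.ngc) to the ISE project's folder. Following is the code we got from Xilinx Core Generator: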
LIBRARY ieee;  
 USE ieee.std_logic_1164.ALL;  
 -- synthesis translate_off  
 LIBRARY XilinxCoreLib;  
 -- synthesis translate_on  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects
 -- Verilog project: Verilog code for fixed-point matrix multiplication
 -- Matrix memory generated by Xilinx Core Generator
 ENTITY Matrix_A IS  
  PORT (  
   clka : IN STD_LOGIC;  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END Matrix_A;  
 ARCHITECTURE Matrix_A_a OF Matrix_A IS  
 -- synthesis translate_off  
 COMPONENT wrapped_Matrix_A  
  PORT (  
   clka : IN STD_LOGIC;  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END COMPONENT;  
 -- Configuration specification  
  FOR ALL : wrapped_Matrix_A USE ENTITY XilinxCoreLib.blk_mem_gen_v6_1(behavioral)  
   GENERIC MAP (  
    c_addra_width => 4,  
    c_addrb_width => 4,  
    c_algorithm => 1,  
    c_axi_id_width => 4,  
    c_axi_slave_type => 0,  
    c_axi_type => 1,  
    c_byte_size => 9,  
    c_common_clk => 0,  
    c_default_data => "0",  
    c_disable_warn_bhv_coll => 0,  
    c_disable_warn_bhv_range => 0,  
    c_family => "spartan6",  
    c_has_axi_id => 0,  
    c_has_ena => 0,  
    c_has_enb => 0,  
    c_has_injecterr => 0,  
    c_has_mem_output_regs_a => 0,  
    c_has_mem_output_regs_b => 0,  
    c_has_mux_output_regs_a => 0,  
    c_has_mux_output_regs_b => 0,  
    c_has_regcea => 0,  
    c_has_regceb => 0,  
    c_has_rsta => 0,  
    c_has_rstb => 0,  
    c_has_softecc_input_regs_a => 0,  
    c_has_softecc_output_regs_b => 0,  
    c_init_file_name => "Matrix_A.mif",  
    c_inita_val => "0",  
    c_initb_val => "0",  
    c_interface_type => 0,  
    c_load_init_file => 1,  
    c_mem_type => 3,  
    c_mux_pipeline_stages => 0,  
    c_prim_type => 1,  
    c_read_depth_a => 16,  
    c_read_depth_b => 16,  
    c_read_width_a => 16,  
    c_read_width_b => 16,  
    c_rst_priority_a => "CE",  
    c_rst_priority_b => "CE",  
    c_rst_type => "SYNC",  
    c_rstram_a => 0,  
    c_rstram_b => 0,  
    c_sim_collision_check => "ALL",  
    c_use_byte_wea => 0,  
    c_use_byte_web => 0,  
    c_use_default_data => 0,  
    c_use_ecc => 0,  
    c_use_softecc => 0,  
    c_wea_width => 1,  
    c_web_width => 1,  
    c_write_depth_a => 16,  
    c_write_depth_b => 16,  
    c_write_mode_a => "WRITE_FIRST",  
    c_write_mode_b => "WRITE_FIRST",  
    c_write_width_a => 16,  
    c_write_width_b => 16,  
    c_xdevicefamily => "spartan6"  
   );  
 -- synthesis translate_on  
 BEGIN  
 -- synthesis translate_off 
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 -- Verilog project: Verilog code for fixed-point matrix multiplication
 -- Matrix memory generated by Xilinx Core Generator
 U0 : wrapped_Matrix_A  
  PORT MAP (  
   clka => clka,  
   addra => addra,  
   douta => douta  
  );  
 -- synthesis translate_on  
 END Matrix_A_a;  

 LIBRARY ieee;  
 USE ieee.std_logic_1164.ALL;  
 -- synthesis translate_off  
 LIBRARY XilinxCoreLib;  
 -- synthesis translate_on  
  -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 -- Verilog project: Verilog code for fixed-point matrix multiplication
 -- Matrix memory generated by Xilinx Core Generator
 ENTITY ROM IS  
  PORT ( 
   clka : IN STD_LOGIC;  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END ROM;  
 ARCHITECTURE ROM_a OF ROM IS  
 -- synthesis translate_off  
 COMPONENT wrapped_ROM  
  PORT (  
   clka : IN STD_LOGIC;  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END COMPONENT;  
 -- Configuration specification 
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects  
  FOR ALL : wrapped_ROM USE ENTITY XilinxCoreLib.blk_mem_gen_v6_1(behavioral)  
   GENERIC MAP (  
    c_addra_width => 4,  
    c_addrb_width => 4,  
    c_algorithm => 1,  
    c_axi_id_width => 4,  
    c_axi_slave_type => 0,  
    c_axi_type => 1,  
    c_byte_size => 9,  
    c_common_clk => 0,  
    c_default_data => "0",  
    c_disable_warn_bhv_coll => 0,  
    c_disable_warn_bhv_range => 0,  
    c_family => "spartan6",  
    c_has_axi_id => 0,  
    c_has_ena => 0,  
    c_has_enb => 0,  
    c_has_injecterr => 0,  
    c_has_mem_output_regs_a => 0,  
    c_has_mem_output_regs_b => 0,  
    c_has_mux_output_regs_a => 0,  
    c_has_mux_output_regs_b => 0,  
    c_has_regcea => 0,  
    c_has_regceb => 0,  
    c_has_rsta => 0,  
    c_has_rstb => 0,  
    c_has_softecc_input_regs_a => 0,  
    c_has_softecc_output_regs_b => 0,  
    c_init_file_name => "ROM.mif",  
    c_inita_val => "0",  
    c_initb_val => "0",  
    c_interface_type => 0,  
    c_load_init_file => 1,  
    c_mem_type => 3,  
    c_mux_pipeline_stages => 0,  
    c_prim_type => 1,  
    c_read_depth_a => 16,  
    c_read_depth_b => 16,  
    c_read_width_a => 16,  
    c_read_width_b => 16,  
    c_rst_priority_a => "CE",  
    c_rst_priority_b => "CE",  
    c_rst_type => "SYNC",  
    c_rstram_a => 0,  
    c_rstram_b => 0,  
    c_sim_collision_check => "ALL",  
    c_use_byte_wea => 0,  
    c_use_byte_web => 0,  
    c_use_default_data => 0,  
    c_use_ecc => 0,  
    c_use_softecc => 0,  
    c_wea_width => 1,  
    c_web_width => 1,  
    c_write_depth_a => 16,  
    c_write_depth_b => 16,  
    c_write_mode_a => "WRITE_FIRST",  
    c_write_mode_b => "WRITE_FIRST",  
    c_write_width_a => 16,  
    c_write_width_b => 16,  
    c_xdevicefamily => "spartan6"  
   );  
 -- synthesis translate_on  
 BEGIN  
 -- synthesis translate_off  
 U0 : wrapped_ROM  
  PORT MAP (  
   clka => clka,  
   addra => addra,  
   douta => douta  
  );  
 -- synthesis translate_on  
 END ROM_a;  
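As a point of comparison, the second approach mentioned earlier (putting the input data in the HDL instead of a .coe file) could look roughly like the sketch below. The module name `matrix_rom_inferred` is hypothetical and not part of the original project. It reuses the port names of the generated cores, keeps the same one-cycle read latency, and relies on the synthesis tool accepting an `initial` block for ROM contents, which is common for FPGA tools but should be checked for yours.

```verilog
// Hypothetical alternative to a Core Generator ROM: a 16 x 16-bit
// synchronous ROM whose contents are set in the HDL itself.
module matrix_rom_inferred (
    input             clka,
    input      [3:0]  addra,
    output reg [15:0] douta
);
    reg [15:0] mem [0:15];
    integer k;

    // Same contents as the sample .coe file: every entry is 256,
    // i.e. 1.0 with 8 fractional bits.
    initial begin
        for (k = 0; k < 16; k = k + 1)
            mem[k] = 16'd256;
    end

    // Registered read: data for addra appears one clock later.
    always @(posedge clka)
        douta <= mem[addra];
endmodule
```

With this approach, changing the matrix only requires editing the HDL, with no core regeneration and no netlist to copy.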
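To store the result of the fixed-point matrix multiplication, we need one more memory for the output, which we can also create with Core Generator. Note that this memory differs from the two input memories because it must have both input and output ports, so that data can be written to it and read back. Below is the core from Xilinx Core Generator for the output memory: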
LIBRARY ieee;  
 USE ieee.std_logic_1164.ALL;  
 -- synthesis translate_off  
 LIBRARY XilinxCoreLib;  
 -- synthesis translate_on  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 -- Verilog project: Verilog code for fixed-point matrix multiplication
 -- Matrix memory generated by Xilinx Core Generator for storing matrix multiplication results
 ENTITY matrix_out IS  
  PORT (  
   clka : IN STD_LOGIC;  
   wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   dina : IN STD_LOGIC_VECTOR(15 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END matrix_out;  
 ARCHITECTURE matrix_out_a OF matrix_out IS  
 -- synthesis translate_off  
 COMPONENT wrapped_matrix_out  
  PORT (  
   clka : IN STD_LOGIC;  
   wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);  
   addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);  
   dina : IN STD_LOGIC_VECTOR(15 DOWNTO 0);  
   douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)  
  );  
 END COMPONENT;  
 -- Configuration specification  
 -- fpga4student.com FPGA projects, Verilog projects, VHDL projects 
 -- Matrix memory generated by Xilinx Core Generator for storing matrix multiplication results
  FOR ALL : wrapped_matrix_out USE ENTITY XilinxCoreLib.blk_mem_gen_v6_1(behavioral)  
   GENERIC MAP (  
    c_addra_width => 4,  
    c_addrb_width => 4,  
    c_algorithm => 1,  
    c_axi_id_width => 4,  
    c_axi_slave_type => 0,  
    c_axi_type => 1,  
    c_byte_size => 9,  
    c_common_clk => 0,  
    c_default_data => "0",  
    c_disable_warn_bhv_coll => 0,  
    c_disable_warn_bhv_range => 0,  
    c_family => "spartan6",  
    c_has_axi_id => 0,  
    c_has_ena => 0,  
    c_has_enb => 0,  
    c_has_injecterr => 0,  
    c_has_mem_output_regs_a => 0,  
    c_has_mem_output_regs_b => 0,  
    c_has_mux_output_regs_a => 0,  
    c_has_mux_output_regs_b => 0,  
    c_has_regcea => 0,  
    c_has_regceb => 0,  
    c_has_rsta => 0,  
    c_has_rstb => 0,  
    c_has_softecc_input_regs_a => 0,  
    c_has_softecc_output_regs_b => 0,  
    c_init_file_name => "no_coe_file_loaded",  
    c_inita_val => "0",  
    c_initb_val => "0",  
    c_interface_type => 0,  
    c_load_init_file => 0,  
    c_mem_type => 0,  
    c_mux_pipeline_stages => 0,  
    c_prim_type => 1,  
    c_read_depth_a => 16,  
    c_read_depth_b => 16,  
    c_read_width_a => 16,  
    c_read_width_b => 16,  
    c_rst_priority_a => "CE",  
    c_rst_priority_b => "CE",  
    c_rst_type => "SYNC",  
    c_rstram_a => 0,  
    c_rstram_b => 0,  
    c_sim_collision_check => "ALL",  
    c_use_byte_wea => 0,  
    c_use_byte_web => 0,  
    c_use_default_data => 0,  
    c_use_ecc => 0,  
    c_use_softecc => 0,  
    c_wea_width => 1,  
    c_web_width => 1,  
    c_write_depth_a => 16,  
    c_write_depth_b => 16,  
    c_write_mode_a => "WRITE_FIRST",  
    c_write_mode_b => "WRITE_FIRST",  
    c_write_width_a => 16,  
    c_write_width_b => 16,  
    c_xdevicefamily => "spartan6"  
   );  
 -- synthesis translate_on  
 BEGIN  
 -- synthesis translate_off  
 U0 : wrapped_matrix_out  
  PORT MAP (  
   clka => clka,  
   wea => wea,  
   addra => addra,  
   dina => dina,  
   douta => douta  
  );  
 -- synthesis translate_on 
 END matrix_out_a;  
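It is easy to see that it has input ports for writing data into the memory as well as an output for reading data back. This project computes the fixed-point multiplication of 4x4 matrices. The technique used for matrix multiplication was described in a previous post: VHDL code. If you are looking for a VHDL version, you can refer to that post.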
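Before looking at the hardware, it can help to have a small behavioral reference for the expected numbers. The following simulation-only sketch is not part of the original project: the module name `matmul_q8_reference` is hypothetical, the matrices are stored as flat 16-entry arrays in row-major order, and only unsigned (non-negative) values with 8 fractional bits are assumed. It multiplies two 4x4 matrices filled with 256 (1.0), like the sample .coe file.

```verilog
`timescale 1ns / 1ps
// Behavioral reference model (simulation only, not synthesizable hardware).
module matmul_q8_reference;
    reg [15:0] A [0:15];
    reg [15:0] B [0:15];
    reg [31:0] acc;
    integer i, j, k;

    initial begin
        // Same data as the sample .coe file: all entries are 1.0
        for (k = 0; k < 16; k = k + 1) begin
            A[k] = 16'd256;
            B[k] = 16'd256;
        end
        for (i = 0; i < 4; i = i + 1)
            for (j = 0; j < 4; j = j + 1) begin
                acc = 0;
                // C[i][j] = sum over k of A[i][k] * B[k][j], each product
                // rescaled by >> 8 to keep 8 fractional bits
                for (k = 0; k < 4; k = k + 1)
                    acc = acc + ((A[i*4 + k] * B[k*4 + j]) >> 8);
                $display("C[%0d][%0d] = %0d", i, j, acc[15:0]);
            end
        $finish;
    end
endmodule
```

Each product is 256 * 256 / 256 = 256 and four of them are summed, so every entry prints as 1024, which is 4.0 in this format.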

Following is the Verilog code for fixed-point matrix multiplication:

`timescale 1ns / 1ps  
 // Fixed point 4x4 Matrix Multiplication  
 // yl315.net FPGA projects, Verilog projects, VHDL projects
 // Verilog project: Verilog code for fixed point Matrix multiplication 
 module matrix_multiplication(  
           input clk,reset,  
      output [15:0] data_out  
   );  // yl315.net FPGA projects, Verilog projects, VHDL projects 
       // Input and output format for fixed point  
      //     |1|<- N-Q-1 bits ->|<--- Q bits -->|  
      // |S|IIIIIIIIIIIIIIII|FFFFFFFFFFFFFFF|  
 wire [15:0] mat_A;  
 wire [15:0] mat_B;  
 wire overflow1,overflow2,overflow3,overflow4;  
 reg wen;  
 reg [15:0]data_in;  
 reg [3:0] addr;  
 reg [4:0] address;  
 reg [15:0] matrixA[3:0][3:0],matrixB[3:0][3:0];  
 //wire [15:0] matrix_output[3:0][3:0];  
 wire [15:0] tmp1[3:0][3:0],tmp2[3:0][3:0],tmp3[3:0][3:0],tmp4[3:0][3:0],tmp5[3:0][3:0],tmp6[3:0][3:0],tmp7[3:0][3:0];  
      // BRAM matrix A  
      Matrix_A matrix_A_u (.clka(clk),.addra (addr),.douta(mat_A) );  
      // BRAM matrix B  
       ROM matrix_B_u(.clka(clk), .addra (addr),.douta(mat_B) );  
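      // The block RAMs have a one-cycle read latency: the word for 'addr'
      // appears on douta one clock later, so it is stored using a delayed
      // copy of the address.
      reg [3:0] addr_d;
      always @(posedge clk or posedge reset)
      begin
           if(reset) begin
                addr <= 0;
                addr_d <= 0;
           end
           else
           begin
                if(addr<15)
                     addr <= addr + 1;
                addr_d <= addr;
                // addr_d[3:2] is the row, addr_d[1:0] is the column
                matrixA[addr_d[3:2]][addr_d[1:0]] <= mat_A ;
                matrixB[addr_d[3:2]][addr_d[1:0]] <= mat_B ;
           end
      end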
      // yl315.net FPGA projects, Verilog projects, VHDL projects 
      genvar i,j,k;  
      generate  
      for(i=0;i<4;i=i+1) begin:gen1  
      for(j=0;j<4;j=j+1) begin:gen2  
           // Each generated block gets its own overflow wires; connecting every
           // instance to the module-level overflow1..overflow4 would put
           // 16 drivers on each of those nets.
           wire ovr1, ovr2, ovr3, ovr4;
           // fixed point multiplication
           qmult #(8,16) mult_u1(.i_multiplicand(matrixA[i][0]),.i_multiplier(matrixB[0][j]),.o_result(tmp1[i][j]),.ovr(ovr1));
           qmult #(8,16) mult_u2(.i_multiplicand(matrixA[i][1]),.i_multiplier(matrixB[1][j]),.o_result(tmp2[i][j]),.ovr(ovr2));
           qmult #(8,16) mult_u3(.i_multiplicand(matrixA[i][2]),.i_multiplier(matrixB[2][j]),.o_result(tmp3[i][j]),.ovr(ovr3));
           qmult #(8,16) mult_u4(.i_multiplicand(matrixA[i][3]),.i_multiplier(matrixB[3][j]),.o_result(tmp4[i][j]),.ovr(ovr4));
           // fixed point addition  
           qadd #(8,16) Add_u1(.a(tmp1[i][j]),.b(tmp2[i][j]),.c(tmp5[i][j]));  
           qadd #(8,16) Add_u2(.a(tmp3[i][j]),.b(tmp4[i][j]),.c(tmp6[i][j]));  
           qadd #(8,16) Add_u3(.a(tmp5[i][j]),.b(tmp6[i][j]),.c(tmp7[i][j]));  
           //assign matrix_output[i][j]= tmp7[i][j];  
      end  
      end  
      endgenerate  
      // yl315.net FPGA projects, Verilog projects, VHDL projects 
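      // wen and data_in are registered, while the output memory sees 'address'
      // directly. Computing them from the next address keeps all three aligned,
      // so result k is written to location k. Locations are only valid after
      // the address counter has wrapped around once, because the first pass
      // runs while the input matrices are still being loaded.
      wire [4:0] next_address = address + 5'd1;
      always @(posedge clk or posedge reset)
      begin
           if(reset) begin
                address <= 0;
                wen <= 0;
                end
           else begin
                address <= next_address;
                if(next_address<16) begin
                     wen <= 1;
                     data_in <= tmp7[next_address[3:2]][next_address[1:0]];
                end
                else
                begin
                     wen <= 0;
                end
           end
      end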
      matrix_out matrix_out_u(.clka(clk),.addra (address[3:0]),.douta(data_out),.wea(wen),.dina(data_in) );  
 endmodule  

Testbench Verilog code:

`timescale 10ns / 1ps  
 module tb_top;  // yl315.net FPGA projects, Verilog projects, VHDL projects 
      // Inputs  
      reg clk;  
      reg reset;  
      integer i;  
      wire [15:0] data_out;  
      reg [15:0] matrix_out[15:0];  
      integer fd;   
      parameter INFILE = "result.dat";  
      // Instantiate the Unit Under Test (UUT)  
      matrix_multiplication uut (  
           .clk(clk),   
           .reset(reset),  
           .data_out(data_out)  
      );  
      initial begin  
           // Initialize Inputs  
           reset = 1;  
           clk <= 0;  
            // Hold reset for 100 time units (1 us at this 10 ns timescale)
           #100;  
           reset = 0;   
           for(i=0;i<32;i=i+1)  
           begin  
                #100 clk = ~clk;  
           end  
           #10000  
           reset = 1;  
           #1000  
           reset = 0;  
           for(i=0;i<32;i=i+1)  
           begin  
                #100 clk = ~clk;  
           end  
           for(i=0;i<64;i=i+1)  
           begin  
                #100 clk = ~clk;  
           end  
           clk = 0;  
           for(i=0;i<32;i=i+1)  
           begin  
                 #100 clk = ~clk;  
                 matrix_out[i/2] = data_out;  
           end                 
           #100;  
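              for(i=0; i<16; i=i+1) begin
                   // One result per line: upper byte (sign and integer bits),
                   // then the lower byte (8 fractional bits)
                   $fwrite(fd, "%d ", matrix_out[i][15:8]);
                   $fwrite(fd, "%d\n", matrix_out[i][7:0]);
                   #200;
                 end
            $fclose(fd);  // flush and close result.dat
            $finish;
            end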
 // fpga4student.com FPGA projects, Verilog projects, VHDL projects
 // Writing the output result to result.dat file
    initial begin  
                fd = $fopen(INFILE, "wb+");  
           end  
 endmodule  

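The Verilog code for fixed-point matrix multiplication is synthesizable and can be implemented on an FPGA. The simulation results are written to the result.dat file, where we can easily check them.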
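Checking the file by eye works for a 4x4 matrix, but the check can also be automated. The sketch below is a standalone illustration rather than part of the project. It assumes a file layout of one "upper-byte lower-byte" pair per line, uses a hypothetical file name `result_demo.dat`, and writes the file itself first so that it runs without the design. The expected value 1024 (upper byte 4, lower byte 0) corresponds to the all-ones sample matrices.

```verilog
`timescale 1ns / 1ps
// Standalone demo: write a result file, then read it back and check it.
module result_file_check_demo;
    integer fd, i, n, hi, lo, errors;
    reg [15:0] word;

    initial begin
        // Write 16 entries equal to 1024 (4.0 with 8 fractional bits)
        fd = $fopen("result_demo.dat", "w");
        for (i = 0; i < 16; i = i + 1) begin
            word = 16'd1024;
            $fwrite(fd, "%0d %0d\n", word[15:8], word[7:0]);
        end
        $fclose(fd);

        // Read the pairs back and compare against the expected value
        errors = 0;
        fd = $fopen("result_demo.dat", "r");
        for (i = 0; i < 16; i = i + 1) begin
            n = $fscanf(fd, "%d %d", hi, lo);
            if (n != 2 || hi != 4 || lo != 0) begin
                errors = errors + 1;
                $display("mismatch at entry %0d", i);
            end
        end
        $fclose(fd);
        $display("entries with errors: %0d", errors);
        $finish;
    end
endmodule
```

In a real testbench the expected values would come from a reference computation for the actual matrix contents rather than a hard-coded constant.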

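Recommended Verilog projects:
2. Verilog code for FIFO memory
3. Verilog code for 16-bit single-cycle MIPS processor
4. Programmable digital delay timer in Verilog HDL
5. Verilog code for basic logic components in digital circuits
6. Verilog code for 32-bit unsigned divider
7. Verilog code for fixed-point matrix multiplication
8. License plate recognition in Verilog HDL
9. Verilog code for carry-look-ahead multiplier
10. Verilog code for a microcontroller
11. Verilog code for 4x4 multiplier
12. Verilog code for car parking system
13. Image processing on FPGA using Verilog HDL
14. How to load a text file into FPGA using Verilog HDL
15. Verilog code for traffic light controller
16. Verilog code for alarm clock on FPGA
17. Verilog code for comparator design
18. Verilog code for D flip-flop
19. Verilog code for full adder
20. Verilog code for counter with testbench
21. Verilog code for 16-bit RISC processor
22. Verilog code for buttons on FPGA
23. How to write a Verilog testbench for bidirectional/inout ports
29. Verilog code for multiplexers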

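1 comment:

  1. Hey! The code isn't synthesizing in Quartus Prime software, and more importantly, it cannot be simulated in ModelSim. Can you give me the exact working code, or help me resolve my errors?
    Thanks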

