🚀 We are excited to release the MG-Verilog dataset for LLM-assisted Verilog code generation, as presented in our paper, “MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation,” which won the Best Paper Award at the First IEEE International Workshop on LLM-Aided Design (LAD’24)! Please feel free to try our ready-to-use dataset at HuggingFace.

Motivation

The proposed MG-Verilog dataset aims to alleviate the scarcity of high-quality hardware datasets, which currently hinders the development of LLM-assisted hardware design.

Key Features

  • ✨ Multi-grained descriptions for Verilog code samples: The MG-Verilog dataset contains natural language descriptions of varying levels of granularity for each Verilog code sample. Inspired by the human learning process, MG-Verilog aims to teach LLMs more effectively through this balanced approach.
  • 🔧 An automated dataset generation flow: To enable scalable and low-cost labeling of any Verilog code samples, we have designed an automated dataset generation flow. This allows users from various backgrounds to produce their own multi-grained datasets using their own data, similar to MG-Verilog.
  • 📈 Better fine-tuned models with MG-Verilog: LLMs fine-tuned on the MG-Verilog dataset consistently demonstrate superior Verilog code generation capabilities compared to those fine-tuned on other baseline datasets, especially when handling instructions of varying granularity. This suggests that MG-Verilog can further enhance the user-friendliness of the Verilog code generation process.
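To make the multi-grained idea concrete, here is a minimal Python sketch of how one Verilog sample might be paired with descriptions at several levels of detail and expanded into instruction/code pairs for fine-tuning. Everything in it is an assumption made for illustration: the `MultiGrainedSample` class, the level names `high_level` and `detailed`, and the `to_training_pairs` helper are hypothetical and do not reflect the actual schema or tooling of the released dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MultiGrainedSample:
    """One Verilog module with descriptions at several granularities.

    Hypothetical schema for illustration only; not the released dataset's format.
    """
    code: str
    # Maps a granularity label (assumed names) to its natural language description.
    descriptions: Dict[str, str] = field(default_factory=dict)


def to_training_pairs(sample: MultiGrainedSample) -> List[Tuple[str, str]]:
    """Emit one (instruction, code) pair per granularity level."""
    return [
        (f"[{level}] {text}", sample.code)
        for level, text in sample.descriptions.items()
    ]


sample = MultiGrainedSample(
    code=(
        "module and2(input a, input b, output y);\n"
        "  assign y = a & b;\n"
        "endmodule\n"
    ),
    descriptions={
        "high_level": "A 2-input AND gate.",
        "detailed": (
            "Module and2 takes inputs a and b and drives output y "
            "with their bitwise AND using a continuous assignment."
        ),
    },
)

pairs = to_training_pairs(sample)
for instruction, code in pairs:
    print(instruction)
```

Each description level yields its own training pair for the same code, which is one simple way a model could be exposed to both terse and detailed instructions for a single module.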

๐—ช๐—ฒ ๐—ฝ๐—ฟ๐—ฒ๐—ฝ๐—ฎ๐—ฟ๐—ฒ๐—ฑ ๐—ฎ ๐˜€๐—ต๐—ผ๐—ฟ๐˜ ๐—ฑ๐—ฒ๐—บ๐—ผ๐—ป๐˜€๐˜๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐—ผ๐—ณ ๐—ผ๐˜‚๐—ฟ ๐—ณ๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ:

For more technical details, please check out:
๐—ข๐˜‚๐—ฟ ๐—ฝ๐—ฎ๐—ฝ๐—ฒ๐—ฟ: https://arxiv.org/abs/2407.01910
๐—š๐—ถ๐˜๐—›๐˜‚๐—ฏ ๐—ฟ๐—ฒ๐—ฝ๐—ผ: https://github.com/GATECH-EIC/mg-verilog
