Jiayi Tian
+1 (805) 245 0298 | jiayi_tian@ucsb.edu | github.com/ttttttris | linkedin.com/in/jiayi-tian-32b9652a5/
Research focus: efficient training and inference of LLMs (tensor decomposition, pruning, quantization, knowledge distillation)
EDUCATION
University of California, Santa Barbara, Ph.D. in Computer Engineering | CA, USA | GPA: 3.9/4.0
Fall 2023 - present
Nanjing University, B.Eng. in VLSI Design & System Integration | China | GPA: 4.5/5.0
Fall 2019 - June 2023
INDUSTRIAL EXPERIENCE
Intel Corporation, Research Intern | Portland, OR
June 2024 - Sep. 2024
• Proposed an FPGA-based tensor-compressed LLM training accelerator with optimized compute ordering, dataflow, and memory allocation.
• Achieved up to 48× higher memory efficiency and 3.6× higher energy efficiency than an Nvidia RTX 3090 GPU.
• Resulting paper under review at IEEE TCAD.
AMD-Xilinx Technology, Co-Op/Intern | Beijing, China
June 2023 - Sep. 2023
• Developed a C++/HLS Transformer training framework with custom tensorized linear layers and nonlinear operations for LLM acceleration.
• Achieved a 30×–52× reduction in model size for end-to-end Transformer training.
SKILLS & RESEARCH INTERESTS
Languages & Tools
Python, PyTorch, TensorFlow, Huggingface, C/C++, High-level Synthesis (HLS), Vivado/Vitis/XRT
ML & NLP
Large Language Models (LLMs), Efficient Training & Inference (model compression, pruning, SVD/tensor decomposition, knowledge distillation, quantization)
PUBLICATIONS & PREPRINTS
BEBERT: Efficient and Robust Binary Ensemble BERT
Jiayi Tian, Chao Fang, Haonan Wang, and Zhongfeng Wang, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization
Jiayi Tian, Jinming Lu, Hai Li, Xiangwei Wang, Cong (Callie) Hao, Ian Young, Zheng Zhang, under review at IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems. arXiv preprint arXiv:2501.06663.
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang, under review at IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems.
RESEARCH PROJECTS
Structural Pruning for Efficient LLM Inference Leveraging Tensor Decomposition
Aug. 2024 - present
• Explored structural pruning of LLMs that leverages the low-rankness of data to prune model weights.
• Proposed head-wise SVD, joint PCA, and Nyström methods across decoder modules (illustrative sketch below).
• Outperformed existing pruning baselines on LLaMA and other large-scale LLMs.
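A minimal PyTorch sketch of the head-wise SVD idea; the function, shapes, and rank below are illustrative assumptions, not the project's actual implementation:

import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    # Approximate `weight` (d_out x d_in) with two rank-limited factors A, B
    # such that A @ B ~ weight, shrinking parameters from d_out*d_in
    # to rank*(d_out + d_in).
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]        # (d_out, rank)
    B = Vh[:rank, :]                  # (rank, d_in)
    return A, B

# Hypothetical per-head projection (d_model = 4096, head_dim = 128)
W = torch.randn(4096, 128)
A, B = low_rank_factorize(W, rank=32)
print((A @ B - W).norm() / W.norm())  # relative reconstruction error

Replacing a head's projection with the two factors turns one dense matmul into two thinner ones, which is where the inference savings come from.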
Training Accelerator Design for Tensor-Compressed Transformer Models
Sep. 2023 - Dec. 2024
• Designed a tensor-decomposition-based training scheme that reduces parameter count by 30×–52× (illustrative sketch below).
• Introduced bidirectional tensor contraction to enhance memory and compute efficiency, especially in long-sequence
training and inference.
• Built an HLS-based Transformer training engine achieving up to 48× higher memory efficiency and 3.6× higher energy efficiency than an Nvidia RTX 3090 GPU.
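A minimal sketch of where the parameter saving in a tensorized linear layer comes from, using two tensor-train cores with hypothetical shapes and rank (not the actual design):

import torch

# A 1024x1024 linear weight, re-indexed as (32*32) x (32*32) and represented
# by two small tensor-train cores instead of one dense matrix.
n1, n2, m1, m2, r = 32, 32, 32, 32, 8
G1 = torch.randn(n1, m1, r) * 0.02    # core 1: (out-factor, in-factor, rank)
G2 = torch.randn(r, n2, m2) * 0.02    # core 2: (rank, out-factor, in-factor)

# Reconstruct the dense weight only to show the correspondence; training keeps
# the cores and contracts them with activations directly.
W = torch.einsum('aqr,rbk->abqk', G1, G2).reshape(n1 * n2, m1 * m2)

print(W.numel() / (G1.numel() + G2.numel()))  # ~64x fewer parameters in this toy case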
Binary-Quantized Ensemble LLM for Fast and Robust Language Model Inference
Apr. 2021 - June 2023
• Developed BEBERT, a novel quantization-and-ensemble strategy enabling efficient and accurate 1-bit BERT inference (illustrative sketch below).
• Leveraged a knowledge distillation strategy for high training efficiency.
• Achieved a 13× model-size reduction and 15× compute savings over standard BERT with minimal accuracy loss.
• Proposed an early-exit inference variant, further cutting compute by 20%–40% on the GLUE benchmark.
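A minimal sketch of the 1-bit weight quantization underlying BEBERT; the per-row scaling rule below is the standard XNOR-style choice and is illustrative, and the ensemble simply combines the predictions of several such binarized models:

import torch

def binarize_weight(w: torch.Tensor):
    # 1-bit quantization: keep only the sign of each weight plus one
    # per-output-row scale, so scale[:, None] * sign(w) approximates w.
    scale = w.abs().mean(dim=1)
    w_bin = torch.sign(w)
    w_bin[w_bin == 0] = 1.0           # avoid zeros in the sign pattern
    return w_bin, scale

W = torch.randn(768, 768)             # a hypothetical BERT-sized projection
W_bin, s = binarize_weight(W)
W_hat = s[:, None] * W_bin            # dequantized approximation of W
print((W_hat - W).norm() / W.norm())  # relative quantization error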