Lixuan Guo (郭立轩)
email: glixuan6 _at_ gmail.com

| CV | Google Scholar | GitHub | Email | WeChat |

Hello there! I am Lixuan, a third-year CS undergraduate at Xidian University. Currently, I am a Research Intern at Stony Brook University, advised by Chenyu You and working closely with Yifei Wang and Stefanie Jegelka to explore efficient and scalable applications of sparsity in areas such as embedding-based retrieval, LLM finetuning, and MoE architecture design. Previously, I worked with Shixiong Zhang on projects focusing on single-cell RNA-seq clustering.

My research interests broadly lie in Computer Vision, Natural Language Processing, and Machine Learning. I am dedicated to building universal, efficient, and reliable machine learning systems that can handle image, text, and multimodal tasks.

  News
  • [10/2025] CSRv2 is out. Let's explore ultra-sparsity together!
  Publications

CSRv2: Unlocking Ultra-Sparse Embeddings
Lixuan Guo*, Yifei Wang*, Tiansheng Wen*, Yifan Wang, Aosong Feng, Bo Chen, Stefanie Jegelka, Chenyu You (* Equal contribution)
ICLR (under review), 2025

abstract | paper | code |

In the era of large foundation models, the quality of embeddings has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional (e.g., 4096), incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction, mapping dense embeddings into high-dimensional but $k$-sparse vectors, in contrast to compact dense embeddings such as Matryoshka Representation Learning (MRL). Despite its promise, CSR suffers severe degradation in the ultra-sparse regime (e.g., $k \leq 4$), where over 80% of neurons remain inactive, leaving much of its efficiency potential unrealized. In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive $k$-annealing, enhances representational quality via supervised contrastive objectives, and ensures end-to-end adaptability with full backbone finetuning. CSRv2 reduces dead neurons from 80% to 20% and delivers a 14% accuracy gain at $k=2$, bringing ultra-sparse embeddings on par with CSR at $k=8$ and MRL at 32 dimensions, all with only two active features. While maintaining comparable performance, CSRv2 delivers a 7× speedup over MRL and yields up to 300× improvements in compute and memory efficiency relative to dense embeddings. Extensive experiments across text (MTEB, with multiple state-of-the-art LLM embedding models including Qwen and e5-Mistral-7B) and vision (ImageNet-1k) demonstrate that CSRv2 makes ultra-sparse embeddings practical without compromising performance. By making extreme sparsity viable, CSRv2 broadens the design space for large-scale, real-time, and edge-deployable AI systems where both embedding quality and efficiency are critical.
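
For intuition, here is a minimal sketch, not the paper's implementation, of the two core ideas the abstract names: a TopK head that turns dense embeddings into $k$-sparse vectors, plus a progressive $k$-annealing schedule. The module names, dimensions, and the linear schedule are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the CSRv2 codebase): a TopK-sparse
# encoder head with a progressive k-annealing schedule, in PyTorch.
import torch
import torch.nn as nn

class TopKSparseHead(nn.Module):
    """Maps dense embeddings to high-dimensional k-sparse vectors."""
    def __init__(self, dense_dim: int, sparse_dim: int):
        super().__init__()
        self.encoder = nn.Linear(dense_dim, sparse_dim)

    def forward(self, x: torch.Tensor, k: int) -> torch.Tensor:
        z = torch.relu(self.encoder(x))
        # Keep only the k largest activations per sample; zero out the rest.
        vals, idx = torch.topk(z, k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, idx, vals)

def annealed_k(step: int, total_steps: int,
               k_start: int = 64, k_final: int = 2) -> int:
    """Linearly anneal the sparsity level from k_start down to k_final."""
    frac = min(step / max(total_steps, 1), 1.0)
    return max(k_final, int(round(k_start - frac * (k_start - k_final))))

# Usage: progressively tighten sparsity over the course of training.
head = TopKSparseHead(dense_dim=4096, sparse_dim=16384)
x = torch.randn(8, 4096)  # a batch of dense embeddings
for step in range(0, 1001, 250):
    k = annealed_k(step, total_steps=1000)
    s = head(x, k)
    print(step, k, int((s != 0).sum(dim=-1)[0]))  # nonzeros in sample 0
```

Annealing $k$ gradually, rather than training at the target sparsity from the start, is one plausible way to keep more neurons active early on, which is the dead-neuron problem the abstract describes.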

  Projects

scZGCL: Deep Single-cell RNA-seq Data Clustering with ZINB-based Graph Contrastive Learning
A PyTorch-implemented framework for single-cell RNA-seq data clustering via graph contrastive learning.

introduction | code |

Cell clustering is crucial for analyzing single-cell RNA sequencing (scRNA-seq) data, allowing us to identify and differentiate various cell types and uncover their similarities and differences. Despite its importance, clustering scRNA-seq data poses significant challenges due to its high dimensionality, sparsity, dropout events caused by sequencing limitations, and complex noise patterns. To address these issues, we introduce a new clustering method called single-cell ZINB-based Graph Contrastive Learning (scZGCL). This method employs an unsupervised deep embedding clustering algorithm. During the pre-training phase, our method utilizes an autoencoder based on the Zero-Inflated Negative Binomial (ZINB) distribution model, learns cell relationship weights through a graph attention network, and introduces contrastive learning to bring similar cells closer together. In the fine-tuning phase, scZGCL refines the clustering results by optimizing a Kullback-Leibler (KL) divergence loss, enhancing the accuracy of cell classification into distinct clusters. Comprehensive experiments on 12 scRNA-seq datasets demonstrate that scZGCL outperforms state-of-the-art clustering methods.
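
As a rough illustration of the KL-based fine-tuning step, below is a minimal DEC-style sketch in PyTorch. The Student's t soft-assignment kernel, the sharpened target distribution, and all function names are assumptions for illustration, not scZGCL's actual code.

```python
# Illustrative sketch (assumptions, not the scZGCL codebase): DEC-style
# clustering refinement with a KL-divergence loss over soft assignments.
import torch

def soft_assign(z: torch.Tensor, centers: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
    """Soft cluster assignments q_ij via a Student's t kernel."""
    dist2 = torch.cdist(z, centers).pow(2)
    q = (1.0 + dist2 / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q: torch.Tensor) -> torch.Tensor:
    """Sharpened target p_ij that emphasizes confident assignments."""
    w = q.pow(2) / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

def kl_clustering_loss(q: torch.Tensor) -> torch.Tensor:
    """KL(P || Q), minimized to refine the cluster assignments."""
    p = target_distribution(q).detach()
    return (p * (p.log() - q.log())).sum(dim=1).mean()

# Usage with random cell embeddings and learnable cluster centers:
z = torch.randn(100, 32)                 # embeddings from the autoencoder
centers = torch.randn(10, 32, requires_grad=True)
q = soft_assign(z, centers)
loss = kl_clustering_loss(q)
loss.backward()                          # gradients flow to the centers
```

Minimizing KL(P || Q) against a self-sharpened target pulls each cell toward its most confident cluster, which matches the fine-tuning behavior the abstract describes.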

  Professional Activity
  • Journal Reviewer: Pattern Recognition, TNNLS
  Selected Honors & Awards
  • [05/2025] International Collegiate Programming Contest (ICPC), Provincial Bronze Award
  • [02/2025] Mathematical Contest in Modeling (MCM), Finalist Award (Top 1%)
  • [11/2024] National Mathematical Competition for College Students, Provincial First Prize
  • [07/2024] Second-Class Undergraduate Scholarship, Xidian University
  • [02/2024] Interdisciplinary Contest in Modeling (ICM), Meritorious Award

Template from this awesome website.