PNAS | Nieng Yan’s Team Identifies Novel Glycofibrils Using CryoSeek Strategy
2025-01-02

Carbohydrates (also known as glycans), one of the four major classes of biological macromolecules, not only provide structural support and serve as a source of metabolic energy for cells, but also play critical roles in diverse biological processes, including assisting protein folding and mediating intercellular recognition and immune responses. When present as oligo- or polysaccharides, glycans can covalently attach to polypeptides, lipids, or RNA molecules to form "glycoconjugates". Compared to the more extensively studied DNA, RNA, and proteins, glycans are equally functionally significant but far more challenging to investigate due to the complexity of monosaccharide types, branching patterns, spatial configurations, and structural flexibility, all of which complicate the analysis of their three-dimensional structures. Consequently, our knowledge of glycan three-dimensional structures at high resolution remains limited, which not only hinders the understanding of their biological functions and underlying mechanisms but also impedes the advancement of AI-facilitated structure prediction and design in this field. 

In March 2024, Nieng Yan's team reported the high-resolution structure of the Chlamydomonas mastigoneme, which for the first time revealed the crucial role of arabino-glycans in the high-order assembly of bio-architecture[1]. Subsequently, by pushing the resolution further to 2.3 Å, they identified a 5',5'-phosphodiester bond, unprecedented in nature, that forms covalent cross-links between adjacent glycan chains. This work not only reveals a novel secondary structural element for glycans but also underscores the critical importance of achieving higher resolution in glycan structures[2].

Concurrently, Nieng Yan's team launched an initiative called "Glycans from the Lotus Pond", which employs the "CryoSeek" strategy to identify previously unknown biological macromolecules through cryo-electron microscopy[3]. On December 15, 2024, the team posted a preprint on bioRxiv presenting their latest findings using the CryoSeek approach, entitled "The 8-nm spaghetti: well-structured glycans coating linear tetrapeptide repeats discovered from freshwater with CryoSeek"[4]. The study identifies a novel 8-nm-diameter glycofibril, TLP-4b, in water samples from the Tsinghua lotus pond. Its core comprises a linear chain of tetrapeptide repeats, coated by a dense layer of glycans. Each tetrapeptide repeat contains a conserved 3,4-dihydroxyproline (DiHyp), whose 3-OH and 4-OH groups are both heavily O-glycosylated, as well as an adjacent conserved O-glycosylated serine or threonine. In a press release, the team explained the inspiration behind the name "Glycans from the Lotus Pond" and humorously referred to the glycofibril as "8-nm spaghetti" (bioRxiv: Nieng Yan's team discovers novel glycofibrils using CryoSeek strategy). In addition, they announced that a companion paper to this study would be published in the near future.

Link to full text: https://www.pnas.org/doi/epub/10.1073/pnas.2423943122

On January 1, 2025 (Beijing Standard Time), this anticipated paper was officially published online in Proceedings of the National Academy of Sciences (PNAS), entitled "CryoSeek II: Cryo-EM analysis of glycofibrils from freshwater reveals well-structured glycans coating linear tetrapeptide repeats" (Fig. 1). Notably, the study reports that when AI-facilitated autobuilding tools failed to generate any structural model, the team drew on their extensive expertise to identify the linear polypeptide core. Through detailed analysis, and guided by the key insight that DiHyp is the only residue carrying two glycan chains, they manually built the atomic model of TLP-4a. The discovery of such structures is expected to lay a foundation for the development of new methods in glycan structure prediction.

Fig. 1 Structure of glycofibril TLP-4a in water samples from Tsinghua lotus pond

This study also reports the results of bioinformatic analysis, revealing that similar tetrapeptide repeats are widely distributed across diverse organisms. Following the acceptance of this paper, the structure of another glycofibril, TLP-4b, posted on bioRxiv, promptly corroborated this finding. Consequently, the team has designated the originally discovered TLP-4 as TLP-4a. Despite sharing a similar central linear polypeptide chain, TLP-4a and TLP-4b exhibit entirely distinct glycan compositions and structures. This raises several interesting questions regarding the identity of the enzymes involved and the mechanism underlying their substrate specificity. Although TLP-4a and TLP-4b differ significantly in the pattern and number of glycan branches (Fig. 2), a fundamental principle remains unchanged: the assembly of the complete glycofibril consistently relies on the orderly arrangement and interaction of glycan structures (Fig. 3).

Fig. 2 Comparison of glycan branching patterns and structures between TLP-4a and TLP-4b

Fig. 3 TLP-4a structure exclusively maintained by glycan-chain interactions

The CryoSeek strategy employed in this study, combined with subsequent structural and bioinformatic analyses, establishes a novel approach for isolating glycofibrils from natural sources and characterizing their glycan structures. Utilizing these discovered fibrous structures as models to investigate other potential glycofibrils and ultimately to establish more suitable biological models will be crucial for advancing our understanding of polysaccharide synthesis pathways, folding codes, and biological functions.

This study was led by co-corresponding authors Nieng Yan (Chair Professor at Tsinghua University; Researcher, Beijing Frontier Research Center for Biological Structure; Founding Dean, Shenzhen Medical Academy of Research and Translation; Director, Shenzhen Bay Laboratory), Zhangqiang Li (Research Assistant, School of Life Sciences, Tsinghua University), and Chuangye Yan (Associate Professor, School of Life Sciences, Tsinghua University). Tongtong Wang (Direct-entry PhD Candidate, Class of 2020) and Wenze Huang (Postdoctoral Fellow) from the School of Life Sciences at Tsinghua University are the co-first authors. Other contributors include Kui Xu (Postdoctoral Fellow) and Yitong Sun (Direct-entry PhD Candidate, Class of 2022) from the School of Life Sciences at Tsinghua University. Qiangfeng Zhang (Associate Professor, School of Life Sciences, Tsinghua University) assisted with data analysis. Cryo-electron microscopy data were collected with support from the cryo-electron microscope platform of Tsinghua University. Mass spectrometric analysis was performed with support from the protein chemistry and proteomics platform. Computational work was supported by the high-performance computing platform of Tsinghua University and the China National Center for Protein Sciences Beijing. This work was funded by the Major Research Plan of the National Natural Science Foundation of China, the Beijing Frontier Research Center for Biological Structure, and the Tsinghua University-Peking University Joint Center for Life Sciences.

   

References:

[1] Huang J, Tao H, Chen J, et al. Structure-guided discovery of protein and glycan components in native mastigonemes[J]. Cell, 2024, 187(7): 1733-1744.e12.

[2] Huang J, Tao H, Chen J, et al. High-resolution mastigoneme structure reveals 5',5'-phosphodiesters stabilized glycan folding[J]. bioRxiv, 2024: 2024.12.24.630281.

[3] Wang T, Li Z, Xu K, et al. CryoSeek: A strategy for bioentity discovery using cryoelectron microscopy[J]. Proceedings of the National Academy of Sciences, 2024, 121(42): e2417046121.

[4] Wang T, Sun Y, Li Z, et al. The 8-nm spaghetti: well-structured glycans coating linear tetrapeptide repeats discovered from freshwater with CryoSeek[J]. bioRxiv, 2024: 2024.12.15.627649.


The current findings have merely scratched the surface, and numerous biological questions underlying these observations remain to be systematically addressed in future studies.