Over the past 12 years, the "resolution revolution" in cryo-electron microscopy (cryo-EM) has transformed the landscape of structural biology, enabling the determination of high-resolution structures for many biomolecules and macromolecular complexes that were previously difficult to resolve. However, the potential of cryo-EM extends far beyond the traditional goal of determining structures of known biomolecules. Over the past three years, the team led by Nieng Yan has explored a broader question:
As the highest-resolution imaging technology currently available, what else can cryo-EM achieve? Can cryo-EM be used as a discovery tool to explore the deep sea, deep underground, and even deep space, enabling the discovery of new species and novel forms of matter?
To address these questions, the team initiated a new structure-first research paradigm termed CryoSeek, in which high-resolution three-dimensional structures serve as the starting point for discovery. As a proof of concept, before extending CryoSeek to the deep sea and other extreme environments, the researchers first explored the famous lotus pond at Tsinghua University (Tsinghua Lotus Pond), well known from the essay "Moonlight over the Lotus Pond", and unexpectedly resolved a large number of glycan structures, filling a major gap in glycan structural biology. This project was therefore named the "Lotus Glycan Moonlight" initiative.
Glycans are one of the four fundamental classes of biological macromolecules and constitute the most abundant biological organic matter on Earth in terms of biomass. In addition to serving as major energy sources and structural components of cells, glycans participate in numerous essential biological processes, including protein folding, cell recognition, and immune responses. However, compared with nucleic acids and proteins, structural studies of glycans have progressed much more slowly. This is largely due to the extreme complexity of glycans in terms of monosaccharide composition, branching patterns, stereochemical configurations, and structural flexibility, all of which make three-dimensional structural determination highly challenging. As a result, glycans are often referred to as the "dark matter" of life sciences. The lack of high-resolution structural information not only limits our understanding of glycan functions and mechanisms, but also restricts the development of AI-based structural prediction and design.
In previous studies, the Nieng Yan team combined cryo-EM analysis, AI-assisted automated model building, and bioinformatics approaches to report proteinaceous fibril structures, termed TLP-1s, from environmental samples collected from the Tsinghua Lotus Pond, and further proposed their potential origin and biological function. Subsequent work identified another novel glycofibril structure, TLP-4, which consists of a linear polypeptide core composed of tetrapeptide repeats surrounded by extensive glycan chains. The tetrapeptide repeat contains a conserved 3,4-dihydroxyproline (DiHyp) residue, with both the 3-OH and 4-OH positions glycosylated. Adjacent to the DiHyp residue is a conserved O-glycosylated serine or threonine residue. These findings not only revealed the important role of glycans in biomolecular assembly, but also demonstrated that the structure-first CryoSeek strategy provides a new paradigm for the discovery and structural characterization of biological “dark matter,” including natural glycans [1,2].
On April 20, 2026 (Beijing time), Nieng Yan and collaborators published their latest study entitled "CryoSeek identification of glycofibrils with diverse compositions and structural assemblies" in Cell Chemical Biology (Figure 1). Continuing the "Lotus Glycan Moonlight" project, the researchers reported six previously unknown glycofibril structures, revealing the remarkable diversity of glycofibrils in natural environments and their critical roles in structural assembly.

Figure 1. First page of the paper.
The six newly resolved glycofibrils were named TLP-IPT, TLP-12, TLP-4b, TLP-3, TLP-2, and TLP-0 (Figure 2). The prefix “TLP” refers to the Tsinghua Lotus Pond, while the suffixes reflect structural characteristics of the protein cores within the fibrils.
TLP-IPT contains a recognizable protein core composed of tandem IPT (Ig-like/plexins/transcription factors) domains. Each IPT domain is surrounded by 13 glycan chains, and cross-sectional views reveal an overall "C"-shaped conformation. Glycan chains from neighboring IPT domains complement the open side of the "C"-shape, collectively forming a complete ring-like "O"-shaped assembly.
TLP-12 consists of three highly repetitive dodecapeptide chains woven into a parallel three-stranded β-sheet ribbon. Two helically arranged columns of glycan chains surround the protein core, resulting in an overall helical architecture.

Figure 2. Identification of various filamentous structures in the Tsinghua Lotus Pond using the CryoSeek strategy. (A) To date, nine high-resolution filamentous structures have been resolved from water samples in the Tsinghua Lotus Pond. Based on their protein composition, they are classified into three types: protein fibrils (TLP-1a/b), protein-core glycofibrils (TLP-2/3/4a/4b/12/IPT), and a glycan-only fibril (TLP-0). (B–F) Composition and helical parameters of the five newly reported glycofibril structures.
TLP-4b, TLP-3, and TLP-2 share a common structural feature in which linear polypeptide cores are enveloped by glycan layers. TLP-3 contains three intertwined tripeptide-repeat chains forming a thin protein filament core, whereas the cores of TLP-4b and TLP-2 each consist of a single linear polypeptide chain.
The core of TLP-4b is composed of tetrapeptide repeats and is molecularly similar to the previously reported TLP-4. Both structures contain a conserved 3,4-dihydroxyproline residue, an O-glycosylated serine or threonine residue, and two additional non-conserved amino acids. Despite the similarity of their protein cores, the attached glycan chains differ substantially in composition and branching patterns. Accordingly, the previously reported TLP-4 was renamed TLP-4a, while the newly resolved structure was designated TLP-4b.
In contrast, the core of TLP-2 consists of dipeptide repeats. Based on cryo-EM density features, the researchers proposed that its glycosylation pattern may involve phosphoglycosylation.
Particularly notable is TLP-0, a fibril composed entirely of glycans, with “0” indicating the absence of any protein component. The core of TLP-0 consists of trisaccharide repeating units arranged helically, with each repeat further linked to approximately 18 additional sugar molecules. These glycan chains are essential for structural stability and assembly. Both the molecular composition and assembly mode of TLP-0 differ substantially from those of classical glycans such as cellulose, starch, and glycogen, highlighting the remarkable diversity of glycan architectures and demonstrating that glycans can spontaneously assemble into highly ordered supramolecular structures without requiring protein scaffolds (Figure 3).

Figure 3. Molecular composition and assembly form of TLP-0.
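The classification and naming scheme described above can be summarized in a small lookup table. The following Python sketch is illustrative only: the category labels and repeat lengths are taken from the descriptions in this article, while the function names and the idea of reading a numeric suffix as the peptide repeat length of the core are assumptions made for this example, not a rule stated by the authors. The cores of TLP-1a/b and TLP-IPT are not described here as short peptide repeats, so their repeat length is left unset.

```python
import re
from collections import defaultdict

# (name, category, peptide repeat length of the core)
# Repeat length is None where this article does not describe the core
# as a short peptide repeat (TLP-1a/b, TLP-IPT). This table is a
# hypothetical summary for illustration, not data from the paper itself.
FIBRILS = [
    ("TLP-1a", "protein fibril", None),
    ("TLP-1b", "protein fibril", None),
    ("TLP-IPT", "protein-core glycofibril", None),  # tandem IPT domains
    ("TLP-12", "protein-core glycofibril", 12),     # dodecapeptide repeats
    ("TLP-4a", "protein-core glycofibril", 4),      # tetrapeptide repeats
    ("TLP-4b", "protein-core glycofibril", 4),      # tetrapeptide repeats
    ("TLP-3", "protein-core glycofibril", 3),       # tripeptide repeats
    ("TLP-2", "protein-core glycofibril", 2),       # dipeptide repeats
    ("TLP-0", "glycan-only fibril", 0),             # no protein component
]


def numeric_suffix(name):
    """Return the integer in a name such as 'TLP-4b', or None if the suffix is not numeric."""
    match = re.fullmatch(r"TLP-(\d+)[a-z]?", name)
    return int(match.group(1)) if match else None


def group_by_category(fibrils):
    """Group fibril names by their category label, preserving input order."""
    groups = defaultdict(list)
    for name, category, _repeat in fibrils:
        groups[category].append(name)
    return dict(groups)


if __name__ == "__main__":
    for category, names in group_by_category(FIBRILS).items():
        print(f"{category}: {', '.join(names)}")
    # Check whether the numeric suffix agrees with the repeat length given in the text.
    for name, _category, repeat in FIBRILS:
        if repeat is not None:
            print(name, numeric_suffix(name) == repeat)
```

Grouping the entries this way reproduces the three types listed in Figure 2A: two protein fibrils, six protein-core glycofibrils, and one glycan-only fibril.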
These findings reveal the extensive diversity of glycofibrils in natural environments and further demonstrate that the structure-first CryoSeek paradigm, combined with multidisciplinary approaches, may provide a new route for the discovery, high-throughput structural characterization, and systematic investigation of biological “dark matter,” including glycans. Building upon the “Lotus Glycan Moonlight” initiative, the CryoSeek platform has established a standardized, high-throughput workflow that can be extended to underexplored biological environments such as the deep sea, deep underground, and deep space, potentially enabling deeper insights into the living world.
Nieng Yan, Founding President of Shenzhen Medical Academy of Research and Translation (SMART) and Director of Shenzhen Bay Laboratory; Assistant Researcher Zhangqiang Li from the School of Life Sciences, Tsinghua University; and Tongtong Wang are the co-corresponding authors of the study. Zhangqiang Li, Tongtong Wang, and Yitong Sun share co-first authorship.
Postdoctoral researchers Kui Xu and Wenze Huang, Associate Professors Qiangfeng Zhang and Chuangye Yan from the School of Life Sciences, Tsinghua University, and Junior Principal Investigator Mingxu Hu from SMART also made important contributions to this work.
Cryo-EM data collection was supported by the Cryo-EM Platform of Tsinghua University, while computational analyses were supported by the High-Performance Computing Platform of Tsinghua University and the National Protein Science Facility Experimental Technology Center (Beijing). This research was supported by the Major Research Program of the National Natural Science Foundation of China, the Beijing Frontier Research Center for Biological Structure, and the Tsinghua-Peking Joint Center for Life Sciences. The team also expressed special thanks for support provided through the “Mindray Professorship” funded by the Pengrui Foundation.
The glycofibril structures and corresponding cryo-EM density maps reported in this study have been deposited in the CryoSeek Database (https://cryoseek.org.cn/). Established by SMART under the leadership of Nieng Yan, the CryoSeek Database systematically archives structural and identification data related to biological “dark matter,” particularly glycans, and provides online services to facilitate data management and sharing within the research community. Related studies on the CryoSeek database and the high-throughput CryoSeek structural determination platform have also been released on the LTS Preprint Server (LTSpreprints) [3].
Notably, the present study cites two preprints previously posted on the LTS Preprint Server (Ltspreprints.org):
High-throughput cryo-EM characterization and automated model building of glycofibrils via CryoSeek
AI-facilitated high-resolution cryo-EM analyses of tubular mastigonemes reveal the structural roles of N- and O-glycans
These preprints were made publicly available through the LTS Preprint Server prior to formal publication, providing important references for subsequent studies and further highlighting the value of preprints in scientific communication. The work also demonstrates that preprints can be formally cited before peer-reviewed publication. At present, articles posted on the LTS Preprint Server are increasingly being indexed by major academic platforms such as Google Scholar and ResearchGate, reflecting their growing academic impact.

Figure 4. Citation of an LTSpreprints article.
References:
[1] Wang, T., Li, Z., Xu, K., Huang, W., Huang, G., Zhang, Q. C., & Yan, N. (2024). CryoSeek: A strategy for bioentity discovery using cryoelectron microscopy. Proceedings of the National Academy of Sciences, 121(42), e2417046121.
[2] Wang, T., Huang, W., Xu, K., Sun, Y., Zhang, Q. C., Yan, C., Li, Z., & Yan, N. (2025). CryoSeek II: Cryo-EM analysis of glycofibrils from freshwater reveals well-structured glycans coating linear tetrapeptide repeats. Proceedings of the National Academy of Sciences, 122(1), e2423943122.
[3] Hu, M., Chen, S., Wang, T., Qin, L., Zhang, Q., Zhang, Y., Ge, Q., Chen, T., Li, M., Li, C., Xu, G., Gui, Q., Li, Z., & Yan, N. (2025). High-throughput cryo-EM characterization and automated model building of glycofibrils via CryoSeek. LangTaoSha Preprint Server. https://doi.org/10.65215/bkvrt910
Translation: Jianming JIA