Reflections on the language of mathematics have direct practical significance in the era of large language models. I will talk both about what mathematicians would like from these models and about the difficulties we may encounter on the way to realizing this dream.
Graduated from the Mechanics and Mathematics Faculty of Moscow State University in 1993 and completed postgraduate studies at the Independent University of Moscow. Defended his Candidate of Sciences dissertation at Moscow State University in 1995. In 1996, he moved to the USA, where he worked as an associate professor at the University of California, Berkeley, and from 2002 to 2010 as a professor at Princeton University. In 2006, he received the Fields Medal for achievements connecting probability theory, representation theory, and algebraic geometry. Since 2010, he has been a professor at Columbia University, and in February 2014 he became one of the scientific directors of the International Laboratory of Representation Theory and Mathematical Physics at the Faculty of Mathematics of the Higher School of Economics.
We will discuss why we chose mathematics, how we trained language models to find and explain solutions to problems in a methodologically correct way, and what unexpected challenges we encountered. We will also share our experience using reinforcement learning (RL) to improve answer quality and explain how we built a dialog system with a cascade of models.
Participants will learn how we made product decisions based on experiments (side-by-side comparisons (SBS) and A/B tests), what data insights we gained, what mistakes we made during the launch, and what we had to change after the first feedback from real schoolchildren and their parents.
On weekends, he goes hiking in the Southern Urals with his family.
More than fifty ML specialists, analysts, backend developers, and managers worked on the YandexGPT 5.1 release. It’s impossible to cover everything they did in a reasonable time, so the talk will focus on two interesting tasks. First, I will explain how we taught the model to remember facts better and apply knowledge about them. Second, how we finally achieved stable online RL training.
In this talk, I will discuss research on adding a memory module to a chatbot, the main ways to extract information for memory, and how to store and use that memory in a dialogue.
We will examine how SberAI added memory to GigaChat, what problems the team encountered, and how they solved them.
In this talk, I will share how we use synthetic data: on the one hand, generating instruction examples for the general domain; on the other, creating domain-specific datasets for internal tasks. This approach helps compensate for the lack of real data and improve model quality.
We will discuss the types of synthetic data we create, how we build generation and filtering pipelines, what metrics we use to assess their usefulness, and when synthetic data truly helps. We will also discuss case studies from T-Bank’s practice: adapting models for specific internal scenarios.
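To make the pipeline idea concrete, here is a minimal, hypothetical sketch of a generation-and-filtering loop. `generate_with_llm` is a stand-in for whatever model client is actually used, and the filters shown (length bounds, exact-duplicate removal) are generic examples rather than T-Bank’s real criteria.

```python
# Hypothetical sketch of a synthetic-data generation and filtering pipeline.
import hashlib

def generate_with_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with your client of choice."""
    return f"Synthetic answer for: {prompt}"

def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def build_dataset(seed_prompts, min_len=20, max_len=2000):
    seen, dataset = set(), []
    for prompt in seed_prompts:
        answer = generate_with_llm(prompt)
        if not (min_len <= len(answer) <= max_len):
            continue  # drop degenerate or runaway generations
        digest = hashlib.sha1(normalize(answer).encode()).hexdigest()
        if digest in seen:
            continue  # exact-duplicate filter; swap in MinHash for near-dups
        seen.add(digest)
        dataset.append({"prompt": prompt, "response": answer})
    return dataset

if __name__ == "__main__":
    print(build_dataset(["Explain overdraft fees", "Explain overdraft fees"]))
```

In practice, a judge model or reward model usually replaces the simple heuristics shown here as the main quality gate.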
Second, VLMs help analyze product listings. For example, they help eliminate data inconsistencies that degrade matching quality. We developed a pipeline based on task-specific fine-tuned VLM and GPT models that identifies and filters products with inconsistent content.
Many recent works combine discriminative and generative modeling within a single architecture. We will discuss how practically justified this is and whether such models will become the new dominant paradigm.
We will talk about pretraining and reinforcement learning (RL), and discuss how to proceed when there is no open SOTA in your field. Participants will learn the difference between open-loop and closed-loop approaches, and how a prediction task differs from a motion planning task.
The story will cover the journey of the Yandex Autonomous Transport team: from the first ML experiments with generative neural networks to regular autopilot trials in real cars.
Joined Yandex Autonomous Transport in 2021 to develop the motion planning engine.
Interested in deep learning (DL) and foundation models in robotics. Enjoys outdoor activities and badminton, playing computer games, and cooking delicious food.
I will also discuss one implementation approach that allowed us to deliver the solution with minimal resources and without a major infrastructure overhaul.
For a voice assistant to be able to engage in dialogue on a trending topic, it needs to learn about new terms at least a few weeks before their peak popularity. In other words, trends and memes have to be predicted. We will explain how to automatically extract them from the Alice query stream and prepare the product for relevant topics in time.
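As a rough illustration of the underlying idea, the sketch below flags terms whose daily query count spikes above a rolling baseline. The window, threshold, and counts are invented; the real Alice pipeline is not public.

```python
# A minimal spike detector over a query stream, assuming daily term counts
# are already aggregated. Parameters are illustrative, not production values.
def detect_rising_terms(daily_counts, window=14, ratio=3.0, min_count=100):
    """daily_counts: dict term -> list of daily counts, oldest first.
    Flags terms whose latest count exceeds `ratio` x the rolling mean."""
    rising = []
    for term, counts in daily_counts.items():
        if len(counts) <= window:
            continue  # not enough history for a baseline
        history = counts[-window - 1:-1]
        baseline = sum(history) / window
        today = counts[-1]
        if today >= min_count and today > ratio * max(baseline, 1.0):
            rising.append((term, today / max(baseline, 1.0)))
    return sorted(rising, key=lambda x: -x[1])

print(detect_rising_terms({"new meme": [5] * 14 + [120],
                           "weather": [900] * 15}))
```

A real system would work on normalized frequencies and add seasonality handling, but the shape of the problem is the same: compare today against a recent baseline and rank by growth.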
In my talk, I will share our experience adapting Argus to various products at Yandex. I will discuss how the architecture and training process have evolved, where we managed to significantly improve quality, and where we simplified the model. Furthermore, I will present the latest results of adapting Argus for a single-stage usage scenario.
In this talk, I will explain how we combined data from different services to obtain sequences of customer actions. I will share how using transformers on these sequences helped improve the customer experience. We improved not only the quality metrics of ML models in classification, regression, candidate generation, and ranking tasks, but also business metrics in the services.
Currently works on making recommendations real-time and implementing transformer-based personalization for various tasks. In his free time, he enjoys swimming and takes long walks.
In this talk, we will detail the dataset pipeline: audio normalization, speech and noise separation, diarization, segmentation, automatic quality filtering, and transcription. We will also show how we solved problems encountered during data collection.
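For readers who want a mental model of such a pipeline, here is an illustrative skeleton in Python. Every stage is a stub standing in for a real tool, and all names and thresholds are hypothetical rather than taken from the talk.

```python
# Illustrative skeleton of a speech-dataset pipeline; each stage is a stub
# for a real component (loudness normalization, source separation,
# speaker diarization, quality filtering, ASR transcription).
from dataclasses import dataclass

@dataclass
class Clip:
    path: str
    speaker: str = "unknown"
    text: str = ""
    snr_db: float = 0.0

def normalize_audio(path: str) -> str: return path   # e.g. loudness normalization
def separate_speech(path: str) -> str: return path   # e.g. a source-separation model
def diarize(path: str) -> list:                      # e.g. a diarization model
    return [Clip(path=path, speaker="spk0", snr_db=25.0)]
def transcribe(clip: Clip) -> Clip:                  # e.g. an ASR model
    clip.text = "<transcript>"
    return clip

def process(raw_paths, min_snr_db: float = 15.0):
    out = []
    for path in raw_paths:
        clean = separate_speech(normalize_audio(path))
        for clip in diarize(clean):
            if clip.snr_db < min_snr_db:
                continue  # automatic quality filtering
            out.append(transcribe(clip))
    return out

print(process(["episode_001.wav"]))
```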
We will demonstrate the practical value of the corpus through experiments with the F5-TTS model for Russian, which is also available for download.
In his free time, he plays Dungeons & Dragons and enjoys diving into mosh pits at concerts.
We will examine Yandex Music’s approach to ranking tasks: why add early binding to transformers and how to implement it efficiently. I will share details of the model architecture, intricacies of the training pipeline, specifics of the production inference infrastructure, and my own insights.
This talk will be useful for ML engineers working with high-load recommender systems.
We will examine how the architecture from the paper “Towards Universal Sequence Representation Learning for Recommender Systems” was adapted to industrial requirements, which improvements provided the greatest quality boost, and how these solutions are integrated into a large-scale production infrastructure.
Special attention will be paid to how a unified embedding space can provide personalization in tasks where the separability of behavior clusters is important.
Throughout her career, she has worked on NLP, LLM, and multimodal tasks, including creating generative assistants at SberDevices, code generation, and integrating models into high-load business products. Currently leads the LLM R&D direction at Wildberries and, together with her team, develops infrastructure for integrating user behavior using LLMs.
Writes articles based on research results, including on adapting architectures to unsolved problems in the field of multi-agent systems. Creates pet projects; recent ones include an agent for finding leaks in a codebase, a library for multimodal LLM training, and a Telegram bot for personalized news distribution. In her free time, she is interested in history and economics, works out at the gym, travels, and reads books.
Special focus will be given to MLOps and DataOps questions:
— how we built a data collection and annotation pipeline involving app users;
— what approaches we used for incremental model training and faster iteration;
— how we organized quality monitoring (precision, recall, F1-score) and tracked metrics over time;
— what helped us scale the system to millions of users while maintaining high performance on mobile devices.
We will honestly share how we handled rare classes and optimized the pipeline for mobile devices, and present MLOps solutions for scalable retraining and model maintenance under constantly changing data.
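As a small illustration of what metric monitoring can look like, the sketch below computes precision, recall, and F1 for one logged snapshot using scikit-learn’s standard metric function; the logging schema and release naming are assumptions.

```python
# A minimal per-release quality check, assuming predictions are logged
# alongside later-verified labels. The schema here is hypothetical.
from sklearn.metrics import precision_recall_fscore_support

def evaluate_snapshot(y_true, y_pred, release: str):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    # In production this row would go to a metrics store for trend charts.
    print(f"{release}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
    return {"release": release, "precision": p, "recall": r, "f1": f1}

evaluate_snapshot([1, 0, 1, 1, 0], [1, 0, 0, 1, 1], release="v1.4.2")
```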
This limitation isn’t a problem for simple tasks, but it becomes a serious barrier for multiple coordinated generations, such as creating long videos or narrative stories.
In this workshop, we will examine which approaches can be useful and how they are implemented in open source and on closed platforms. Then, using open-source models, we will try to build our own pipeline for generating a narrative video.
Loves traveling, contemporary art, and outdoor activities.
In the workshop, I will walk through a project on implementing LLM hints for annotators. I will share unexpected findings and several aspects of the work that turned out to be harder than we initially thought. We will talk about how to write prompts correctly, build an optimal pipeline, and measure the impact of the implementation.
A mathematician who spent about 15 years deeply involved in game theory, earned a PhD, and wrote several good papers. Plays Renju and Gomoku professionally. For many years, he ran math circles; among his former students are about ten international olympiad medalists as well as employees of Yandex, Google, and other good companies.
In this workshop, we will together examine the code of a recommender system developed by an intern, identify potential errors, and discuss ways to prevent them.
All the tasks we have prepared are based on real cases from practice.
Enjoys squash, swimming, loves reading, and hiking with a backpack.
Fixed benchmarks with multiple-choice answers, arenas, strict side-by-side expert evaluations, and online product metrics: which of these comes closer to the truth? What are the advantages of each approach? What challenges and risks do they involve?
The goal of the discussion is to draw a clear line between the convenience of measurement and genuine user value.
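One concrete example of why measurement convenience and truth can diverge: even a strict side-by-side evaluation yields a win rate with statistical uncertainty. The sketch below computes a Wilson confidence interval for an invented vote count; the numbers are purely illustrative.

```python
# 95% Wilson score interval for a binomial win rate from SBS votes.
import math

def wilson_interval(wins: int, total: int, z: float = 1.96):
    if total == 0:
        return (0.0, 1.0)
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (center - half, center + half)

lo, hi = wilson_interval(wins=56, total=100)
print(f"win rate 0.56, 95% CI [{lo:.2f}, {hi:.2f}]")  # CI still includes 0.50
```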
In the industry, the question is increasingly being asked: do scaling laws actually work for recommender systems, or are they an expensive illusion? Is it really true that “bigger means better,” or is it sometimes wiser to focus not on model size but on data quality? And if scaling is indeed effective, why are models like LightFM and Wide&Deep still widely used in production?
Another topic is hybrid models. During the discussion, we’ll explore whether they can be considered a “golden mean” or rather a compromise without real gains. We’ll compare how different approaches perform, from a classical recommender system built on CatBoost to a modern two-tower neural network with a transformer.
We’ll also discuss whether transformers are truly necessary when building a recommender system from scratch. Do they become overkill for startups, and are they suitable for products where most users remain “cold”? Finally, we’ll examine whether real-time inference can help in such cases.
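For orientation, here is a toy version of the two-tower model mentioned above, written in PyTorch. The dimensions and layers are arbitrary; production systems add feature hashing, sequence encoders, and approximate nearest-neighbor retrieval on top.

```python
# A toy two-tower ranking model: one tower encodes the user, the other the
# item, and a dot product gives the relevance score.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, vocab: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, ids):
        return self.mlp(self.emb(ids))

class TwoTower(nn.Module):
    def __init__(self, n_users: int, n_items: int):
        super().__init__()
        self.user_tower = Tower(n_users)
        self.item_tower = Tower(n_items)

    def forward(self, user_ids, item_ids):
        u = self.user_tower(user_ids)
        v = self.item_tower(item_ids)
        return (u * v).sum(dim=-1)  # dot-product relevance score

model = TwoTower(n_users=1000, n_items=5000)
print(model(torch.tensor([3, 7]), torch.tensor([42, 42])).shape)  # [2]
```

The appeal of this design is that item embeddings can be precomputed and indexed, so serving cost stays low even as the catalog grows, which is part of why the "transformers vs. simpler models" question is really a cost-benefit question.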
AI agents are already being used for internal assistants, service consoles, and support tools. We’ll discuss how to build your own agent (open-source frameworks, cloud services), give it access to the web and voice, and integrate it into your infrastructure (MCP, function calling, event buses, access control). We’ll also look at what tasks can already be delegated to AI today.
A separate block will focus on security and operations: reliability goals, observability, data control, access perimeters, activity auditing, and rights and budget limitations.
We’ll review real-world cases from Yandex Cloud, Sber, Avito, and MWS AI. The discussion will cover developer assistants, support automation, content and data processing, and integration with internal APIs and legacy systems. We’ll also examine bottlenecks such as latency, cost, tool management, and model versioning, and how to build a platform-based approach that allows rapid experimentation without sacrificing reliability.
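To ground the function-calling pattern mentioned above, here is a schematic agent loop. `call_llm` is a placeholder for any LLM client, the tool registry is a toy, and a real deployment would wrap the dispatch step with the auth, rate limiting, and audit logging discussed in the panel.

```python
# Schematic function-calling loop: the (stubbed) model returns either a
# final answer or a JSON tool request; the host dispatches via a registry.
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def call_llm(messages):
    """Stub: a real client would send `messages` to a model endpoint."""
    if messages[-1]["role"] == "user":
        return json.dumps({"tool": "get_weather", "args": {"city": "Moscow"}})
    return "It is sunny in Moscow today."

def run_agent(user_query: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        result = TOOLS[request["tool"]](**request["args"])  # dispatch
        messages.append({"role": "tool", "content": result})
    return "Step limit reached"

print(run_agent("What's the weather in Moscow?"))
```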
AI agents are one of the hottest topics in the industry today. E-commerce and related fields are already actively experimenting with them — using agents to attract new users, optimize costs, and free up employees’ time for more meaningful work. But there’s a huge gap between a quick prototype and a stable production system.
Along with new opportunities come new challenges: how to build a reliable architecture while still adopting innovations quickly, how to combine an LLM core with a traditional ML pipeline, and how to strike a balance between creativity and predictability.
In this discussion, we’ll explore what helps tackle these challenges in practice, and where the industry is headed next.