
Michael Feil on Developing Long-Context Language Models

by Declan Lording

The field of Artificial Intelligence (AI) is evolving at breakneck speed, with the technology being applied in new ways and new uses for it being discovered all the time. Among the people driving this evolution is Michael Feil, whose work in AI centers on long-context language models and on extending the technology's applications in surprising and profound ways.

Feil earned his Bachelor of Science degree in Mechatronics and Information Technology at the Karlsruhe Institute of Technology in Germany. During his studies, Feil participated in a research stay at Bosch and even had a hand in the subsequent publication of that research. According to Feil, this is where he first realized his passion for exploring the vast, untapped potential of AI and Machine Learning.

“I initially discovered Machine Learning and AI in 2019 at an internship at Bosch Research in Chicago,” he explains. “I was working on ML models, sensors, and IoT (Internet of Things) and looked into performing anomaly detection for CNC (Computer Numerical Control). It ultimately resulted in a new patented sensor system and publications. This motivated me to pursue a postgraduate degree in Robotics and AI at TU Munich.”

While in Munich, Feil was involved in research stays at the Max Planck Institute for Intelligent Systems and at the Technical University of Denmark.

An Extensive Background in ML

Dedication to learning, extensive research, and hands-on experience have made Feil an expert in numerous areas of AI. He has deep knowledge of language models, the open-source generative AI ecosystem, and machine learning. Early in his career, Feil also worked in reinforcement learning, a sub-field of machine learning, in labs such as the Max Planck Institute for Intelligent Systems in Tübingen, Germany, the most-cited computer science lab in the country. He also worked at Bosch Research and Bosch Rexroth to bring AI technology into manufacturing, and worked on applied AI in a research lab at Rohde & Schwarz in Munich.

At Rohde & Schwarz, Feil worked on the AI Incubation Team. “We hired four interns and added a budget for two full-time employees transitioning to the project,” he recalls. “I am excited about the AI talent pool in Munich. For example, I hired Oleksandr Pokras, who later founded stormy.ai, a YCombinator-backed startup in San Francisco.”

ChatGPT as a Springboard to Higher Heights

For Feil, working on LLMs right when ChatGPT was first released was a pivotal moment. “My luck at the time was that in December 2022, the infrastructure for running these models was far less advanced than it is today. I contributed to LLM inference projects like CTranslate2, where my work was featured as part of the presentation of StarCoder-1 by Hugging Face and ServiceNow. StarCoder-1 was the best-performing open code LLM at the time of its release.”

This meant that Feil’s work was being used by prestigious companies like AWS, Databricks, Google Cloud, and Anyscale. As a result, many people tried to hire him, but he ultimately became an early employee of Gradient.ai, a San Francisco-based AI tech company that builds AI agents for enterprises. While at Gradient, Feil worked as the lead machine learning engineer. He innovated within the field by working on the first series of open-source models to exceed a context length of one million tokens. As Feil says, “Research has shown that by increasing the number of input tokens that the LLM can use during generation, new capabilities can be achieved. One example was providing the 402-page transcripts from Apollo 11’s mission to the moon; the model could then explain conversations, events, and details found across the document.”

Feil’s second significant turning point came with the recognition of a failure: many LLM projects from 2022 and 2023 were either not production-ready or offered a poor developer experience. While inference engines dedicated to decoder-only LLMs were being developed, there was limited progress in optimizing deployments for encoder-only LLMs. The field was notably lacking services that could serve these types of models with minimal friction.

Infinity and Beyond

Inspired by this realization, Feil decided to build Infinity, a fast and efficient serving system for Generative AI, which received significant initial traction. “I published the project and got a lot of attention from infrastructure and OSS providers. It’s also featured among the top generative AI infrastructure projects by author Chip Huyen. I am particularly proud of seeing the project adopted by public companies like SAP, but it also makes me extremely excited to see open-source efforts such as TrueFoundry’s Cognita that went all in on Infinity.”

Feil continued to build other popular infrastructure projects for generative AI, like embed and hf-hub-ctranslate2. Feil’s work has now been deployed by companies like Runpod, SAP, and TrueFoundry, and adopted by hundreds of projects, accumulating over 200,000 Docker pulls and 1,200 GitHub stars within the first year.
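To give a feel for the kind of low-friction serving described above, here is a minimal, hypothetical client sketch in Python. It assumes an Infinity server is already running locally and exposes an OpenAI-style embeddings route; the port, route, and model name below are placeholders chosen for illustration, not confirmed defaults of the project.

```python
import requests

# Hypothetical local deployment: assumes an Infinity server is already running
# and listening on this port with an OpenAI-style embeddings route. The port,
# route, and model name are placeholders, not confirmed defaults.
INFINITY_URL = "http://localhost:7997/embeddings"


def embed(texts, model="BAAI/bge-small-en-v1.5"):
    """Request embeddings for a batch of texts from the running server."""
    resp = requests.post(INFINITY_URL, json={"model": model, "input": texts})
    resp.raise_for_status()
    # OpenAI-style response schema assumed: data[i]["embedding"] is a list of floats.
    return [item["embedding"] for item in resp.json()["data"]]


if __name__ == "__main__":
    vectors = embed([
        "Encoder-only models power search and retrieval.",
        "A serving layer exposes them behind one HTTP endpoint.",
    ])
    print(len(vectors), "embeddings of dimension", len(vectors[0]))
```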

Enabling Distributed GPU Training

While at Gradient, Feil continued to push boundaries with a custom version of Ring Attention, a novel technique that enables efficient training on very long sequences by splitting the attention matrix multiplication into small quadratic blocks spread over different GPUs. As a result, the memory requirement per GPU is drastically lowered, while the computation can still be parallelized across the batch and attention heads.
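The idea can be illustrated with a toy, single-process simulation in Python. The sketch below is a rough approximation under stated assumptions (no causal mask, no multi-head handling, and simple list rotation standing in for GPU-to-GPU communication); it shows how each "device" keeps its query block local, receives key/value blocks around the ring, and folds them into its output with an online softmax, so memory stays proportional to one block rather than the full sequence.

```python
import numpy as np


def ring_attention(q_blocks, k_blocks, v_blocks):
    """Toy single-process simulation of ring attention (sequence parallelism).

    'Device' d keeps its query block resident and starts with the key/value
    block of the same index. Key/value blocks are rotated around the ring;
    each device folds every incoming block into its output with a streaming
    (online) softmax, so the full attention matrix is never materialized.
    Illustrative only: ignores causal masking, heads, and real communication.
    """
    n_dev = len(q_blocks)
    scale = 1.0 / np.sqrt(q_blocks[0].shape[-1])

    out = [np.zeros_like(q) for q in q_blocks]                   # running weighted sums
    row_max = [np.full(q.shape[0], -np.inf) for q in q_blocks]   # running softmax max
    row_sum = [np.zeros(q.shape[0]) for q in q_blocks]           # running softmax denominator

    k_ring, v_ring = list(k_blocks), list(v_blocks)
    for _ in range(n_dev):
        for d in range(n_dev):
            k, v = k_ring[d], v_ring[d]
            scores = q_blocks[d] @ k.T * scale                   # small block-by-block matrix
            new_max = np.maximum(row_max[d], scores.max(axis=-1))
            rescale = np.exp(row_max[d] - new_max)               # correct earlier contributions
            p = np.exp(scores - new_max[:, None])
            row_sum[d] = row_sum[d] * rescale + p.sum(axis=-1)
            out[d] = out[d] * rescale[:, None] + p @ v
            row_max[d] = new_max
        # Pass key/value blocks to the neighboring "device" in the ring.
        k_ring = k_ring[-1:] + k_ring[:-1]
        v_ring = v_ring[-1:] + v_ring[:-1]

    return [o / s[:, None] for o, s in zip(out, row_sum)]


if __name__ == "__main__":
    # Sanity check against vanilla full attention on a tiny example.
    rng = np.random.default_rng(0)
    q = [rng.standard_normal((4, 8)) for _ in range(3)]
    k = [rng.standard_normal((4, 8)) for _ in range(3)]
    v = [rng.standard_normal((4, 8)) for _ in range(3)]

    ring_out = np.concatenate(ring_attention(q, k, v))

    fq, fk, fv = (np.concatenate(x) for x in (q, k, v))
    s = fq @ fk.T / np.sqrt(8)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    ref = (w / w.sum(axis=-1, keepdims=True)) @ fv
    assert np.allclose(ring_out, ref)
```

Because each block-by-block score matrix is discarded as soon as it is folded into the running sums, per-device memory scales with the block size rather than with the full sequence length, which is what makes million-token training runs feasible.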

This technique has also been referred to as “sequence parallelism” and influenced the training architecture of projects such as Llama 3.1 by Meta AI. As Feil says, “It was a truly special moment of teamwork with our research team to have access to a newly built supercomputer from Crusoe and scale our algorithm to over 900 GPUs.”

The model was a huge success, quickly becoming the top-trending model on Hugging Face in May 2024, alongside Meta’s Llama 3 models and Apple’s OpenELM. It received over 500,000 views on Twitter and was covered by leading voices and publications in the field. The success of Feil’s work pushed the next generation of models to be trained on longer contexts.

A Bright Future Ahead

Feil remains driven by his goal to advance AI and machine learning and to continuously explore new possibilities in the field. He has observed an important shift toward open-source AI, noting, “I am currently seeing a revolution of open source in AI. Just recently, Meta released Llama-405B, the first open-source model that challenges the capabilities of GPT-4 and ChatGPT by OpenAI. Most of the infrastructure for inference and training has become open, and the same goes for evaluation. Regulators are pushing for more openness and transparency.”

However, Feil also points out a critical challenge: “At the same time, open-source and open models have very few options for monetization. I believe that giving open-source AI projects a marketplace for collaborating with companies could potentially become the next Canonical or cloud provider.” This perspective underscores his commitment to connecting open-source AI with broader commercial opportunities.

In an era where AI and machine learning are rapidly changing the face of technology, Michael Feil stands out for pushing the boundaries of what these systems can do. His work with LLMs and contributions to open-source AI have reshaped the field, driving companies like Gradient.ai to the forefront of AI-based solutions. Feil’s journey highlights the impact of individual expertise and a passion for continuous learning, offering readers a glimpse into the future of AI and the pivotal role that advanced language models will play in shaping it.
