“Where’s the Beef”, Codestral’s Fill-In-the-Middle Magic
Fill-in-the-Middle (FIM) is the ability of an LLM to generate the middle tokens sandwiched between a supplied prefix and suffix. To be clear, not all LLMs are trained for this capability; when they are, they can be useful for speeding up software development.
Here’s an example: (1) you know the function prototype (the parameters it takes in), (2) you can provide the docstring, and (3) you know how to call the function. Provide all that to the LLM, and you want it to fill in the code in the middle (handy for me; I’m a lazy coder, but an avid reviewer).
For a good chuckle, check out the memorable “Where’s the Beef” commercial. Can you relate?
Objective
To the LLM, we provide lines 1 and 2 as the prefix and lines 8 and 9 as the suffix, and have it generate the code in the middle.
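As a concrete stand-in for the snippet described above (the original listing isn’t reproduced here), the prefix and suffix might look like this; the `fib` function and its docstring are hypothetical, purely for illustration:

```python
# Hypothetical example of the two pieces we hand to the model.
# Lines 1-2 of the snippet (function prototype + docstring) form the prefix;
# lines 8-9 (the call site) form the suffix. The body in between is the
# "middle" we want the LLM to generate.
prefix = '''def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
'''

suffix = '''
if __name__ == "__main__":
    print(fib(10))
'''
```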
How to FIM with Codestral
Here’s working code: provide the code prefix (the function definition and the docstring) as the prompt, provide the suffix (how to invoke the function), and request a completion.
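A minimal sketch of what that call can look like, assuming the `mistralai` (v1) Python SDK and its FIM completion endpoint; the `fib` prefix/suffix strings are illustrative placeholders, not the post’s original listing:

```python
import os

# Illustrative prefix (function definition + docstring) and suffix (call site).
prefix = '''def fib(n: int) -> int:
    """Return the n-th Fibonacci number."""
'''
suffix = '''
print(fib(10))
'''

# Guarded so the sketch only calls the API when a key is configured.
if os.environ.get("MISTRAL_API_KEY"):
    from mistralai import Mistral  # assumes the v1 SDK: pip install mistralai

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.fim.complete(
        model="codestral-latest",
        prompt=prefix,   # everything before the hole
        suffix=suffix,   # everything after the hole
    )
    middle = response.choices[0].message.content
    # Reassemble in reading order: prefix + generated middle + suffix.
    print(prefix + middle + suffix)
```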
The print statement above assembles the prefix + in-the-middle response + suffix (in that order); here’s what it prints.
Sure enough, the (generated) code produces the correct output below.
How is a model trained for fill-in-the-middle?
The “one weird trick” is to simply move a span of text from the middle of a document to its end.
As to the FIM training format, the paper says it best:
“We then encode each of the three sections separately and prepend sentinel tokens to the beginning of each section. We denote these sentinel tokens by <PRE>, <MID>, and <SUF>. Finally we concatenate all these sections in the order prefix, suffix, and middle along with their sentinel tokens to form the tokenized version of the FIM document.”
The training data includes a mix of ordinary auto-regressive sequences and FIM sequences. A FIM training sequence has the format <PRE> prefix <SUF> suffix <MID> middle <EOT>.
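A toy sketch of that transformation at the character level; a real pipeline operates on token ids, encodes each section separately, and mixes these examples with plain auto-regressive ones. The sentinel strings here merely stand in for the special tokens:

```python
import random

PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"

def to_fim_sequence(document: str, rng: random.Random) -> str:
    """Pick two random cut points, split the document into
    prefix/middle/suffix, and emit the PSM (prefix-suffix-middle)
    training sequence: each section led by its sentinel, concatenated
    in the order prefix, suffix, middle, terminated by <EOT>."""
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOT}"

doc = "def add(a, b):\n    return a + b\n"
print(to_fim_sequence(doc, random.Random(0)))
```

Note that the three spans together still cover the whole document, so no training signal is lost; only the order changes.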
That was the training regime.
For inference, we encode the given prefix and suffix and prompt the model with the sentinel-demarcated sequence <PRE> prefix <SUF> suffix <MID>. We then continue sampling from the model (in an auto-regressive manner) until it generates the <EOT> token, which is how the model communicates that it has connected the prefix and the suffix (in my opinion, it’s tricky).
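Sketched in the same toy notation (string sentinels standing in for special tokens; a real client would stop generation on the <EOT> token id):

```python
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"

def fim_prompt(prefix: str, suffix: str) -> str:
    """The sentinel-demarcated inference prompt; the model continues
    it auto-regressively, generating the middle."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"

def splice(prefix: str, suffix: str, generated: str) -> str:
    """Drop everything from <EOT> onward (the model's "I'm done"
    signal) and reconnect the pieces in reading order."""
    middle = generated.split(EOT, 1)[0]
    return prefix + middle + suffix

# Pretend the model generated this continuation for the prompt:
print(splice("def f(x):\n", "\nprint(f(2))\n", "    return x * 2" + EOT))
```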
That’s it!
References
Efficient Training of Language Models to Fill in the Middle
What is FIM and why does it matter in LLM-based AI
P.S. Recently, I discovered that CodeGemma and the Granite Code Models also provide the fill-in-the-middle capability. Since the training regime is simple, it makes sense for all code models to support it. The more the merrier!