A glossary of some of the stats that I use when writing the By the Numbers posts. If I am missing anything, let me know and I will add it.
xG (Expected Goals)
I am not the biggest fan of this name; I preferred “Chance Quality”, but it is too late to go back and change it. The best way to think about xG is as the probability of a shot turning into a goal, ranging from 0 (a 0% chance of a goal being scored) to 1 (a goal is always scored). In my model, the lowest values generally go to really long shots with a bunch of players between the shooter and the goal, which rate at around 0.02 (2%), while the best chances, say a shot from a few feet out with an open goal, rate at around 0.97 (97%).
The model is built on data from the big five leagues of Europe and takes into account as much data as I can get my hands on. The biggest factors are the distance to goal (obviously), the angle the player has to goal (more central is better), and whether the shot was taken with the head, the feet or something else (headers are hard). There is also information on how the shot was assisted: was it a cross (in general this makes a chance harder), a cut back (a positive type of cross), a through ball (these are really good) or a pre-assist through ball (sometimes the pass before the key pass broke apart the defensive line, and that is important information to know)? Was the shot from a rebound (in general this is against a scrambling defense and a positive factor), did it come from a corner or another set play, and was it assisted at all?
I also get data from StrataBet (on the Thursday after each weekend's matches, so it typically won't make it into the By the Numbers posts) that includes measures of the defensive pressure on the shot, the number of defenders (and attackers) between the shot and the goal, a rating of the shot quality (how well it was placed) and a rating of the chance.
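To make the idea of "factors feeding a probability" concrete, here is a minimal logistic-regression-style sketch. Everything in it is an assumption for illustration: the coefficient values in `COEFFS` are invented (they are not the fitted values from my model), the coordinates are in metres from the goal, and only a handful of the factors above are included. It only shows how distance, angle, body part and assist type can be combined into a number between 0 and 1.

```python
import math

# Hypothetical coefficients, for illustration only; NOT the fitted values
# from the model described above.
COEFFS = {
    "intercept": -1.0,
    "distance": -0.10,     # per metre from the centre of the goal
    "angle": 1.5,          # per radian of goal mouth visible to the shooter
    "header": -0.9,        # headers are hard
    "cross": -0.3,         # crosses generally make a chance harder
    "cutback": 0.3,        # cut backs are a positive type of cross
    "through_ball": 0.5,   # through balls are really good
    "rebound": 0.4,        # rebounds usually face a scrambling defence
}

GOAL_WIDTH = 7.32  # metres between the posts


def shot_angle(x, y):
    """Angle (radians) subtended by the goal mouth for a shot taken x metres
    out from the goal line and y metres left/right of the goal's centre."""
    half = GOAL_WIDTH / 2
    return math.atan2(half - y, x) - math.atan2(-half - y, x)


def xg(x, y, header=False, cross=False, cutback=False,
       through_ball=False, rebound=False):
    """Logistic model: linear score of the features squashed into (0, 1)."""
    z = (COEFFS["intercept"]
         + COEFFS["distance"] * math.hypot(x, y)
         + COEFFS["angle"] * shot_angle(x, y)
         + COEFFS["header"] * header
         + COEFFS["cross"] * cross
         + COEFFS["cutback"] * cutback
         + COEFFS["through_ball"] * through_ball
         + COEFFS["rebound"] * rebound)
    return 1.0 / (1.0 + math.exp(-z))


# A close, central shot versus a long-range one
print(round(xg(6, 0), 2), round(xg(25, 0), 2))
```

A real model would be fitted to historical shots (for example with a logistic regression library) rather than hand-set like this.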
The above video is the best chances from week 1 of the Premier League with descriptions of the biggest factors that go into determining the xG rating for each chance to illustrate how the rating is derived.
Shot on Target xG (SoT xG)
This is very similar to the regular xG model, except that only shots on target are considered when the values for the different variables are run through the regression model. All shots that are blocked or miss the target are assigned an xG value of zero. As a result, shots that end up on target will have a higher value than they would under the regular xG model.
Over the course of the season these two will be very similar, but in a single match the number of chances on target varies quite a bit, and SoT xG can be a better descriptor of what actually happened. I also use this model when I am looking at goalkeepers and how well they save shots, for example David De Gea for Manchester United last season.
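A minimal sketch of how SoT xG totals and a simple goalkeeper comparison could be computed, assuming each shot record already carries a value from a shot-on-target model (the field names `on_target`, `sot_xg` and `goal` are hypothetical). Comparing SoT xG faced with goals conceded is one common way to rate shot-stopping; it is an illustration, not necessarily the exact calculation used in my posts.

```python
def sot_xg_total(shots):
    """Sum SoT xG, treating blocked and off-target shots as zero."""
    return sum(s["sot_xg"] if s["on_target"] else 0.0 for s in shots)


def goals_prevented(shots_faced):
    """SoT xG faced minus goals conceded (own goals ignored).

    Positive means the keeper conceded fewer goals than the shots on target
    would typically produce under this model; negative means more.
    """
    conceded = sum(1 for s in shots_faced if s["goal"])
    return sot_xg_total(shots_faced) - conceded


# Hypothetical shots faced by one keeper in one match
shots = [
    {"on_target": True, "sot_xg": 0.30, "goal": False},
    {"on_target": True, "sot_xg": 0.80, "goal": True},
    {"on_target": False, "sot_xg": 0.25, "goal": False},  # counted as zero
]
print(sot_xg_total(shots), goals_prevented(shots))
```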
Post Match Simulation
After each match, I run a Monte Carlo simulation of the match 10,000 times. This simulation uses the shots that each team produced during the match to give the probability of the outcome, goals scored, goal difference and expected points.
This is an estimation of how this match would typically play out and what the odds are that a certain result would have occurred.
There are two major caveats. First, shots are not independent of each other, as the model assumes. If a shot is saved it can produce a rebound, corner or clearance that leads to another shot, or even a fast break the other way, but if it is scored none of these can happen. The model assumes this away for simplicity. Second, the model does not take into account game state effects (this gets its own section below), which, tl;dr, means that a team that is ahead doesn't attack as much. I still find value in these simulations, but I don't draw massive conclusions from them, as the margin of error is quite large.
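A minimal sketch of the kind of simulation described above, assuming the input is simply a list of per-shot xG values for each team. Each shot is treated as an independent coin flip that scores with probability equal to its xG, which is exactly the simplification the first caveat describes. Expected points use 3 for a win and 1 for a draw. The function and field names are illustrative, not taken from my actual code.

```python
import random
from collections import Counter


def simulate_match(home_shots, away_shots, n_sims=10_000, seed=None):
    """Monte Carlo match simulation from per-shot xG values.

    Each shot is an independent Bernoulli trial that scores with probability
    equal to its xG; game state is ignored.
    """
    rng = random.Random(seed)
    results = Counter()      # counts of "home", "draw", "away"
    goal_diff = Counter()    # home goals minus away goals
    home_goals = away_goals = 0
    for _ in range(n_sims):
        hg = sum(rng.random() < p for p in home_shots)
        ag = sum(rng.random() < p for p in away_shots)
        home_goals += hg
        away_goals += ag
        goal_diff[hg - ag] += 1
        results["home" if hg > ag else "away" if ag > hg else "draw"] += 1
    p = {k: results[k] / n_sims for k in ("home", "draw", "away")}
    return {
        "p_home_win": p["home"],
        "p_draw": p["draw"],
        "p_away_win": p["away"],
        "home_xpts": 3 * p["home"] + p["draw"],
        "away_xpts": 3 * p["away"] + p["draw"],
        "mean_home_goals": home_goals / n_sims,
        "mean_away_goals": away_goals / n_sims,
        "goal_diff": {gd: n / n_sims for gd, n in sorted(goal_diff.items())},
    }


# Hypothetical shot xG values for one match
sim = simulate_match([0.45, 0.12, 0.05, 0.30], [0.08, 0.60, 0.04], seed=1)
print(sim["p_home_win"], sim["p_draw"], sim["p_away_win"])
```

Passing a `seed` makes a run reproducible; with 10,000 simulations the probabilities move only slightly between seeds.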
Game State
If you read above, you got the short version; this is the longer version. Matches are affected quite a lot by the score (and also by the expectations of the two teams before the match even starts), which affects how the match is played and the stats that follow.
When a team is winning, they are less likely to go all out attacking. You will see their number of shots go down, but the average quality of the chances they do take go up. The reverse is also true for teams down a goal or more: they commit more players to attack and take more shots, generally from longer distance (usually through a packed box), and rely more on crosses to create offense (scoring headed shots from crosses is really hard).
You also have effects based on what the teams expected before the match. In a small team vs big team match, if the small team is drawing (or even losing by only one goal, to preserve goal difference) late in the match, they might sit back and defend deeper than a normal team in the same situation that thinks it has a chance to win.
Actually quantifying these effects is hard, so when I present stats I will usually caveat them instead of trying to make some sort of adjustment that would likely introduce more noise into the analysis. Over the course of the season these effects will likely balance out, but they will definitely create outliers in small samples.
Danger Zone
Kenny Loggins and Top Gun are awesome, but this stat refers to the area in the center of the 18 yard box plus the six yard box. The majority of goals come from this area. (Sorry if the song is stuck in your head now.)
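A minimal sketch of a zone check, under an assumed geometry since the exact boundaries aren't spelled out here: the zone is taken to be as wide as the six yard box (20 yards, centred on the goal) and to run from the goal line out to the edge of the 18 yard box. Coordinates are yards out from the goal line (`x`) and yards either side of the centre of the goal (`y`).

```python
# Assumed geometry (an illustration, not an official definition)
DANGER_ZONE_DEPTH = 18.0       # yards out from the goal line
DANGER_ZONE_HALF_WIDTH = 10.0  # yards either side of the centre of the goal


def in_danger_zone(x, y):
    """True if a location x yards out and y yards off-centre is in the zone."""
    return 0.0 <= x <= DANGER_ZONE_DEPTH and abs(y) <= DANGER_ZONE_HALF_WIDTH


def danger_zone_share(shots):
    """Fraction of (x, y) shot locations that fall inside the zone."""
    shots = list(shots)
    if not shots:
        return 0.0
    return sum(in_danger_zone(x, y) for x, y in shots) / len(shots)


# Hypothetical shot locations for one team
print(danger_zone_share([(5, 0), (12, -4), (25, 0), (14, 16)]))
```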
Offense Value Added (OVA)
The short answer is that it is a stat that looks at all of the offensive actions a player contributes to and attempts to quantify how much value they added for the team. The stat is scaled to goals, so a player contributing 1 OVA would be worth about one goal to the offense.
The stat is derived from four years of data: when an action happens at a certain coordinate on the pitch, what is the likelihood that the possession later turned into a goal? So when a player starts a possession, the value of where they started is recorded and then compared to the value of where the possession ended. The difference between these two is the offense value added.
This is a rough calculation for the different values on the pitch.
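A minimal sketch of how location values like these could be estimated, assuming a simple grid and a list of actions tagged with whether their possession went on to end in a goal. The pitch size (105 x 68 m), the 6 x 4 grid and the tuple layout are all hypothetical choices for illustration; the real model may bin the pitch differently or smooth the values.

```python
from collections import defaultdict

# Hypothetical grid: a 105 x 68 m pitch split into 6 x 4 cells, with x
# measured towards the opponent's goal.
PITCH_LENGTH, PITCH_WIDTH = 105.0, 68.0
N_X, N_Y = 6, 4


def cell(x, y):
    """Map pitch coordinates to a grid cell index (ix, iy)."""
    ix = min(int(x / PITCH_LENGTH * N_X), N_X - 1)
    iy = min(int(y / PITCH_WIDTH * N_Y), N_Y - 1)
    return ix, iy


def zone_values(actions):
    """actions: iterable of (x, y, possession_scored) tuples.

    Returns, for every cell seen, the share of actions in that cell whose
    possession went on to end in a goal.
    """
    seen = defaultdict(int)
    scored = defaultdict(int)
    for x, y, possession_scored in actions:
        c = cell(x, y)
        seen[c] += 1
        scored[c] += int(possession_scored)
    return {c: scored[c] / seen[c] for c in seen}


values = zone_values([(100, 34, True), (100, 34, False), (10, 10, False)])
print(values)
```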
Offense value added also takes into account negative actions, such as losing the ball or missing a pass. For these there is a negative effect, plus the value of the spot where the opponent takes over. There are also bonuses for creating big chances, completing through balls and dribbles, and being the one on the end of a big chance.
The main idea is that this is a good way to try to account for how much each player contributes to the offense. Essentially this is my attempt at a next-level xG Chain (see below for that one). I use this for individual players and also to look at team-level results as a kind of non-shot xG model. An example of the running OVA for the Belgium vs France semi-final at the World Cup:
It isn't perfect, but I think it is a step in the right direction for assigning value to actions on the pitch. This is also a decent example of game state effects: France dominated the match, but after going a goal up they were more content to keep their defensive shape than to risk pushing for a second goal.
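A minimal sketch of scoring a single action once location values exist. It assumes one reading of the turnover rule: the player gives up the value the team held and is also charged the value the opponent gains where they take over. The bonus sizes in `BONUSES` and the tag names are invented for illustration.

```python
# Hypothetical bonus sizes (in goals) for the extras mentioned above.
BONUSES = {
    "big_chance_created": 0.05,
    "through_ball": 0.02,
    "dribble": 0.01,
    "big_chance_received": 0.03,
}


def action_ova(start_value, end_value=None, opponent_value=None, tags=()):
    """Offense value added for one player's part of a possession.

    start_value: value of the spot where the player's involvement began.
    end_value: value where the possession moved on (None if the ball was lost).
    opponent_value: value of the spot where the opponent took over, from the
        opponent's perspective (used only when the ball was lost).
    """
    if end_value is not None:
        ova = end_value - start_value
    else:
        # Assumed reading of the turnover penalty: lose the value held and
        # also give up whatever value the opponent now has.
        ova = -start_value - (opponent_value or 0.0)
    return ova + sum(BONUSES.get(t, 0.0) for t in tags)


print(action_ova(0.02, 0.10))                    # moved the ball forward
print(action_ova(0.05, opponent_value=0.01))     # lost the ball
```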
xG Chain and xG Buildup
xG Chain and xG Buildup take the work done on xG and merge it with passing chains. What this stat does is assign the xG value of a shot to everyone who was involved in the possession that led to it.
The 2nd goal in full. Really enjoyed Mhkitaryan’s movement in this, moving from left to right of Jorginho to completely overwhelm him. pic.twitter.com/y2XBMCp0fb
— Arsenal Column (@ArsenalColumn) August 19, 2018
For example, in this move against Chelsea, Sokratis, Cech, Mustafi, Monreal, Xhaka, Iwobi, Özil, Guendouzi, Bellerin and Mkhitaryan all get xG Chain credit of 0.4. (One of the drawbacks of this is that everyone gets the same credit even if they don't add the same value; see OVA above for my alternative.)
xG Buildup has exactly the same structure but excludes the shooter and the player who made the key pass (unless they were involved earlier in the move). So in this example Sokratis, Cech, Mustafi, Monreal, Xhaka, Iwobi, Özil, Guendouzi and Bellerin would get credit. Iwobi DOES get credit because he contributed to the buildup to the shot. There is also no double counting in these: making one pass in the move earns the same credit as making three.
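A minimal sketch of the crediting rules above, assuming each possession is given as the ordered list of players involved, with the shooter last and (for assisted shots) the key passer second to last. The player names in the example are placeholders, not the real sequence of the Arsenal move.

```python
from collections import defaultdict


def chain_and_buildup(touches, shot_xg, assisted=True):
    """Credit one possession that ended in a shot.

    touches: players in the order they were involved; the last entry is the
        shooter and, if assisted, the one before it made the key pass.
    Each player is credited at most once per possession (no double counting).
    """
    chain = set(touches)
    # Buildup drops the shot (and the key pass) from the end of the move,
    # but keeps anyone who was also involved earlier in it.
    cut = 2 if assisted else 1
    buildup = set(touches[:-cut])
    return ({p: shot_xg for p in chain}, {p: shot_xg for p in buildup})


def season_totals(possessions):
    """possessions: iterable of (touches, shot_xg, assisted) tuples."""
    chain_tot, buildup_tot = defaultdict(float), defaultdict(float)
    for touches, shot_xg, assisted in possessions:
        chain, buildup = chain_and_buildup(touches, shot_xg, assisted)
        for p, v in chain.items():
            chain_tot[p] += v
        for p, v in buildup.items():
            buildup_tot[p] += v
    return dict(chain_tot), dict(buildup_tot)


# Hypothetical move: A touches the ball twice, D plays the key pass, E shoots
chain, buildup = chain_and_buildup(["A", "B", "C", "A", "D", "E"], 0.4)
print(sorted(chain), sorted(buildup))
```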
Conversion rates, Goal Efficiency
One of the things I am interested in is conversion rates and overall goal-scoring efficiency. I think looking at conversion rates for shots is a great way to help determine whether a player is going through a fluky hot streak or an unlucky cold streak, or is just really good or bad.
The first one I like to look at is on-target accuracy: total shots on target / total shots. In general about 33% to 35% of shots end up on target, but this varies depending on where the shots are coming from.
From this you can see interesting things. As of this writing (23 August 2018) Everton are putting 60% of their shots on target, which goes a long way to explaining why they have 4 goals from 20 shots. The flip side is Cardiff City, who have 22 shots but are putting just 9% on target and have scored 0 goals. Both teams have similar xG for the season (2.9 vs 2.2) but a big difference in goals scored (4 vs 0).
Next I like to look at conversion rate, which is goals scored / shots on target. This is a similar stat, but it looks at the ability to turn shots on target into goals. Typically between 30% and 35% of shots on target are converted into goals.
Again there are interesting things here: as of this writing (23 August 2018) Brighton are converting 100% of their shots on target into goals (2 non-pen shots on target, 2 goals) while West Ham have a 0% conversion rate (6 non-pen shots on target, 0 goals).
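The two ratios above are simple divisions; this minimal sketch just makes the definitions explicit and guards against dividing by zero (returning `None` when there are no shots is an arbitrary choice). The example reuses the Brighton and West Ham figures quoted above.

```python
def on_target_accuracy(shots_on_target, shots):
    """Share of shots that hit the target."""
    return shots_on_target / shots if shots else None


def conversion_rate(goals, shots_on_target):
    """Share of shots on target that became goals."""
    return goals / shots_on_target if shots_on_target else None


# Non-penalty figures quoted above (23 August 2018)
print(conversion_rate(2, 2))  # Brighton: 1.0
print(conversion_rate(0, 6))  # West Ham: 0.0
```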
Going one step further, I like to apply both of these metrics to big chances. Big chances are put on target about 55% of the time and converted at about a 40% rate, with an on-target conversion rate of about 60%.
Arsenal so far (23 August 2018) have created 5 big chances, put just 20% of them on target and converted just 20%, scoring the only one they put on target. With so few big chances in a match, putting them in the back of the net is very important.
Lastly there are the efficiency stats. These take goals scored / xG (or SoT xG) and provide a quick way of determining how well a team or player has done at converting chances into goals.
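A minimal sketch of the three big-chance ratios, using the Arsenal numbers above (5 big chances, 20% on target, so 1 on target, and 1 goal). The function name and dictionary keys are illustrative.

```python
def big_chance_rates(big_chances, on_target, goals):
    """On-target rate, overall conversion and on-target conversion."""
    def rate(n, d):
        return n / d if d else None
    return {
        "on_target": rate(on_target, big_chances),
        "converted": rate(goals, big_chances),
        "on_target_converted": rate(goals, on_target),
    }


# Arsenal as of 23 August 2018: 5 big chances, 1 on target, 1 goal
print(big_chance_rates(5, 1, 1))
# {'on_target': 0.2, 'converted': 0.2, 'on_target_converted': 1.0}
```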
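As a minimal sketch, the efficiency ratio applied to the Everton and Cardiff City figures quoted earlier (4 goals from 2.9 xG, 0 goals from 2.2 xG). Returning `None` when there is no xG is an arbitrary choice to avoid dividing by zero.

```python
def finishing_efficiency(goals, expected_goals):
    """Goals divided by xG (or SoT xG): above 1.0 means more goals than the
    chances would typically produce, below 1.0 means fewer."""
    return goals / expected_goals if expected_goals else None


print(round(finishing_efficiency(4, 2.9), 2))  # Everton: 1.38
print(finishing_efficiency(0, 2.2))            # Cardiff City: 0.0
```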
Passes Per Defensive Action (PPDA)
This is currently the best way to measure how much and where a team presses (at least for those who only have access to on-ball event data). It divides the number of passes the opponent attempts by the number of defensive actions (tackles, interceptions, blocked passes, clearances and fouls) the team makes. So a team that allows fewer passes per defensive action would be considered a higher-pressing team. This can be broken up into different areas of the pitch to get an idea of where the defensive pressure happens.
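A minimal sketch of PPDA with an optional area filter. It assumes a hypothetical event schema: each event has `team`, `type` and `x`, where `x` runs 0 to 100 in the acting team's attacking direction, so an opponent's pass at their own `x` sits at `100 - x` from the pressing team's point of view. Setting `min_x=40` counts only the opponent's 60% of the pitch, a restriction often used in public PPDA work.

```python
DEFENSIVE_ACTIONS = {"tackle", "interception", "blocked_pass", "clearance", "foul"}


def ppda(events, team, min_x=0.0):
    """Opponent passes allowed per defensive action by `team`.

    Only events at or beyond `min_x` (0-100, measured in `team`'s attacking
    direction) are counted. Returns None if there are no defensive actions.
    """
    opp_passes = 0
    def_actions = 0
    for e in events:
        if e["team"] == team:
            if e["type"] in DEFENSIVE_ACTIONS and e["x"] >= min_x:
                def_actions += 1
        elif e["type"] == "pass" and 100.0 - e["x"] >= min_x:
            # Flip the opponent's coordinate into `team`'s direction.
            opp_passes += 1
    return opp_passes / def_actions if def_actions else None


# Hypothetical events: team B passes, team A defends
events = (
    [{"team": "B", "type": "pass", "x": 20}] * 10
    + [{"team": "B", "type": "pass", "x": 70}] * 4
    + [
        {"team": "A", "type": "tackle", "x": 70},
        {"team": "A", "type": "foul", "x": 50},
        {"team": "A", "type": "clearance", "x": 10},
    ]
)
print(ppda(events, "A"), ppda(events, "A", min_x=40))
```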
Have more questions? Let me know in the comments and I will expand this as needed.