Data Source
```shell
# Export one row per rider and season (2017-2025) as CSV from the db-dev service.
# -T disables TTY allocation so the heredoc can be fed to psql on stdin.
docker compose exec -T db-dev psql --csv <<_EOF > riders.csv
select
  riders.dob,
  riders.short,
  riders.name,
  riders.country_long,
  competitions.cat,
  teams.cat as uci_cat,
  competitions.year,
  riders_seasons.pcs_total,
  riders_seasons.pcs_rank
from riders
join riders_seasons on riders_seasons.rider_short = riders.short
join competitions on competitions.id = riders_seasons.competition_id
join teams on riders_seasons.team_id = teams.id
where competitions.year >= 2017 and competitions.year <= 2025
order by competitions.year, riders.short;
_EOF
```
Number of riders (n) per season, category (cat) and UCI team category (uci_cat):

| year | cat | uci_cat | n |
|---|---|---|---|
| 2020 | MEN | PRT | 320 |
| 2020 | MEN | WT | 536 |
| 2020 | WOMEN | WTW | 58 |
| 2021 | MEN | PRT | 367 |
| 2021 | MEN | WT | 559 |
| 2021 | WOMEN | WTW | 77 |
| 2022 | MEN | PRT | 347 |
| 2022 | MEN | WT | 552 |
| 2022 | WOMEN | WTW | 115 |
| 2023 | MEN | PRT | 383 |
| 2023 | MEN | WT | 534 |
| 2023 | WOMEN | WTW | 176 |
| 2024 | MEN | PRT | 374 |
| 2024 | MEN | WT | 535 |
| 2024 | WOMEN | WTW | 213 |
| 2025 | MEN | PRT | 379 |
| 2025 | MEN | WT | 527 |
| 2025 | WOMEN | PRW | 100 |
| 2025 | WOMEN | WTW | 248 |
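The counts table can be reproduced from `riders.csv` with a plain group-and-count. This is a minimal Python sketch that assumes `n` is simply the number of exported rows per season, category and UCI team category, because the original aggregation code is not shown. `count_riders` is a hypothetical helper.

```python
import csv
from collections import Counter


def count_riders(lines):
    """Count exported rows per (year, cat, uci_cat), sorted by year, then cat, then uci_cat."""
    # lines: an open riders.csv file handle or any iterable of CSV text lines
    reader = csv.DictReader(lines)
    counts = Counter((row["year"], row["cat"], row["uci_cat"]) for row in reader)
    return sorted(counts.items())
```

Pass it an open `riders.csv` handle (`open("riders.csv", newline="")`). It returns `((year, cat, uci_cat), n)` pairs in the same order as the table.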
More or less as expected:

- many riders with 0 or only a few points
- a slightly fatter middle
- a long tail: a few riders with a lot of points
Convert this to costs:

- scale from 1 to 61 (61 is the most expensive; up for discussion)
- initial budget of 360 for 25 riders
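The simplest conversion is a straight linear map from points to the 1-61 scale. This is a hypothetical sketch, not code from the original analysis. It assumes whole-number prices and that the season's top scorer gets the maximum price. It also works out the average price per rider that the initial budget allows.

```python
BUDGET = 360      # initial budget
TEAM_SIZE = 25    # riders per team
AVG_PRICE = BUDGET / TEAM_SIZE  # average price per rider the budget allows


def linear_price(points, max_points, lo=1, hi=61):
    """Map PCS points linearly onto the price scale [lo, hi], rounded to a whole price."""
    # 0 points -> lo, max_points -> hi
    return round(lo + (hi - lo) * points / max_points)
```

The average is 360 / 25 = 14.4 per rider. The 2020 figures in the table that follows give a median of 58.5 points and a maximum of 2431. For the median rider, `linear_price(58.5, 2431)` returns 2, far below that average, so a purely linear scale pushes most riders to the bottom of the range.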
| year | max_pcs | min_pcs | mean_pcs | median_pcs | q25 | q75 | q90 |
|---|---|---|---|---|---|---|---|
| 2020 | 2431 | 1 | 151.35 | 58.5 | 25 | 160.75 | 406.6 |
| 2021 | 3328 | 1 | 214.79 | 92.0 | 32 | 255.00 | 560.6 |
| 2022 | 3413 | 1 | 232.83 | 101.0 | 40 | 286.75 | 577.8 |
| 2023 | 3602 | 1 | 251.43 | 118.0 | 50 | 298.00 | 648.4 |
| 2024 | 4588 | 1 | 253.84 | 122.5 | 44 | 321.00 | 647.9 |
| 2025 | 4021 | 1 | 214.58 | 99.5 | 39 | 262.75 | 543.1 |
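The per-season summary can be computed with the Python standard library. The `.5` and `.75` fractions in the quartiles fit linear interpolation between order statistics, which is R's default `quantile` type 7 and also what `statistics.quantiles(..., method="inclusive")` does. Whether the table was actually produced this way is an assumption, and `pcs_summary` is a hypothetical name.

```python
import statistics


def pcs_summary(points):
    """Summary statistics of one season's PCS totals (needs at least two values)."""
    # 99 percentile cut points with linear interpolation; q[i - 1] is the i-th percentile
    q = statistics.quantiles(points, n=100, method="inclusive")
    return {
        "max_pcs": max(points),
        "min_pcs": min(points),
        "mean_pcs": round(statistics.mean(points), 2),
        "median_pcs": statistics.median(points),
        "q25": q[24],
        "q75": q[74],
        "q90": q[89],
    }
```

Call it once per season, for example after grouping the `pcs_total` values from `riders.csv` by `year`.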
Interesting: women seem to have a wider distribution, with fewer dominant riders?
I completely misinterpreted the Pogacar effect: it actually has no impact on the density (I assumed the longer right tail would make more of a difference).
How can people allocate a fixed budget to a team that must have exactly n riders?
boring strategies:
better:
In a way, you need to find riders that no one else has; otherwise you will not get ahead (everyone gets the same points). But if you don’t have the superstars other people have, you will fall behind.
So: How do I discourage superstar collections?
The idea would be to have different “buckets”
Price ranges, assuming a maximum price of 60:
Counts (m = men, f = women):

- A: 1% -> 5m, 2-3f
- B: 5% -> 25m, 10-15f
- C: 20% -> 100m, 50f
- D: 25% -> 120m, 60f
- E: 50% -> 250m, 120f
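Buckets can be assigned from a rider's rank within the men's or women's pool. This minimal sketch assumes rank 1 is the rider with the most PCS points and that A to D take the listed shares from the top down. The listed shares add up to 101%, so here E simply takes whatever is left. `bucket` is a hypothetical helper.

```python
def bucket(rank, pool_size):
    """Bucket A-E for a rider's rank (1 = most points) within a pool of pool_size riders."""
    share = rank / pool_size
    # cumulative upper bounds: A 1%, B +5%, C +20%, D +25%; E takes the rest
    for name, upper in (("A", 0.01), ("B", 0.06), ("C", 0.26), ("D", 0.51)):
        if share <= upper:
            return name
    return "E"
```

For a pool of 500 men this gives 5, 25, 100, 125 and 245 riders in A to E, close to the rounded counts in the list.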
Prices and budget should be balanced so that you need to find good B and C riders, and also have to look in D and E to fill up the team.
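A quick arithmetic check shows how much the fixed budget already limits superstar collecting. If every other slot is filled at the minimum price of 1, a team of 25 with a budget of 360 can hold at most 5 riders at the top price, whether that price is 60 or 61. `max_top_riders` is a hypothetical helper, and it ignores the bucket prices in between, which are not fixed yet.

```python
def max_top_riders(budget=360, team_size=25, top_price=60, min_price=1):
    """Most riders at top_price a team can afford if every other slot costs min_price."""
    # largest integer k with k * top_price + (team_size - k) * min_price <= budget
    k = (budget - team_size * min_price) // (top_price - min_price)
    # clamp to [0, team_size]; a negative k (budget cannot even fill the team) becomes 0
    return max(0, min(team_size, k))
```

At a top price of 60, 5 · 60 + 20 · 1 = 320 fits within 360, while 6 · 60 + 19 · 1 = 379 does not. The bucket prices therefore have to do the remaining work of making B and C attractive.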
A maximally interesting team would look like this?
I guess this is what the PdC people did:
Maybe a pricing/budget balance can be found that does not require (semi-)subjective rules, so that decisions can be blamed on “the algorithm”.
Linear: see some previous attempts.
Gompertz plots
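The Gompertz curve g(x) = exp(-b·exp(-c·x)) is an S-shaped alternative to a linear scale: flat at the bottom, steep in the middle and saturating at the top. The sketch below shows how it could map to prices. It assumes the input is the `normalized_points` value in [0, 1]. `gompertz_price` and its default `b` and `c` are hypothetical placeholders for the parameters explored under the next heading, not the values used in the plots.

```python
import math


def gompertz_price(x, lo=1.0, hi=60.0, b=5.0, c=10.0):
    """Price on a Gompertz curve for normalized points x in [0, 1].

    g(x) = exp(-b * exp(-c * x)) rises from exp(-b) at x = 0 towards 1;
    its inflection point sits at x = ln(b) / c, and larger c makes the rise steeper.
    """
    g = math.exp(-b * math.exp(-c * x))
    return lo + (hi - lo) * g
```

With these placeholder values the inflection point is at ln(5) / 10 ≈ 0.16. Raising `b` moves the steep part towards the stars, which keeps more mid-table riders cheap.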
Explore Parameters
Tables
Hist
We can use linear programming to maximise points by
This does not really work: there are too many riders for an optimal solution.
## Error: no feasible solution found
| short | cat | year | price | pcs_rank | pcs_total | normalized_points | rider_class |
|---|---|---|---|---|---|---|---|
## # A tibble: 1 × 2
## cost total
## <dbl> <dbl>
## 1 0 0
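Team selection can also be solved exactly as a 0/1 knapsack with a cardinality constraint. This is a swapped-in technique, not the LP approach used above. It is a dynamic program over (number of riders, money spent) that returns the best team of exactly `team_size` riders within `budget`, or `None` when no such team exists. It assumes whole-number prices, and `best_team` is a hypothetical helper. Running time grows with riders × team size × budget.

```python
def best_team(riders, team_size, budget):
    """Best total points from exactly team_size riders with total price <= budget.

    riders: iterable of (short, price, points) with non-negative integer prices.
    Returns (points, shorts) or None if no team of that size fits the budget.
    """
    # best[k][b]: best (points, shorts) using exactly k riders that cost exactly b
    best = [[None] * (budget + 1) for _ in range(team_size + 1)]
    best[0][0] = (0, ())
    for short, price, points in riders:
        # go from large k to small so row k - 1 does not yet contain this rider
        for k in range(team_size, 0, -1):
            for b in range(price, budget + 1):
                prev = best[k - 1][b - price]
                if prev is None:
                    continue
                candidate = (prev[0] + points, prev[1] + (short,))
                if best[k][b] is None or candidate[0] > best[k][b][0]:
                    best[k][b] = candidate
    teams = [t for t in best[team_size] if t is not None]
    return max(teams, key=lambda t: t[0]) if teams else None
```

A `None` result corresponds to the "no feasible solution found" case. It means no combination of exactly `team_size` riders fits the budget.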
Let’s transform the calculated prices to fit the range of the PdC prices, which ran from 1 to 36 (with Pogacar at 46, a special price!).
Wielerfox prices run from 1 to 60, with Pogacar at 122, an even more special price. So our prices are roughly double the PdC ones; let’s scale them down by half while keeping a minimum of 1. This shifts the lower end a little to the left.
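Here is a sketch of the halving step with a floor of 1. The text does not say how half prices are rounded, so round-half-up (`floor(p / 2 + 0.5)`) is an assumption. `halve_prices` is a hypothetical name.

```python
import math


def halve_prices(prices):
    """Scale prices down by half (rounding halves up) with a minimum price of 1."""
    return [max(1, math.floor(p / 2 + 0.5)) for p in prices]
```

Prices 1 and 2 both end up at 1, so the cheapest riders pile up at the minimum.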
Write a CSV for import with `wielerfox prices import`.
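Writing the import file takes a few lines with the standard `csv` module. The column layout that `wielerfox prices import` expects is not shown here. The columns below (`short`, `cat`, `year`, `price`, taken from the result table above) are therefore an assumption, as is the `prices_csv` helper name.

```python
import csv
import io


def prices_csv(rows, fieldnames=("short", "cat", "year", "price")):
    """Render rider rows (dicts) as CSV text, keeping only the given columns."""
    out = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in fieldnames (e.g. pcs_rank)
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The returned text can be written to whichever file the import command reads.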