
For dynamic languages, there are well-known approaches to achieving maximum performance, such as JIT compilation. But such approaches are often very expensive to implement and maintain. In this talk we'll look at ways of implementing interpreters for less frequently used languages, which necessarily trade off performance against implementation effort. As an example, we will consider the Starlark configuration language, a deterministic subset of Python used by the build systems Buck and Bazel. We'll cover techniques such as direct AST interpretation, bytecode, and closure generation, comparing the effort and performance of each approach.
Neil Mitchell
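To make the contrast between two of these techniques concrete, here is a minimal sketch in Python (chosen because Starlark is a Python subset). It uses a hypothetical toy expression language with only constants, variables, `+` and `*`. It is not Starlark and not the interpreter discussed in the talk. `interpret` walks the AST and re-dispatches on the node type every time an expression is evaluated. `compile_closure` does that dispatch once, up front, and returns nested closures that can be re-run without inspecting the tree again. Bytecode compilation is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


# Hypothetical toy AST: not Starlark's real AST.
@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Add, Mul]
Env = Dict[str, int]


def interpret(e: Expr, env: Env) -> int:
    """Direct AST interpretation: dispatch on the node type on every evaluation."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return interpret(e.left, env) + interpret(e.right, env)
    if isinstance(e, Mul):
        return interpret(e.left, env) * interpret(e.right, env)
    raise TypeError(f"unknown node {e!r}")


def compile_closure(e: Expr) -> Callable[[Env], int]:
    """Closure generation: dispatch on the node type once, returning a closure."""
    if isinstance(e, Const):
        v = e.value
        return lambda env: v
    if isinstance(e, Var):
        name = e.name
        return lambda env: env[name]
    if isinstance(e, Add):
        l, r = compile_closure(e.left), compile_closure(e.right)
        return lambda env: l(env) + r(env)
    if isinstance(e, Mul):
        l, r = compile_closure(e.left), compile_closure(e.right)
        return lambda env: l(env) * r(env)
    raise TypeError(f"unknown node {e!r}")


# (x + 2) * y
expr = Mul(Add(Var("x"), Const(2)), Var("y"))
env = {"x": 3, "y": 4}
print(interpret(expr, env))        # 20
print(compile_closure(expr)(env))  # 20
```

Closure generation typically moves the per-node type test out of the hot path, at the cost of an extra compilation pass. How much that gains in practice depends on the host language and the workload.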
JIT-compiling VMs are the most effective means of optimising many languages and often achieve very impressive levels of performance. They work by examining a running program and dynamically compiling it to machine code during a period known as "warm-up". In this talk I will examine this concept in detail and present results from the most extensive multi-VM experiment on warm-up to date, which shows that VMs frequently fail to warm up reliably. I will end with some thoughts about how we might have ended up in this situation and what we might be able to do better in the future.
Laurence Tratt
In 2020, I helped several teams analyze performance and architectural issues in their distributed applications.
In this session, I present a handful of the common patterns we found - such as N+1 Call & Query, Too Granular or Tight Coupling, Bad Timeouts/Retries/Backoff, and Inefficient Dependencies.
I then show you how to derive SLIs & SLOs out of these patterns and have them automatically detected using the CNCF open source project Keptn as part of your CI/CD process.
Andreas Grabner
During the past several years, containers running in Kubernetes (k8s) have become the default infrastructure abstraction layer, hiding a lot of operational complexity as well as application-related issues: restarts, slow endpoints, even serious bugs. Unfortunately there is no silver bullet, and sooner or later pods start getting killed by the OOM killer. Or, all of a sudden at peak hours, latency goes wild and the service can't respond to its health-check probes. Or something just doesn't want to scale. Nowadays there are a lot of tools on the market aimed at monitoring, traceability and observability. But nothing helps with root cause analysis as much as a profiler, which provides a huge amount of important information that can help pinpoint an issue down to a specific line of code.
The goal of this talk is to show how JDK Flight Recorder (JFR) can help with debugging different types of performance issues (CPU, OOM, I/O) in production environments.
Alexander Kachur
I will present nanoBench (https://github.com/andreas-abel/nanoBench), which is a tool for evaluating small microbenchmarks using hardware performance counters on Intel and AMD x86 systems.
Unlike previous tools, nanoBench can execute microbenchmarks directly in kernel space. This makes it possible to benchmark privileged instructions, and it enables more accurate measurements. The reading of the performance counters is implemented with minimal overhead, avoiding function calls and branches.
I will illustrate the utility of nanoBench with two case studies. First, I will discuss how nanoBench has been used to obtain the latency, throughput, and port usage data for more than 13,000 instruction variants, which is available at uops.info. Second, I will show how to generate microbenchmarks to characterize the cache architectures of recent Intel microarchitectures.
Andreas Abel
Using Performance Counters for Linux (the perf_events kernel API) can pose a considerable risk of leaking sensitive data accessed by monitored processes. The risk depends on the source of the data that the perf_events kernel API and the Linux Perf tool suite collect and expose for analysis. Data captured from execution context registers, architectural machine-specific registers and process memory can be sensitive, so if the API captures the contents of those registers or memory in some monitoring modes, access to those modes should be secured and strictly controlled. In the existing implementation, the perf_events kernel API uses a combination of access control features: sysctl settings (perf_event_paranoid), DAC (Linux capabilities) and LSM-based mechanisms for MAC (SELinux). In this talk Alexei will outline the existing perf_events access control model and discuss typical system configurations, ranging from monitoring of one's own user-space activity, available to everyone, to system-wide monitoring of the whole machine by a group of dedicated users.
Alexei Budankov
I will present how to build real-time tooling and incremental compilers by structuring them around a build system library called Rock (https://github.com/ollef/rock). Rock is a minimal library that I've developed with inspiration from Shake (https://github.com/ndmitchell/shake) and the recent Build systems à la carte (https://www.microsoft.com/en-us/research/publication/build-systems-la-carte/) paper. I will show both how such a compiler can be structured and how the build system library itself is implemented.
Olle Fredriksson
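The core idea of structuring a compiler around demand-driven, memoised queries can be sketched in a few lines. The following is a minimal, hypothetical Python sketch, not Rock's API (Rock is a Haskell library): `QueryEngine`, `fetch`, `invalidate`, `rules` and the `("source" | "parse" | "sum", name)` queries are all invented for illustration. Each query's result is cached, the engine records which queries each one fetched, and invalidating an input evicts every cached result that transitively depended on it, so only those are recomputed on the next request.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical query key: (query kind, argument), e.g. ("parse", "Main").
Query = Tuple[str, str]


class QueryEngine:
    """Memoises query results and records which queries each one fetched."""

    def __init__(self, rules: Callable[["QueryEngine", Query], object]):
        self.rules = rules
        self.cache: Dict[Query, object] = {}
        self.deps: Dict[Query, Set[Query]] = {}
        self._stack: List[Query] = []  # queries currently being computed

    def fetch(self, q: Query) -> object:
        # Record q as a dependency of whichever query asked for it.
        if self._stack:
            self.deps.setdefault(self._stack[-1], set()).add(q)
        if q in self.cache:
            return self.cache[q]
        self.deps[q] = set()  # forget stale dependencies before recomputing
        self._stack.append(q)
        try:
            result = self.rules(self, q)
        finally:
            self._stack.pop()
        self.cache[q] = result
        return result

    def invalidate(self, q: Query) -> None:
        """Drop q and, transitively, every cached query that depended on it."""
        self.cache.pop(q, None)
        for parent, children in list(self.deps.items()):
            if q in children and parent in self.cache:
                self.invalidate(parent)


# Hypothetical "compiler": read a file, parse it into ints, sum them.
files = {"Main": "1 2 3"}
runs: List[Query] = []  # log of which rules actually executed


def rules(engine: QueryEngine, q: Query) -> object:
    kind, name = q
    runs.append(q)
    if kind == "source":
        return files[name]
    if kind == "parse":
        return [int(tok) for tok in engine.fetch(("source", name)).split()]
    if kind == "sum":
        return sum(engine.fetch(("parse", name)))
    raise KeyError(q)


engine = QueryEngine(rules)
print(engine.fetch(("sum", "Main")))  # 6
engine.fetch(("sum", "Main"))         # served from the cache, no rule runs
files["Main"] = "10 20"
engine.invalidate(("source", "Main"))
print(engine.fetch(("sum", "Main")))  # 30
```

A real incremental compiler would typically also check whether a recomputed result actually changed before invalidating its dependents (early cutoff), and would have to deal with concurrency and cycles; all of that is omitted here.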
Java is ubiquitous in online services, yet ensuring Java applications’ availability and performance remains a challenging task. In this talk, we show how established industry approaches and widely accepted beliefs about Java tuning are wrong and how AI breaks through long-standing limitations.
Stefano Doni
A Performance Sizing Guide for Microservices: Understanding and Measuring CPU throttling in containerized environments
We will introduce what throttling is and show how popular open source and enterprise monitoring solutions address this problem. We will also try to identify new metrics to better understand when throttling is happening. Finally, using benchmark Java applications, we will provide guidelines for effectively understanding this important phenomenon and its impact on end users.
Francesco Fabbrizio
We live in an increasingly data-centric world, where we generate enormous amounts of data each day. The growth of information exchange fuels the need for both faster software and faster hardware. Unfortunately, modern CPUs no longer enjoy the big improvements in single-core performance that they did in past decades. Software programmers have had an “easy ride” for decades, thanks to Moore’s law. The free lunch is no longer available.
According to the popular paper "There’s plenty of room at the top" by Leiserson et al., software tuning will be one of the key drivers of performance gains in the near future. Performance tuning is becoming more important than it has been for the last 40 years. Software developers must become good at optimizing the code of their applications.
Optimization work doesn't end with choosing the theoretically best algorithms and data structures for your problem and eliminating unnecessary work. In this talk, I will give an introduction to how a casual developer can go about tuning an application that runs on a modern CPU. I will also share my vision of the trends and challenges in the performance world.
Denis Bakhvalov