Nahid Akbar

Minimum Viable Logging

In 2025, what is the minimum set of logging features you can expect from a modern application / service?

Context: I recently jumped in to investigate a production issue. The load time of one of our pages was unusually high. All we managed to find were some logs in the API gateway saying a few GraphQL requests had been timing out. GraphQL requests, as you know, all look like POST /graphql, and all the details of the request are in the body, which was not logged in this case. A time-bound search was also ineffective: there were dozens of services with thousands of log items, and most of them did not have the basic logging required to debug this issue. We ended up fixing the issue by restarting all the pods of the services behind this particular page. This made me ask: what is the minimum set of logging features you can expect from a modern application / service?

My top must-haves are:

Logging should be structured

Logs should be structured, ideally in JSON format. This allows for easier parsing and querying of log data.

How much detail can you realistically cram into a console log or printf string? You’ll need far more detail in your logs to make them useful when things go wrong.
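Here's a minimal sketch of what that could look like in Python using only the standard library's `logging` module. `JsonFormatter` and `make_logger` are hypothetical helpers written for illustration; in practice you might use an existing JSON formatter package instead. Each record becomes one JSON object per line, and anything passed via `extra={"fields": {...}}` becomes a top-level key your log platform can query.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one per line)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured fields passed as extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

Usage would be something like `make_logger("checkout").info("order placed", extra={"fields": {"order_id": "o-123"}})`. Because `order_id` is its own field rather than part of a message string, you can filter on it directly instead of grepping text.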

Logs should identify when, what, who, and outcome

At the very minimum, you should log when it happened, what happened, who did it, and what the outcome was.

I also prefer to log the request and the response as separate entries. It helps with debugging dynamic behaviour, e.g. this customer requested this, then a second customer did that, and that's why there was a race condition.
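A minimal sketch of logging the request and the response as separate entries, each carrying when, who and what, with the response also carrying the outcome. `log_event`, `handle` and the in-memory `LOG` list are hypothetical stand-ins for a real logger and request pipeline, and the field names are just one possible choice.

```python
import json
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# In-memory sink for this sketch; a real service would use its logger.
LOG: List[Dict[str, Any]] = []


def log_event(event: str, **fields: Any) -> None:
    """Record one structured entry stamped with the current UTC time (the 'when')."""
    entry = {"when": datetime.now(timezone.utc).isoformat(), "event": event, **fields}
    LOG.append(entry)
    print(json.dumps(entry, default=str))


def handle(user_id: str, action: str, payload: Dict[str, Any],
           fn: Callable[[Dict[str, Any]], Any]) -> Any:
    # Request entry: who asked for what. Redact sensitive fields in real code.
    log_event("request", who=user_id, what=action, payload=payload)
    start = time.monotonic()
    try:
        result = fn(payload)
    except Exception as exc:
        # Response entry for the failure path: outcome plus error details.
        log_event("response", who=user_id, what=action, outcome="error",
                  error=repr(exc),
                  duration_ms=round((time.monotonic() - start) * 1000, 1))
        raise
    log_event("response", who=user_id, what=action, outcome="success",
              duration_ms=round((time.monotonic() - start) * 1000, 1))
    return result
```

Because the request entry is written before the work starts, you still have a record of who asked for what even if the handler hangs and the response entry never appears.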

Logs should be tied to a request

Each log entry should be associated with a specific request. This can be achieved by including a unique request identifier (e.g. request ID, correlation ID) in all log entries related to that request.

When a request or event is received by a service, one of the very first things it should do is check whether an upstream request ID is present. It usually comes in the form of a header, e.g. X-Request-ID or X-Correlation-ID. If it is not present, the service should generate one.
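A minimal sketch of the "reuse or generate" step, assuming a Python service: the incoming headers are checked case-insensitively for `X-Request-ID` or `X-Correlation-ID`, a UUID is generated if neither is present, and the ID is stored in a `contextvars.ContextVar` so a logging filter can stamp it onto every record. `resolve_request_id`, `begin_request`, `RequestIdFilter` and `downstream_headers` are hypothetical names; in most web frameworks `begin_request` would live in a middleware hook.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping

# ID of the request currently being handled; contextvars keep this
# separate per thread and per asyncio task.
request_id_var = contextvars.ContextVar("request_id", default="-")

# Header names from the examples above; match whatever your gateway sends.
REQUEST_ID_HEADERS = ("x-request-id", "x-correlation-id")


def resolve_request_id(headers: Mapping[str, str]) -> str:
    """Reuse an upstream request ID if present, otherwise generate a new one."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in REQUEST_ID_HEADERS:
        value = lowered.get(name, "").strip()
        if value:
            return value
    return str(uuid.uuid4())


def begin_request(headers: Mapping[str, str]) -> str:
    """Call at the start of each request so all later logs carry the ID."""
    request_id = resolve_request_id(headers)
    request_id_var.set(request_id)
    return request_id


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def downstream_headers() -> Dict[str, str]:
    """Headers for outgoing calls so the next service reuses the same ID."""
    return {"X-Request-ID": request_id_var.get()}
```

To wire it up, add `RequestIdFilter()` to your log handler and include `%(request_id)s` (or a `request_id` JSON field) in the output format.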

This request ID should be:

This allows for easier tracking and debugging of issues that span multiple services or components. If you have log aggregation, this will let you lay bare everything that happened for a request simply by searching for the request ID.

I've also seen applications show this ID to the customer in error messages. Customers often attach screenshots of error messages to their support requests, which lets you find the relevant logs quickly.
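A minimal sketch of that split, assuming a generic Python error handler: the full exception goes to the logs tagged with the request ID, while the customer only sees a short message and the ID to quote. `error_response` and the message wording are hypothetical.

```python
import logging
from typing import Any, Dict, Tuple

logger = logging.getLogger("api")


def error_response(exc: Exception, request_id: str) -> Tuple[int, Dict[str, Any]]:
    """Log the full error internally; return only a safe message plus the ID."""
    # Full details (including the traceback) stay in the logs, keyed by request ID.
    logger.error("unhandled error", exc_info=exc, extra={"request_id": request_id})
    # The customer sees no internals, just a reference support can search for.
    body = {
        "error": "Something went wrong. Please quote this reference when contacting support.",
        "request_id": request_id,
    }
    return 500, body
```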

If you are using a centralised error tracking system like Sentry or whatever, you can also use this request ID to link the logs to the error reports. If the tracker returns a unique ID when you report an error, you could also include it in your logs for extra traceability.

Logs should identify where logging is happening from

This does not need to go all the way down to the file and line number. But every log item should identify the service and the module / component that is logging the message. If you see an error log, you can more quickly identify the part of the system that is having issues.
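One way to get this in Python, sketched under the assumption that each module creates its logger with `logging.getLogger(__name__)`: the logger name then identifies the module, and a formatter adds the service name to every entry. `SERVICE_NAME`, `ServiceFormatter` and `setup` are hypothetical; in a real deployment the service name would typically come from configuration or an environment variable.

```python
import json
import logging

SERVICE_NAME = "orders-service"  # hypothetical; usually read from config / env


class ServiceFormatter(logging.Formatter):
    """Add the service and the emitting component to every JSON log entry."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": SERVICE_NAME,
            "component": record.name,  # logger name, e.g. "orders.payments"
            "level": record.levelname,
            "message": record.getMessage(),
        })


def setup(stream=None) -> None:
    """Configure the root logger once at startup (stderr if stream is None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ServiceFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(logging.INFO)


# In each module, name the logger after the module itself.
logger = logging.getLogger(__name__)
```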

Logs should be complete

I know logging is expensive, but hear me out. Say I do the bare minimum of logging: only the request details and the response status code. It's better than nothing, but how useful is that? On the other hand, if the request had five main parts and the logs told me that parts 1 to 3 worked, I'd have a much better idea of what went wrong.
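A minimal sketch of step-level logging in Python. `run_steps` and the `order_pipeline` logger name are hypothetical; the idea is that each main step logs when it starts and how it ended, so a failure in step 4 leaves steps 1 to 3 visibly completed in the logs.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger("order_pipeline")


def run_steps(steps: Sequence[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run named steps in order, logging the start and outcome of each.

    Returns the names of the completed steps if all of them succeed.
    """
    completed: List[str] = []
    total = len(steps)
    for index, (name, step) in enumerate(steps, start=1):
        logger.info("step %d/%d started: %s", index, total, name)
        try:
            step()
        except Exception:
            # logger.exception logs at ERROR and includes the traceback.
            logger.exception("step %d/%d failed: %s", index, total, name)
            raise
        logger.info("step %d/%d completed: %s", index, total, name)
        completed.append(name)
    return completed
```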

I am not saying you have to log every small thing. Maybe just the main things:

We once had a shitty vendor whose API would slow down and fail randomly. When we asked them to take a look, they flatly denied there was any problem with their service, and we didn't have any receipts in our logs. It was much easier to negotiate after we started logging request and response details and were able to clearly show them how they were failing their SLAs.
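A minimal sketch of wrapping outbound vendor calls so every call leaves a receipt: what was sent, the status that came back, how long it took, and whether it exceeded an agreed latency threshold. `call_vendor`, the `send` callable and the 2000 ms default for `sla_ms` are assumptions for illustration; plug in your actual HTTP client and the SLA from your contract, and redact sensitive payload fields before logging them.

```python
import json
import logging
import time
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("vendor")

Response = Tuple[int, Dict[str, Any]]


def call_vendor(endpoint: str, payload: Dict[str, Any],
                send: Callable[[str, Dict[str, Any]], Response],
                sla_ms: float = 2000.0) -> Response:
    """Call the vendor via `send`, logging the request, response and latency."""
    logger.info(json.dumps({"event": "vendor_request", "endpoint": endpoint,
                            "payload": payload}))
    start = time.monotonic()
    try:
        status, body = send(endpoint, payload)
    except Exception as exc:
        # Timeouts and connection errors are receipts too.
        logger.error(json.dumps({
            "event": "vendor_error", "endpoint": endpoint, "error": repr(exc),
            "duration_ms": round((time.monotonic() - start) * 1000, 1)}))
        raise
    duration_ms = (time.monotonic() - start) * 1000
    over_sla = duration_ms > sla_ms
    entry = {"event": "vendor_response", "endpoint": endpoint, "status": status,
             "duration_ms": round(duration_ms, 1), "over_sla": over_sla}
    # Warn on slow or server-error responses so they stand out in searches.
    level = logging.WARNING if over_sla or status >= 500 else logging.INFO
    logger.log(level, json.dumps(entry))
    return status, body
```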

If you have good logs, you can even expose them to the customer. For a critical internal accounting application, we kept the running logs in the operational database and streamed them to the user. Instead of relying on developers to debug why things would fail, users could look at the logs and see, "ah, this process failed because this customer does not have a credit card etc." They would fix the errors themselves and retry, saving hundreds of developer hours a year.
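A minimal sketch of the "logs in the operational database" idea, assuming Python with the standard `sqlite3` module standing in for whatever database you actually use. A custom `logging.Handler` writes each record for a job into a table, and `logs_since` returns rows newer than the last one the UI has seen; polling it stands in for however you actually stream to the user (websockets, server-sent events, etc.). `JobLogHandler`, the `job_logs` table and `logs_since` are hypothetical.

```python
import logging
import sqlite3
from typing import List, Tuple


class JobLogHandler(logging.Handler):
    """Write log records for one job into a table the UI can read."""

    def __init__(self, conn: sqlite3.Connection, job_id: str) -> None:
        super().__init__()
        self.conn = conn
        self.job_id = job_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS job_logs ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " job_id TEXT NOT NULL, created REAL NOT NULL,"
            " level TEXT NOT NULL, message TEXT NOT NULL)"
        )

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                "INSERT INTO job_logs (job_id, created, level, message) VALUES (?, ?, ?, ?)",
                (self.job_id, record.created, record.levelname, record.getMessage()),
            )
            self.conn.commit()
        except Exception:
            # Never let a logging failure break the job itself.
            self.handleError(record)


def logs_since(conn: sqlite3.Connection, job_id: str,
               after_id: int = 0) -> List[Tuple[int, str, str]]:
    """Return (id, level, message) rows newer than after_id, oldest first."""
    return conn.execute(
        "SELECT id, level, message FROM job_logs"
        " WHERE job_id = ? AND id > ? ORDER BY id",
        (job_id, after_id),
    ).fetchall()
```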

Written July 2025
© Nahid Akbar