Minimum Viable Logging

In 2025, what is the minimum set of logging features you can expect from a modern application / service?

Context: I recently jumped in to investigate a production issue. The load time of one page was unusually high. All we managed to find were some logs in the API gateway saying that a few GraphQL requests had been timing out. GraphQL requests, as you know, all look like POST /graphql, and all the details of the request are in the body, which was not logged in this case. Time-bound search was also ineffective: there were dozens of services with thousands of log items, and most of them did not have the basic logging required to debug this issue. We ended up fixing the issue by restarting all the pods of the services this particular page was related to. This made me ask: what is the minimum set of logging features you can expect from a modern application / service?

My top must-haves are:

Logging should be structured

Logs should be structured, ideally in JSON format. This allows for easier parsing and querying of log data.

How much detail are you going to fit into a console log or a printf string? You’ll need far more detail than that in your logs to make them useful when things go wrong, and structured fields are how you carry it.
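Here is a minimal sketch of what structured logging could look like, using only Python's standard `logging` and `json` modules. The `JsonFormatter` class, the `build_logger` helper, the `checkout-api` logger name and the `extra={"fields": ...}` convention are all my own assumptions for illustration, not something a particular library prescribes.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO-8601 in UTC with millisecond precision
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Assumed convention: callers pass structured data via
        # extra={"fields": {...}}, which logging attaches to the record.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, default=str)


def build_logger(name: str) -> logging.Logger:
    """Hypothetical helper: a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger("checkout-api")
logger.info(
    "graphql request timed out",
    extra={"fields": {"operation": "GetCart", "duration_ms": 30000}},
)
```

Because every entry is a JSON object, a log aggregator can filter on `operation` or `duration_ms` directly instead of pattern-matching on free text.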

Logs should identify when, what, who, and outcome

At the very minimum, you should log:

Timestamp - when something happened. Use a consistent timestamp format (e.g. Unix timestamp or ISO‑8601 with millisecond precision) to make ordering and searching easier.
Request information.
- The basics, for example URL and method.
- Critical headers. For example, if it's a SOAP request, the SOAP action header is a must.
- Body of the request, if applicable. In the above example, you could easily log the GraphQL query or mutation name. Make sure you are not logging private/sensitive information like emails, passwords and credit card numbers. These need to be redacted / masked.
Who this request relates to, e.g. a user ID or session ID if there is one. It's very common to be asked to dig up a user's activity when they are deemed suspicious or when they report an issue.
Outcome of the request.
- For example response status code
- Any error code or message
- Basic request metrics such as request duration

I also prefer to log the request and the response as separate entries. It helps with debugging timing-dependent things, e.g. this customer requested this, then a second customer did that before the first request had finished, and that's why there was a race condition.

Logs should be tied to a request

Each log entry should be associated with a specific request. This can be achieved by including a unique request identifier (e.g. request ID, correlation ID) in all log entries related to that request.

When a request or event is received by a service, one of the very first things it should do is check whether an upstream request ID is present. It usually comes in the form of a header, e.g. X-Request-ID or X-Correlation-ID. If it is not present, the service should generate one.

This request ID should be:

included in all logs related to that request
passed to downstream services, so that they can also log it
included in the response headers, so that the client can also log it

This allows for easier tracking and debugging of issues that span multiple services or components. If you have log aggregation, this will let you lay bare everything that happened for a request simply by searching for the request ID.
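A minimal sketch of the request ID flow, using Python's standard `contextvars`, `logging` and `uuid` modules. The `handle` function stands in for whatever your web framework calls per request, and it, `resolve_request_id`, `RequestIdFilter` and the `orders` logger name are hypothetical names I made up for this example.

```python
import contextvars
import logging
import uuid

# Header names from the text above, compared case-insensitively.
REQUEST_ID_HEADERS = ("x-request-id", "x-correlation-id")

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


def resolve_request_id(headers: dict) -> str:
    """Reuse the upstream ID if one is present, otherwise generate one."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for name in REQUEST_ID_HEADERS:
        if lowered.get(name):
            return lowered[name]
    return str(uuid.uuid4())


class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s request_id=%(request_id)s %(message)s")
)
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def handle(headers: dict) -> dict:
    """Hypothetical handler: returns the headers to send downstream and back."""
    request_id = resolve_request_id(headers)
    token = request_id_var.set(request_id)
    try:
        logger.info("handling request")
        # The same header goes on downstream calls and on the response.
        return {"X-Request-ID": request_id}
    finally:
        request_id_var.reset(token)


print(handle({"X-Request-ID": "abc-123"}))
```

Storing the ID in a context variable means code deep inside the request does not have to pass it around by hand; any log call made while the request is being handled picks it up through the filter.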

I've also seen applications give out this ID to the customer in error messages. Customers send error message screenshots with their support requests, which lets you quickly find the logs.

If you are using a centralised error tracking system like Sentry, you can also use this request ID to link the logs to the error reports. If the system returns a unique ID when you report an error, you could also include that in your logs for extra traceability.

Logs should identify where logging is happening from

This does not need to go all the way down to the file and line number, but every log item should identify the service and the module / component that is logging the message. When you see an error log, you can then more quickly identify the part of the system that is having issues.
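One way to do this in Python, sketched with the standard `logging.LoggerAdapter`: it attaches the same extra fields to every record logged through it. The `component_logger` helper and the `order-service` / `stock-reservation` names are assumptions for illustration.

```python
import logging


def component_logger(service: str, component: str) -> logging.LoggerAdapter:
    """Hypothetical helper: a logger that tags every record with its origin."""
    base = logging.getLogger(f"{service}.{component}")
    return logging.LoggerAdapter(base, {"service": service, "component": component})


# Attach one handler at the service level; component loggers inherit it
# through the dotted logger name hierarchy.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter(
        "%(asctime)s %(levelname)s service=%(service)s "
        "component=%(component)s %(message)s"
    )
)
service_logger = logging.getLogger("order-service")
service_logger.addHandler(handler)
service_logger.setLevel(logging.INFO)

log = component_logger("order-service", "stock-reservation")
log.error("could not reserve stock")
```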

Logs should be complete

I know logging is expensive but hear me out. Say I do the bare minimum of logging: only the request details and the response status code. It's better than nothing, but how useful is that? On the other hand, if the request had 5 main parts and the logs told me that parts 1 to 3 worked, I now have a much better idea of what went wrong.

I am not saying you have to log every small thing. Maybe just the main things:

top level actions - e.g. for an e‑commerce service it could be: created order, reserved stock, sent email etc.
anything that leaves the realm of the company, where you won't be able to see what happened, e.g. a request to a payment vendor. It should be logged with the request ID and the outcome as well.
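As a rough sketch of logging the main steps, a small context manager can record the outcome and duration of each top-level action, including calls to an outside vendor. The `logged_step` helper, the step names and the simulated vendor timeout are all made up for this example.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkout")


@contextmanager
def logged_step(name: str, request_id: str):
    """Log the outcome and duration of one top-level action."""
    started = time.perf_counter()
    try:
        yield
    except Exception as exc:
        duration_ms = round((time.perf_counter() - started) * 1000)
        logger.error(
            "step=%s request_id=%s outcome=failed duration_ms=%d error=%r",
            name, request_id, duration_ms, exc,
        )
        raise  # the caller still sees the failure
    else:
        duration_ms = round((time.perf_counter() - started) * 1000)
        logger.info(
            "step=%s request_id=%s outcome=ok duration_ms=%d",
            name, request_id, duration_ms,
        )


def place_order(request_id: str) -> None:
    """Hypothetical order flow with three main parts."""
    with logged_step("create_order", request_id):
        pass  # placeholder for real work
    with logged_step("reserve_stock", request_id):
        pass  # placeholder for real work
    with logged_step("charge_payment_vendor", request_id):
        # Simulated failure of the external vendor call.
        raise TimeoutError("vendor did not respond")


try:
    place_order("req-1")
except TimeoutError:
    pass  # the logs now show steps 1 and 2 succeeded and step 3 failed
```

The per-step duration and outcome on the vendor call are exactly the kind of receipts that come in handy in the story below.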

We had a shitty vendor once whose API would slow down and fail randomly. When we asked them to take a look, they flatly denied there was any problem with their service, and we didn't have any receipts in our logs. This became much easier to negotiate after we started logging request and response details and were able to clearly show them how they were failing their SLAs.

If you have good logs, you can even expose them to the customer. For a critical internal accounting application, we kept the running logs in the operational database and streamed that to the user. Instead of relying on developers to debug why things would fail, they could look at the logs and see, "ah this process failed because this customer does not have a credit card etc." They would fix the errors themselves and retry - saving hundreds of developer hours a year.
