Re: What to do in response to a kernel warning


Lukas Bulwahn
 

On Fri, Nov 19, 2021 at 5:58 PM Shuah Khan <skhan@...> wrote:

All,

This is an active thread about "What to do in response to a kernel
warning" on Linux kernel mailing lists.

Lukas and others from ELISA have been participating. Give it a read.
Alexander Popov called out ELISA for input and feedback on his take
on replacing the big-hammer approach of the kernel/panic_on_warn
sysctl knob: he proposes adding a kernel/pkill_on_warn knob that kills
the threads and processes that cause the warn, as opposed to taking
the system down.

Give it a read - if you can't access it now, it will be available
without a subscription in a week.

https://lwn.net/Articles/876209/

Thanks, Shuah, for pointing out the LWN article.

Alex Popov pulled us into a kernel discussion this week on a specific
kernel feature proposal, with a remark that this is what
safety-critical systems need.

In short, Alexander Popov suggested that warnings in the kernel need a
refined run-time treatment. I disagreed with him and stated that I
expect panic_on_warn to be turned on in the kernel for safety-critical
systems, and that a safety-critical system would never try to continue
to operate after a warn(): the risk of malfunction is larger than the
benefit of continued operation.
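
For anyone who wants to check how a given system is currently
configured, here is a minimal sketch in Python. It assumes the knob is
exposed at the usual procfs location, /proc/sys/kernel/panic_on_warn;
the function name panic_on_warn_enabled is made up for illustration.

```python
from pathlib import Path

# Usual procfs location of the kernel.panic_on_warn sysctl (assumed present).
PANIC_ON_WARN = Path("/proc/sys/kernel/panic_on_warn")


def panic_on_warn_enabled(path=PANIC_ON_WARN):
    """Return True or False for the knob state, or None if it cannot be read."""
    try:
        # The file holds a single integer; any non-zero value counts as enabled.
        return int(path.read_text().strip()) != 0
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer: state is unknown.
        return None


if __name__ == "__main__":
    state = panic_on_warn_enabled()
    print({True: "enabled", False: "disabled", None: "unknown"}[state])
```

The sketch only reads the value. Changing it requires root, typically
via sysctl kernel.panic_on_warn=1 or the panic_on_warn boot parameter.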

All of this is of course largely a hypothesis based on my
understanding of the requirements of safety-critical systems that may
one day rely on Linux.

I would of course be interested in:

- Do we all agree that setting panic_on_warn is the reasonable choice
for this kernel configuration for the safety-critical systems we are
discussing? Are there arguments not to set panic_on_warn that I am not
aware of or that I misjudged?

- Which warnings and kernel panics do you encounter in your current
test and (early) production systems when switching on panic_on_warn?
We can support each other here to debug and resolve them
appropriately.

Please share such information. I am confident that ELISA contributors
could support your development (clean-up) activities if information on
warnings that have been encountered but not yet resolved is shared.
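
To make such reports easier to collect, here is a rough sketch that
counts warning sites in a saved kernel log. It assumes the splat header
looks like "WARNING: CPU: <n> PID: <n> at <file>:<line> ...", which may
not match every kernel version or architecture; warn_sites and the
file names in the comment are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a warning header line in the kernel log, e.g.
#   [   12.000001] WARNING: CPU: 1 PID: 42 at drivers/foo/bar.c:123 foo_fn+0x1c/0x40
# (drivers/foo/bar.c and foo_fn are hypothetical names.)
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+)")


def warn_sites(log_text):
    """Count how often each <file>:<line> warning site appears in log_text."""
    return Counter(WARN_RE.findall(log_text))
```

Calling warn_sites() on captured dmesg output gives a per-site count
that can be pasted into a reply to this thread.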


Lukas
