Spatial interference background notes


Paul Albertella
 

Hi,

Further to discussions in today's call, my colleague Ben gave a talk about some work we did on Execution State at Linux Plumbers last year:

https://linuxplumbersconf.org/event/7/contributions/698/

He also wrote an article about kernel to userspace memory access, which may also be relevant to these discussions:

https://www.codethink.co.uk/articles/2020/investigating-kernel-user-space-access/

Regards,

Paul


Lukas Bulwahn
 

On Tue, Sep 28, 2021 at 3:03 PM Paul Albertella
<paul.albertella@...> wrote:

Hi,

Further to discussions in today's call, my colleague Ben gave a talk
about some work we did on Execution State at Linux Plumbers last year:

https://linuxplumbersconf.org/event/7/contributions/698/

He also wrote an article about kernel to userspace memory access, which
may also be relevant to these discussions:

https://www.codethink.co.uk/articles/2020/investigating-kernel-user-space-access/
Thanks, Paul, for those pointers. I can only recommend that everyone
start by understanding those investigations.

I would like to point out another related investigation from the ELISA
workshop in May 2020: "Fault Hypothesis and Technical Measures to
Ensure Integrity on a Process Memory Within a Mixed Criticality
Environment" by Thomas Brinker. Slides and a recording of that talk
are available.


Clearly, if the group would like to make any valuable progress on the
requirement "The operating system shall maintain and enforce the
integrity of the process address space along with the process
lifecycle", I can only strongly recommend the following:

1. Get acquainted with the features that provide an application/process
with a virtual view of its memory, and with what happens in software
and hardware such that the process may believe it has its own dedicated
memory. Understand the investigations mentioned above. Potentially
repeat those investigations on the current kernel to get a good
understanding; the investigations date from 2018 and 2019, and the
kernel has changed a bit since then (which may lead to the observation
that some of the issues mentioned have been resolved, or that new
issues have been introduced). Understand how these investigations
contribute to a partial consideration of a few specific aspects, and
understand which aspects remain open or unaddressed in those
investigations.

I see two ways to carry out this specific task with an actually helpful
result for the ELISA Project, i.e., to Enable Linux in Safety-critical
Applications:

2a. The group investigating the requirement will spend a lot of time
and discussion learning which different technical views of the
applications'/processes' memory exist in user land and in the kernel,
and which operations must preserve, or may modify, this view at certain
levels. This will require quite some reading of the existing literature
on memory management, and structuring this information in a systematic
way. If we succeed in structuring this information systematically, we
should contribute this well-structured documentation to the kernel
documentation. This will ensure that the information is critically
reviewed by others and that the structure and classification generally
reflect the current state of the implementation. Others (or, in a next
step, this group) may then use this structure and classification to
argue that dedicated verification activities address the various
points mentioned in that systematic description of the memory
management views/functionalities. If there is somebody who is willing
to approach the work in such a way, I am willing to work together on
this in a suitable working group. It is really, first of all, a reading
group of probably just two or three people, summarizing the knowledge
in a systematic, structured way (rather than spending half a year on
non-expert brainstorming, continuous random investigations, confusion
due to knowledge gaps, and repeated utterances of a very light
understanding of how memory management in a general-purpose operating
system works). If the majority does not see the value in writing
documentation of the understanding gained through reading, I would
recommend splitting off a separate working group, e.g., a "Basics of
memory management WG", just for the subgroup that builds up and
documents that understanding. This WG could then follow up with
in-depth investigations, such as "what is the direct mapping in the
kernel used for", and to what extent and through which changes (for the
sake of an investigation) a kernel variant could exist that does not
use the direct mapping. Implementing such a kernel variant validates
the WG's understanding of what the direct mapping in the kernel is used
for, even if only for educational purposes.


2b. If you would like to continue the approach of using tools to
extract information from the source code (without the initial effort
of understanding memory management on an abstract level; this works,
but takes extra time to learn many things on the fly and in a very
unstructured way), I can only recommend considering the relevant data
structures that preserve/store/capture the virtual memory of an
application, and NOT starting your investigation by considering all
interaction points between user space and the kernel in order to
understand memory management. The reason is clear: memory management is
much more about preserving a certain application state (hence the
relevant data structures are the key) than about the correctness of
just a few limited kernel operations bounded in time, such as one
interrupt, a single syscall, or a call to an ioctl operation (as was
the case for the watchdog requirement). I will also happily follow that
exercise in the group; I also assume Daniel and Gab cannot be stopped
from taking everyone else down that direction within the current WG
anyway, but I think it is much better to start with Option 2a (hence
it might deserve a separate WG/study group for those that first want
to understand the basics, rather than speculating about random C and
assembler code pointed out by an incomplete "architectural"
code-analysis tool). Anyway, I wish the group actually working on 2b
good luck and endurance on your way.


Lukas


Lukas Bulwahn
 

On Wed, Sep 29, 2021 at 6:49 AM Lukas Bulwahn via lists.elisa.tech
<lukas.bulwahn=gmail.com@...> wrote:

[quoted message snipped]
I should also mention that I recommend asking the developers working on
memory management which verification activities and test suites they
run, and why they believe that those activities detect potential bugs
in the memory management functionality. Again, if we understand their
argumentation, cross-check that argumentation with our own experiments,
and document that knowledge in the kernel documentation, we are a large
step closer to "understanding confidently why memory management in the
kernel works", which is the underlying activity for the safety
argumentation of the requirement. This may be an alternative approach
to some architectural analysis of an existing codebase; probably in a
separate WG, though.


Lukas


Gabriele Paoloni
 

Hi Lukas and Paul

Thanks for following up on this point

On 29/09/2021 06:49, Lukas Bulwahn wrote:
On Tue, Sep 28, 2021 at 3:03 PM Paul Albertella
<paul.albertella@...> wrote:

Hi,

Further to discussions in today's call, my colleague Ben gave a talk
about some work we did on Execution State at Linux Plumbers last year:

https://linuxplumbersconf.org/event/7/contributions/698/

He also wrote an article about kernel to userspace memory access, which
may also be relevant to these discussions:

https://www.codethink.co.uk/articles/2020/investigating-kernel-user-space-access/
I have looked at these slide decks, and they are very interesting and
very relevant to our discussion topics. I have some comments/questions.
Regarding the Plumbers deck:

Slide 7/17: What do we mean by arch-specific events? Aren't they part
of exceptions/interrupts?

Slide 9/17: This is out of the scope of our investigation (i.e. we
assume a systematic capability of the HW, with random HW faults
mitigated by safety mechanisms defined in the HW safety manual).

Slide 10/17: With regard to SW faults, our first step is to identify
the kernel code that:
- is responsible for creating process address spaces
- is responsible for maintaining process address spaces
- is responsible for configuring any HW that is required to support the
above-mentioned activities

For now we have temporarily set aside any code that is not functionally
involved in the activities above but could nevertheless interfere (that
should be covered by the Kernel FFI activity).



Thanks, Paul, for those pointers. I can only recommend everyone to
start understanding those investigations.

I would like to point out another related investigation from the ELISA
workshop in May 2020, Fault Hypothesis and Technical Measures to
Ensure Integrity on a Process Memory Within a Mixed Criticality
Environment by Thomas Brinker. Slides and Recording of that is
available.


Clearly, if the group would like to make any valuable progress on the
requirement "The operating system shall maintain and enforce the
integrity of the process address space along with the process
lifecycle", I can only strongly recommend:

1. Get accustomed with what features provide an application/process a
virtual view on its memory, what happens in software and hardware such
that the process may believe that it has its own dedicated memory.
Understand the investigations mentioned above. Potentially repeat
those investigations on the current kernel to get a good
understanding; the investigations are dated 2018 and 2019 and the
kernel has changed a bit (which may lead to the observation that some
issues mentioned have been resolved or new issues have been
introduced). Understand how these investigations contribute to a
partial consideration of a few specific aspects, and understand which
aspects are open/unaddressed not mentioned in those investigations.
Agreed. We have just started trying to scope the entry points, which I
would expect to land in different subsystems. So there is a lot to
study and learn, and any previous investigation to look at is very
helpful here.


I see this specific task with an actually helpful result for the ELISA
Project, i.e., to Enable Linux in Safety-critical Applications:

2a. The group investigating the requirement will spend a lot of time
and discussion learning which different technical views on the
applications'/processes' memory exist in user-land and in the kernel,
and which operations must preserve or may modify this view on certain
levels. This will require quite some reading of existing literature on
memory management and structuring this information in a systematic
way. If we succeed to structure this information systematically, we
should provide this well-structured documentation to the kernel
documentation. This will ensure that the information is critically
reviewed by others and that the structure and classification generally
reflects the current state of the implementation. Others (or in a next
step, this group) may then use this structure and classification to
argue that the dedicated verification activities address the various
points mentioned in that systematic description of the memory
management views/functionalities. If there is somebody that is willing
to address the work in such a way, I am willing to work together on
this in a suitable working group. It is really first a reading group
of probably just two or three people and summarizing the knowledge in
a systematic structured way (rather than spending half a year of
non-expert brainstorming, continuous random investigations, confusion
due to knowledge gaps, and repetitive utterance of very light
understanding of how memory management in a general-purpose operating
system works). If the majority does not see the value in writing
documentation of their gained understanding through reading, I would
recommend to split off a separate working group, e.g., "Basics of
memory management WG", just for the subgroup that understands and
documents that understanding. This WG could then follow with in-depth
investigations, such as "what is the direct mapping in the kernel used
for" and to which extent and through which changes (for the sake of an
investigation) may a kernel variant exist that does not use direct
mapping. Implementing such a kernel variant validates the WG's
understanding of the "what is the direct mapping in the kernel used
for", even if just for validation of the educational purpose.
I totally agree on how to structure the investigation, and I am looking
forward to working with you, Lukas. I would suggest that we get started
and involve the right people, then decide whether we need another WG,
OK?



2b. If you would like to continue the approach of using tools to
extract information from the source code (without the initial effort
of understanding memory management on an abstract level; which works
but takes extra time to learn many things on the fly and in very
unstructured way), I can only recommend to consider the relevant data
structures to preserve/store/capture the virtual memory of an
application, AND NOT to start your investigation by considering all
interaction points between user-space and the kernel to understand
memory management. The reason is clear: Memory management is much more
related to preserving a certain application state (hence the relevant
data structures are the key) than to the correctness of just a few
limited kernel operations bounded in time, such as one interrupt, a
single syscall call, a call to an ioctl operation (as it happened to
be for the watchdog requirement). I will also happily follow that
exercise in the group; I also assume Daniel and Gab cannot be stopped
from taking everyone else down that direction within the current WG
anyway, but I think it is much better to start with Option 2a (hence
it might deserve a separate WG/study group for those that first want
to understand the basics, rather than speculating about random C and
assembler code pointed out by an incomplete code analysis
"architectural" tool). Anyway, I wish the group actually working on 2b
good luck and endurance on your way.
I am afraid we conveyed the wrong message here. Tools are helpful, but
we cannot rely on them alone. Even when explaining the hybrid approach,
we said that tools can generate a baseline to work on, but human review
of the code is ABSOLUTELY required to create a meaningful picture. IMO
we should start according to 2a, and then we can use tools to support
our analysis and complete it (using tools, we may figure out that we
missed some aspects of the analysis, for example).
So WDYT?
If we are aligned on starting with 2a, we'll follow up with a more
formal sub-task proposal, and we can discuss a dedicated weekly time
slot to work on it...

Thanks
Gab



Lukas





Robert Krutsch
 

Thanks a lot, Lukas, for the material; it was very insightful.

I was also thinking about this topic a year ago, and had a lot of questions and not many answers without a carefully designed HW. On top of what was discussed by Thomas B., we have a lot of other issues:
- Cache hierarchy influences: not all architectures index their caches the same way, and there are many failure modes there
- The interconnect and the DDR controller also contribute to the data path and bring many challenges
- Timing-related issues across caches, interconnect and DDR are hard to mitigate, and detection is typically so late that it is not helpful

A careful HW design and maybe some constraints could help out. One proposal that I had back then was to constrain the execution of safety tasks to some of the cores. That would allow us to use a lot of the protection mechanisms in the MMU, as we would have different masters for safe and unsafe applications. A lot also depends on whether the caches are virtually tagged and on what the hierarchy looks like; I have not studied the new CPUs from ARM to see whether different clusters would be needed to achieve satisfactory isolation.

//Robert




On Wed, Sep 29, 2021 at 7:07 AM Lukas Bulwahn <lukas.bulwahn@...> wrote:
[quoted message snipped]