Spatial interference background notes
Hi,
Further to discussions in today's call, my colleague Ben gave a talk about some work we did on Execution State at Linux Plumbers last year: https://linuxplumbersconf.org/event/7/contributions/698/

He also wrote an article about kernel-to-userspace memory access, which may also be relevant to these discussions: https://www.codethink.co.uk/articles/2020/investigating-kernel-user-space-access/

Regards,
Paul
On Tue, Sep 28, 2021 at 3:03 PM Paul Albertella
<paul.albertella@...> wrote:

Thanks, Paul, for those pointers. I can only recommend that everyone start by understanding those investigations. I would like to point out another related investigation from the ELISA workshop in May 2020: "Fault Hypothesis and Technical Measures to Ensure Integrity on a Process Memory Within a Mixed Criticality Environment" by Thomas Brinker. Slides and a recording of that are available.

Clearly, if the group would like to make any valuable progress on the requirement "The operating system shall maintain and enforce the integrity of the process address space along with the process lifecycle", I can only strongly recommend:

1. Get acquainted with the features that provide an application/process with a virtual view of its memory, and with what happens in software and hardware such that the process may believe it has its own dedicated memory (a minimal illustration follows below). Understand the investigations mentioned above. Potentially repeat those investigations on the current kernel to get a good understanding; the investigations are dated 2018 and 2019 and the kernel has changed a bit (which may lead to the observation that some issues mentioned have been resolved, or that new issues have been introduced). Understand how these investigations contribute to a partial consideration of a few specific aspects, and which aspects remain open/unaddressed because they are not covered by those investigations.

I see one specific task with an actually helpful result for the ELISA Project, i.e., to Enable Linux in Safety-critical Applications:

2a. The group investigating the requirement will spend a lot of time and discussion learning which different technical views on the applications'/processes' memory exist in user land and in the kernel, and which operations must preserve or may modify this view at certain levels. This will require quite some reading of the existing literature on memory management and structuring this information in a systematic way. If we succeed in structuring this information systematically, we should contribute this well-structured documentation to the kernel documentation. This will ensure that the information is critically reviewed by others and that the structure and classification generally reflect the current state of the implementation. Others (or, in a next step, this group) may then use this structure and classification to argue that dedicated verification activities address the various points mentioned in that systematic description of the memory-management views/functionalities.

If somebody is willing to address the work in such a way, I am willing to work together on this in a suitable working group. It is really, at first, a reading group of probably just two or three people summarizing the knowledge in a systematic, structured way (rather than spending half a year on non-expert brainstorming, continuous random investigations, confusion due to knowledge gaps, and repetitive utterance of a very light understanding of how memory management in a general-purpose operating system works). If the majority does not see the value in writing documentation of the understanding gained through reading, I would recommend splitting off a separate working group, e.g., a "Basics of memory management WG", just for the subgroup that builds and documents that understanding.
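As a minimal flavour of point 1's "virtual view" (standard Linux/libc interfaces only; an illustration, not a proposed verification activity), a process can read back the kernel's own record of its address space:

```c
/* Minimal sketch (standard Linux/libc only): a process asks the kernel
 * for its own virtual address-space layout. Each output line is one
 * mapping: start-end addresses, permissions, offset, device, inode,
 * and the backing file, if any. */
#include <stdio.h>

int main(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];

    if (!maps) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), maps))
        fputs(line, stdout);
    fclose(maps);
    return 0;
}
```

Running this shows the text, heap, library, and stack mappings, i.e. the per-process view that the investigations above probe from the kernel side.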
Such a WG could then follow up with in-depth investigations, such as "what is the direct mapping in the kernel used for", and to what extent and through which changes (for the sake of an investigation) a kernel variant could exist that does not use the direct mapping. Implementing such a kernel variant would validate the WG's understanding of what the direct mapping is used for, even if only for educational purposes.

2b. If you would like to continue the approach of using tools to extract information from the source code (without the initial effort of understanding memory management on an abstract level; this works, but it takes extra time to learn many things on the fly and in a very unstructured way), I can only recommend considering the relevant data structures that preserve/store/capture the virtual memory of an application, and NOT starting your investigation by considering all interaction points between user space and the kernel. The reason is clear: memory management is much more about preserving a certain application state (hence the relevant data structures are the key) than about the correctness of a few limited kernel operations bounded in time, such as one interrupt, a single syscall, or a call to an ioctl operation (as it happened to be for the watchdog requirement). A toy model illustrating this data-structure view follows below.

I will also happily follow that exercise in the group; I also assume Daniel and Gab cannot be stopped from taking everyone else down that direction within the current WG anyway, but I think it is much better to start with option 2a (hence it might deserve a separate WG/study group for those who first want to understand the basics, rather than speculating about random C and assembler code pointed out by an incomplete "architectural" code-analysis tool). Anyway, I wish the group actually working on 2b good luck and endurance on your way.

Lukas
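To make 2b's emphasis on data structures concrete, here is a deliberately oversimplified, hypothetical toy model. It only echoes the real kernel structures (struct mm_struct, struct vm_area_struct, find_vma()), which carry far more state and, on recent kernels, live in a maple tree rather than a list:

```c
/* HYPOTHETICAL toy model: the process address space IS the data the
 * kernel keeps about it. Names only echo the real structures. */
#include <stdio.h>

struct vma {                    /* cf. struct vm_area_struct */
    unsigned long start, end;   /* [start, end) */
    const char *name;
    struct vma *next;
};

struct mm {                     /* cf. struct mm_struct */
    struct vma *vmas;           /* mappings, sorted by address */
};

/* cf. find_vma(): first VMA whose end lies above addr; the caller
 * still checks addr >= start, otherwise the address is unmapped. */
static struct vma *find_vma(struct mm *mm, unsigned long addr)
{
    for (struct vma *v = mm->vmas; v; v = v->next)
        if (addr < v->end)
            return v;
    return NULL;
}

int main(void)
{
    struct vma stack = { 0x7ffd0000UL, 0x7fff0000UL, "stack", NULL };
    struct vma text  = { 0x00400000UL, 0x00401000UL, "text", &stack };
    struct mm mm = { &text };
    unsigned long addr = 0x00400080UL;

    struct vma *v = find_vma(&mm, addr);
    if (v && addr >= v->start)
        printf("0x%lx is backed by the %s mapping\n", addr, v->name);
    else
        printf("0x%lx is unmapped: accessing it would fault\n", addr);
    return 0;
}
```

The point of the toy is that every fault-time decision the kernel makes walks exactly this kind of record, which is why the data structures, not the syscall boundary, are the natural starting point.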
On Wed, Sep 29, 2021 at 6:49 AM Lukas Bulwahn via lists.elisa.tech
<lukas.bulwahn=gmail.com@...> wrote:

I should also mention that I recommend asking the developers working on memory management which verification activities and test suites they run, and why they believe that those activities detect potential bugs in the memory-management functionality. Again, if we understand their argumentation, cross-check it with our own experiments, and document that knowledge in the kernel documentation, we are a large step closer to "understanding confidently why memory management in the kernel works", which is the underlying activity for the safety argumentation of the requirement (a toy example of what such a test checks follows below). This may be an alternative approach to an architectural analysis of the existing codebase; probably in a separate WG, though.

Lukas
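As a toy illustration of what such a test suite checks (in the spirit of the kernel's own tools/testing/selftests/mm, but not taken from any suite the mm developers actually run), a sketch assuming standard POSIX APIs:

```c
/* Toy check: does the kernel really enforce a read-only mapping?
 * Illustration only; POSIX APIs. */
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        p[0] = 'x';   /* must fault: the mapping is read-only */
        _exit(0);     /* reached only if enforcement failed */
    }

    int status;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        puts("PASS: write to a read-only page was blocked");
    else
        puts("FAIL: the kernel did not enforce the mapping");
    return 0;
}
```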
Hi Lukas and Paul,
Thanks for following up on this point.

On 29/09/2021 06:49, Lukas Bulwahn wrote:

I have looked into these slide decks and they are very interesting and very relevant to our discussion topics. I have some comments/questions WRT the Plumbers deck:

Slide 7/17: What do we mean by arch-specific events? Aren't they part of exceptions/interrupts?

Slide 9/17: This is out of the scope of our investigation (i.e. we assume a systematic capability of the HW, with random HW faults mitigated by safety mechanisms defined in the HW safety manual).

Slide 10/17: WRT SW faults, our first step is to identify the kernel code that:
- is responsible for creating process address spaces
- is responsible for maintaining process address spaces
- is responsible for configuring any HW that is required to support the above-mentioned activities

For now we have temporarily set aside any code that is not functionally involved in the activities above but that could nevertheless interfere (that should be covered by the kernel FFI activity).

Agreed. We have just started trying to scope the entry points, which I would expect to land in different subsystems (a rough sketch of such entry points follows below). So there is a lot to study and learn, and any previous investigation to look at is very helpful here.

I totally agree on how to structure the investigation, and I am looking forward to working with you, Lukas. I would suggest we get started, involve the right people, and then decide whether we need another WG, OK?

I am afraid we gave the wrong message here. Tools are helpful, but we cannot rely on them alone. Even when explaining the hybrid approach, we said that tools can generate a baseline to work on, but human review of the code is ABSOLUTELY required to create a meaningful picture. IMO we should start according to 2a and then use tools to support and complete our analysis (using tools, we may figure out that we missed analysing some aspects, for example).

So WDYT? If we are aligned on starting with 2a, we'll follow up with a more formal sub-task proposal and we can discuss a dedicated weekly time slot to work on it...

Thanks
Gab
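As a rough flavour of the entry points being scoped, a sketch assuming standard Linux/libc calls; the subsystem attributions in the comments are our reading and would need to be confirmed by the actual analysis:

```c
/* Each call below enters the kernel through a different syscall that
 * creates, maintains, or destroys part of the process address space
 * (e.g. mmap() is handled in mm/mmap.c, mprotect() in mm/mprotect.c;
 * to be confirmed by the analysis). Illustration only. */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* create: a new VMA appears in the address space */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mapped one page at %p\n", p);

    /* maintain: the existing VMA's permissions change */
    if (mprotect(p, len, PROT_READ))
        perror("mprotect");

    /* destroy: the VMA is removed again */
    if (munmap(p, len))
        perror("munmap");
    return 0;
}
```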
Robert Krutsch
Thanks a lot, Lukas, for the material; it was very insightful.
I was also thinking about this topic a year ago, and I had a lot of questions and not many answers without a carefully designed HW. On top of what was discussed by Thomas B., we have a lot of other issues:

- Cache-hierarchy influences: not all architectures index their caches the same way, and there are many failure modes there.
- The interconnect and the DDR controller also contribute to the data path and bring many challenges.
- Timing-related issues across caches, interconnect and DDR are hard to mitigate, and detection is typically so late that it is not helpful.

A careful HW design and maybe some constraints could help out. One proposal that I had back then is to constrain the execution of safety tasks to some of the cores (a minimal sketch follows below). That would allow us to use a lot of the protection mechanisms in the MMU, as we would have different masters for safe and unsafe applications. It also depends a lot on whether the caches are virtually tagged and on what the hierarchy looks like; I have not studied the new CPUs from Arm to see whether different clusters would be needed to achieve satisfactory isolation.

//Robert
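A minimal sketch of what that constraint looks like from user space, assuming Linux and glibc; the choice of core 0 is an arbitrary placeholder, and a real deployment would combine this with cgroup cpusets and boot-time isolation (e.g. isolcpus/nohz_full):

```c
/* Pin the calling (safety) task to core 0 so that safe and unsafe
 * work run on disjoint CPUs. Sketch only; core 0 is a placeholder. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);  /* reserve core 0 for the safety task */

    /* pid 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(set), &set)) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("now restricted to CPU %d\n", sched_getcpu());
    return 0;
}
```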