Discussion:
[Crash-utility] [PATCH] x86_64: Make the conversion between 4level and 5level paging automatically
Dou Liyang
2018-07-06 09:01:57 UTC
Currently, crash only enables support for 5-level page tables when the
command line option "--machdep vm=5level" is given. Since Linux 4.17, a
single kernel can use either 4-level or 5-level page tables, so a fixed
command line option no longer works well.

Use the "pgtable_l5_enabled" entry from the vmcoreinfo data to detect
automatically whether the kernel is using 5-level page tables.

Signed-off-by: Dou Liyang <***@cn.fujitsu.com>
---
x86_64.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/x86_64.c b/x86_64.c
index 6d1ae2f..be6164b 100644
--- a/x86_64.c
+++ b/x86_64.c
@@ -203,6 +203,10 @@ x86_64_init(int when)
 			machdep->machspec->kernel_image_size = dtol(string, QUIET, NULL);
 			free(string);
 		}
+		if ((string = pc->read_vmcoreinfo("NUMBER(pgtable_l5_enabled)"))) {
+			machdep->flags |= VM_5LEVEL;
+			free(string);
+		}
 		if (SADUMP_DUMPFILE() || QEMU_MEM_DUMP_NO_VMCOREINFO() ||
 		    VMSS_DUMPFILE())
 			/* Need for calculation of kaslr_offset and phys_base */
--
2.14.3
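
A note on the check in this hunk: as posted, it sets VM_5LEVEL whenever the "NUMBER(pgtable_l5_enabled)" entry is present, without looking at its value. If the kernel writes that entry as a decimal 0 or 1 (an assumption here, not something the patch states), presence alone would not distinguish a 4-level boot from a 5-level one. A minimal sketch of a value-based test follows; l5_enabled_from_vmcoreinfo() is a hypothetical helper, not a crash function.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not part of crash: decide whether 5-level paging
 * is enabled from the value string of a vmcoreinfo NUMBER() entry.
 * Returns 1 only when the string starts with a non-zero decimal number;
 * NULL, an empty string, non-numeric text, or "0" all yield 0.
 */
static int l5_enabled_from_vmcoreinfo(const char *value)
{
	char *end;
	long n;

	if (value == NULL || *value == '\0')
		return 0;

	n = strtol(value, &end, 10);
	if (end == value)	/* no digits were parsed */
		return 0;

	return n != 0;
}
```

In the hunk above, the string returned by pc->read_vmcoreinfo() would be passed to such a helper before setting the flag, and freed afterwards as the patch already does.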
Dave Anderson
2018-07-06 13:45:14 UTC
Hello Dou,

Presumably by the time arch_crash_save_vmcoreinfo calls pgtable_l5_enabled(),
things have been initialized appropriately, and so this should work OK for
kdump-generated vmcores. But have you looked into how this should be accomplished
for live systems? Since kernel commit 51be1335 reverts __pgtable_l5_enabled
from being __initdata to __ro_after_init, would it be as simple as just reading
__pgtable_l5_enabled at POST_RELOC time?

Thanks,
Dave
Dou Liyang
2018-07-09 02:53:08 UTC
Dear Dave,
Post by Dave Anderson
Presumably by the time arch_crash_save_vmcoreinfo calls pgtable_l5_enabled(),
things have been initialized appropriately, and so this should work OK for
kdump-generated vmcores. But have you looked into how this should be accomplished
for live systems? Since kernel commit 51be1335 reverts __pgtable_l5_enabled
I tested this on a live system and it didn't work; I still need to use
"--machdep vm=5level" as before.
Post by Dave Anderson
from being __initdata to __ro_after_init, would it be as simple as just reading
__pgtable_l5_enabled at POST_RELOC time?
Yes, I agree. But how can we read '__pgtable_l5_enabled' in crash? Is
there a ready-made interface, such as symbol_value(), for SYMBOL
values?

Also, reading it at POST_RELOC time seems too late; it should happen
earlier than PRE_GDB.

Thanks,
dou.
Dave Anderson
2018-07-09 14:20:21 UTC
Post by Dou Liyang
Yes, I agree. But how can we read '__pgtable_l5_enabled' in crash? Is
there a ready-made interface, such as symbol_value(), for SYMBOL
values?
Yes, symbol_value() will work correctly when machdep_init(POST_RELOC)
gets called.
Post by Dou Liyang
Also, reading it at POST_RELOC time seems too late; it should happen
earlier than PRE_GDB.
Since the "__pgtable_l5_enabled" symbol is a static data symbol located
in the __START_KERNEL_map region, x86_64_VTOP() only needs the kernel's
"phys_base" value in order to translate the symbol value into a
physical address:

ulong x86_64_VTOP(ulong vaddr)
{
	if (vaddr >= __START_KERNEL_map)
		return ((vaddr) - (ulong)__START_KERNEL_map + machdep->machspec->phys_base);
	else
		return ((vaddr) - PAGE_OFFSET);
}
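
To make the arithmetic concrete, here is a small standalone sketch of the same two-branch translation. It is not the crash source: ex_vtop() and the EX_ constant are illustrative names, phys_base and page_offset are passed in as parameters rather than read from machdep, and the addresses in the usage note are example values only. The point it shows is that the result for an address in the kernel text mapping does not depend on which PAGE_OFFSET (4-level or 5-level) is in effect.

```c
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (illustrative name). */
#define EX_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Simplified model of the translation above. Only the lower branch
 * uses page_offset, so addresses at or above EX_START_KERNEL_MAP
 * translate the same way for 4-level and 5-level layouts.
 */
static uint64_t ex_vtop(uint64_t vaddr, uint64_t phys_base,
			uint64_t page_offset)
{
	if (vaddr >= EX_START_KERNEL_MAP)
		return vaddr - EX_START_KERNEL_MAP + phys_base;
	return vaddr - page_offset;
}
```

For example, with a phys_base of 0x1000000, the kernel-map address 0xffffffff81000000 translates to 0x2000000 whether page_offset is given as 0xffff880000000000 or 0xff10000000000000.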

So if the contents of "__pgtable_l5_enabled" is all that is needed,
I think you can do something like:

	case POST_RELOC:
+		if (!(machdep->flags & VM_5LEVEL) &&
+		    kernel_symbol_exists("__pgtable_l5_enabled")) {
+			int l5_enabled;
+			readmem(symbol_value("__pgtable_l5_enabled"), KVADDR,
+				&l5_enabled, sizeof(int), "__pgtable_l5_enabled",
+				FAULT_ON_ERROR);
+
+			if (l5_enabled) {
+				... execute the relevant section from PRE_GDB ...
+			}
+		}

which would be this section from PRE_GDB:

	case VM_5LEVEL:
		machdep->machspec->userspace_top = USERSPACE_TOP_5LEVEL;
		machdep->machspec->page_offset = PAGE_OFFSET_5LEVEL;
		machdep->machspec->vmalloc_start_addr = VMALLOC_START_ADDR_5LEVEL;
		machdep->machspec->vmalloc_end = VMALLOC_END_5LEVEL;
		machdep->machspec->modules_vaddr = MODULES_VADDR_5LEVEL;
		machdep->machspec->modules_end = MODULES_END_5LEVEL;
		machdep->machspec->vmemmap_vaddr = VMEMMAP_VADDR_5LEVEL;
		machdep->machspec->vmemmap_end = VMEMMAP_END_5LEVEL;
		if (symbol_exists("vmemmap_populate"))
			machdep->flags |= VMEMMAP;
		machdep->machspec->physical_mask_shift = __PHYSICAL_MASK_SHIFT_5LEVEL;
		machdep->machspec->pgdir_shift = PGDIR_SHIFT_5LEVEL;
		machdep->machspec->ptrs_per_pgd = PTRS_PER_PGD_5LEVEL;
		if ((machdep->machspec->p4d = (char *)malloc(PAGESIZE())) == NULL)
			error(FATAL, "cannot malloc p4d space.");
		machdep->machspec->last_p4d_read = 0;
		machdep->uvtop = x86_64_uvtop_level4;  /* 5-level is optional per-task */
	}
	machdep->kvbase = (ulong)PAGE_OFFSET;
	machdep->identity_map_base = (ulong)PAGE_OFFSET;
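
For readers less familiar with what the pgdir_shift and ptrs_per_pgd settings change, the sketch below shows how the top-level table index is derived from a virtual address in each mode. It is based on the x86-64 paging layout (9 index bits per level; the top level starts at bit 39 with 4 levels and at bit 48 with 5 levels), not on the crash source. The EX_ names and helper functions are illustrative assumptions, and the addresses in the usage note are example values.

```c
#include <stdint.h>

#define EX_PGDIR_SHIFT_4LEVEL	39	/* top-level index starts at bit 39 */
#define EX_PGDIR_SHIFT_5LEVEL	48	/* top-level index starts at bit 48 */
#define EX_P4D_SHIFT		39	/* extra level used only with 5 levels */
#define EX_PTRS_PER_TABLE	512	/* 9 index bits per level */

/* Index into the top-level (PGD) table for the given paging mode. */
static unsigned ex_pgd_index(uint64_t vaddr, int five_level)
{
	int shift = five_level ? EX_PGDIR_SHIFT_5LEVEL : EX_PGDIR_SHIFT_4LEVEL;

	return (unsigned)((vaddr >> shift) & (EX_PTRS_PER_TABLE - 1));
}

/* Index into the P4D table, which only exists as a real level in 5-level mode. */
static unsigned ex_p4d_index(uint64_t vaddr)
{
	return (unsigned)((vaddr >> EX_P4D_SHIFT) & (EX_PTRS_PER_TABLE - 1));
}
```

With these definitions, 0xffff880000000000 in 4-level mode and 0xff10000000000000 in 5-level mode both select top-level entry 272, and the second address has a P4D index of 0.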

The only thing I can think of that might be a problem is that the
readmem() of "__pgtable_l5_enabled" will need to get past this part
of x86_64_kvtop() in order to use x86_64_VTOP():

	if (!IS_VMALLOC_ADDR(kvaddr)) {
		*paddr = x86_64_VTOP(kvaddr);
		if (!verbose)
			return TRUE;
	}

where IS_VMALLOC_ADDR() would still be using the 4-level addresses.
But that could be worked around some way.

Can you give that a test?

Thanks,
Dave
Dou Liyang
2018-07-10 02:42:30 UTC
Dear Dave,

At 07/09/2018 10:20 PM, Dave Anderson wrote:
[...]
Post by Dave Anderson
The only thing I can think of that might be a problem is that the
readmem() of "__pgtable_l5_enabled" will need to get past this part
of x86_64_kvtop() in order to use x86_64_VTOP():

	if (!IS_VMALLOC_ADDR(kvaddr)) {
		*paddr = x86_64_VTOP(kvaddr);
		if (!verbose)
			return TRUE;
	}

where IS_VMALLOC_ADDR() would still be using the 4-level addresses.
But that could be worked around some way.
As far as I can see, it doesn't matter here: because vt->vmalloc_start is
still 0 at this point, we always take the x86_64_VTOP() path.

	if (!vt->vmalloc_start) {
		*paddr = x86_64_VTOP(kvaddr);
		return TRUE;
	}

	if (!IS_VMALLOC_ADDR(kvaddr)) {
		*paddr = x86_64_VTOP(kvaddr);
		if (!verbose)
			return TRUE;
	}
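
The ordering of those two checks can be modelled in isolation. The sketch below is a simplified stand-in, not the crash code: struct ex_vm_state, ex_kvtop_path() and the enum are hypothetical names, the verbose handling is left out, and the vmalloc test is reduced to a single range comparison (the real IS_VMALLOC_ADDR() covers more regions). It only illustrates that while vmalloc_start is still 0, every address takes the direct translation path, so stale 4-level range limits are never consulted.

```c
#include <stdint.h>

/* Minimal stand-in for the bits of VM state the checks look at. */
struct ex_vm_state {
	uint64_t vmalloc_start;	/* 0 until VM initialization has run */
	uint64_t vmalloc_end;
};

enum ex_path { EX_DIRECT_VTOP, EX_PAGE_TABLE_WALK };

/* Which translation path a kernel virtual address would take. */
static enum ex_path ex_kvtop_path(const struct ex_vm_state *vt,
				  uint64_t kvaddr)
{
	/* Early in startup: the range check is skipped entirely. */
	if (!vt->vmalloc_start)
		return EX_DIRECT_VTOP;

	/* Later: only addresses outside the vmalloc range go direct. */
	if (kvaddr < vt->vmalloc_start || kvaddr > vt->vmalloc_end)
		return EX_DIRECT_VTOP;

	return EX_PAGE_TABLE_WALK;
}
```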
Post by Dave Anderson
Can you give that a test?
Yes, I have tested this with both kdump and virsh dump vmcores, and it
works well. I will send you a v2 patch.

Thanks,
dou.
