[2/3] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()

Message ID -
State Accepted
Delegated to: Paul Burton
Series
  • MIPS: Fix arch_trigger_cpumask_backtrace(), clean up output

Commit Message

Paul Burton June 22, 2018, 5:55 p.m. UTC
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.

This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:

  - Deadlock due to use of synchronous IPIs with interrupts disabled,
    causing the CPU that's attempting to generate the backtrace output
    to hang itself.

  - Not succeed in generating the desired output from remote CPUs.

  - Produce warnings about this from smp_call_function_many(), for
    example:

    [] INFO: rcu_sched detected stalls on CPUs/tasks:
    []  0-...!: (1 GPs behind) idle=ade//0 softirq=526944/526945 fqs=0
    []  1-...!: (0 ticks this GP) idle=e4a//0 softirq=547885/547885 fqs=0
    []  (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
    [] ------------[ cut here ]------------
    [] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
    [] Modules linked in:
    [] CPU: 2 PID: 1216 Comm: sh Not tainted -gee058bb4d0c2 #2
    [] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0   8e09bca8
    []         95b2b379 95b2b379 807a0080  0000018a 
    []            806eca74  8017e2b8 000001a0
    []           8e09baa4  808b8008 86d69080 8e09bca0
    []         8e09ad50 805e20aa   8017e2b8  801070ca
    []         ...
    [] Call Trace:
    [] [<27fde568>] show_stack+0x70/0xf0
    [] [<>] dump_stack+0xaa/0xd0
    [] [<699d671c>] __warn+0x80/0x92
    [] [<68915d41>] warn_slowpath_null+0x28/0x36
    [] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
    [] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
    [] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
    [] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
    [] [<059b3b43>] update_process_times+0x18/0x34
    [] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
    [] [<478d3d70>] tick_sched_timer+0x1c/0x50
    [] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
    [] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
    [] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
    [] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
    [] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
    [] [<1b6d462c>] gic_handle_local_int+0x38/0x86
    [] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
    [] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
    [] [<c7521934>] do_IRQ+0x16/0x20
    [] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
    [] [<6a94b53c>] except_vec_vi_end+0x70/0x78
    [] [<>] smp_call_function_many+0x1ba/0x20c
    [] [<54022b58>] smp_call_function+0x1e/0x2c
    [] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
    [] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
    [] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
    [] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
    [] [<b3fce717>] exit_mmap+0x76/0xea
    [] [<c4c8a2f6>] mmput+0x80/0x11a
    [] [<a41a08f4>] do_exit+0x1f4/0x80c
    [] [<ee01cef6>] do_group_exit+0x20/0x7e
    [] [<13fa8d54>] __wake_up_parent+0x0/0x1e
    [] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
    [] [<8c21a93b>] syscall_common+0x14/0x1c
    [] ---[ end trace 02aa09da9dc52a60 ]---
    [] ------------[ cut here ]------------
    [] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
    ...

This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU has not responded to a
previous backtrace IPI, i.e. if it has hung, and we print a warning to
the console in this case.

Signed-off-by: Paul Burton <>
Cc: James Hogan <>
Cc: Ralf Baechle <>
Cc: Huacai Chen <>
Cc: 
---

 arch/mips/kernel/process.c | 45 +++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 15 deletions(-)

Comments

Paul Burton July 9, 2018, 4:03 p.m. UTC | #1
Hi Huacai,

On Mon, Jul 09, 2018 at 11:04:32AM +0800, Huacai Chen (陈华才) wrote:
> Should we do something to avoid parallel backtrace output?

nmi_cpu_backtrace() already takes care of that using a spinlock.

> BTW, could you please check the linux-mips configuration to not reject
> emails from lemote.com?

I'm afraid I have no control over that. Ralf, do you have any idea
what's happening there?

Thanks,
    Paul

Patch

diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d4cfeb931382..2cee3fb07243 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -29,6 +29,7 @@ 
 #include <linux/kallsyms.h>
 #include <linux/random.h>
 #include <linux/prctl.h>
+#include <linux/nmi.h>
 
 #include <asm/asm.h>
 #include <asm/bootinfo.h>
@@ -655,28 +656,42 @@  unsigned long arch_align_stack(unsigned long sp)
 	return sp & ALMASK;
 }
 
-static void arch_dump_stack(void *info)
-{
-	struct pt_regs *regs;
+static struct cpumask backtrace_csd_busy;
 
-	regs = get_irq_regs();
-
-	if (regs)
-		show_regs(regs);
-	else
-		dump_stack();
+static void handle_backtrace(void *info)
+{
+	nmi_cpu_backtrace(get_irq_regs());
+	cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
 }
 
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
 {
-	long this_cpu = get_cpu();
+	static DEFINE_PER_CPU(call_single_data_t, static_csd);
+	call_single_data_t *csd;
+	int cpu;
 
-	if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
-		dump_stack();
+	for_each_cpu(cpu, mask) {
+		/*
+		 * If we previously sent an IPI to the target CPU & it hasn't
+		 * cleared its bit in the busy cpumask then it didn't handle
+		 * our previous IPI & it's not safe for us to reuse the
+		 * call_single_data_t.
+		 */
+		if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+			pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+				cpu);
+			continue;
+		}
 
-	smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+		csd = &per_cpu(static_csd, cpu);
+		csd->func = handle_backtrace;
+		smp_call_function_single_async(cpu, csd);
+	}
+}
 
-	put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+	nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
 }
 
 int mips_get_process_fp_mode(struct task_struct *task)