kern/49964: DRM/KMS panic on Radeon hardware
Aleksej Saushev
2015-06-11 02:40:12 UTC
Number: 49964
Category: kern
Synopsis: DRM/KMS panic with Radeon hardware
Confidential: no
Severity: critical
Priority: medium
Responsible: kern-bug-people
State: open
Class: sw-bug
Submitter-Id: net
Arrival-Date: Thu Jun 11 02:40:00 +0000 2015
Release: NetBSD 7.0_BETA
System: NetBSD lithium 7.0_BETA NetBSD 7.0_BETA (GENERIC) #2: Wed Nov 19 15:12:31 MSK 2014 ***@lithium:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
NetBSD 7.0_BETA i386 built from the source as of 2015-06-01 panics at boot.

savecore does not function.

Last messages are:

drm: initializing kernel modesetting (RV710 0x1002:0x9552 0x103C:0x7016)
drm: register mmio base: 0xd8400000
drm: register mmio size: 65536
DRM error in radeon_get_bios: Unable to locate a BIOS ROM
: error: Fatal error during GPU init
radeon0: unable to attach DRM: 22
panic: cnopen: no console device
How-To-Repeat: Update, install new kernel, reboot.
Fix: Boot previous kernel.
r***@NetBSD.org
2015-07-18 14:55:27 UTC
Synopsis: DRM/KMS panic with Radeon hardware

Responsible-Changed-From-To: kern-bug-people->riastradh
Responsible-Changed-By: ***@NetBSD.org
Responsible-Changed-When: Sat, 18 Jul 2015 14:55:15 +0000
Responsible-Changed-Why:
c***@SDF.ORG
2015-10-02 06:49:45 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: ***@SDF.ORG
To: gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964
Date: Thu, 1 Oct 2015 14:22:19 +0000
Submitter-Id: net
Confidential: no
Severity: critical
Priority: medium
Category: kern
Class: sw-bug
Release: NetBSD 7.0
System: NetBSD 7.0 NetBSD 7.0 (GENERIC.201509250726Z) amd64
Architecture: x86_64
Machine: amd64
Relevant GPU: ATI Radeon HD 5430
BIOS Information:
Vendor: Dell Inc.
Version: A07
NetBSD 7.0 amd64 installed from 6.1.5 using sysupgrade panics at boot.

Last messages are:
drm: initializing kernel modesetting (CEDAR 0x1002:0x68E1 0x1028:0x0466)
drm: register mmio base: 0xfbe20000
drm: register mmio size: 131072
DRM error in radeon_get_bios: Unable to locate a BIOS ROM
: error: Fatal error during GPU init
radeon0: unable to attach drm: 22
panic: cnopen: no console device
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80281dc5 cs 8 rflags 246 cr2 7f7ff6c0cc00 ilevel
0 rsp fffe804585cad8
curlwp 0xfffffe80ba057aa0 pid 2.1 lowest kstack 0xfffffe804585a2c0
Stopped in pid 2.1 (init) at netbsd:breakpoint+0x5: leave
How-To-Repeat: Install 6.1.5-amd64, use sysupgrade to get to 7.0.
Fix: Disabling radeon via userconf ("disable radeon") works around this issue specifically. To boot
successfully I also needed to disable coretemp.
Taylor R Campbell
2015-10-13 00:25:32 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: Taylor R Campbell <***@NetBSD.org>
To: gnats-***@NetBSD.org, netbsd-***@NetBSD.org, gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964: DRM/KMS panic with Radeon hardware
Date: Tue, 13 Oct 2015 00:20:46 +0000


The relevant part of the error is that the radeon driver was unable to
read the video bios. I asked asau@ privately to try to dump it from
[0xc0000, 0xe0000), and it looked like a plausible video bios to me --
so the next step is to try to find why every way radeon_get_bios tries
to get it fails.

First: can you send the output of `pcictl pci0 dump -b <bus> -d <dev>
-f <func>', where <bus>/<dev>/<func> are whatever locators the radeon
device is at?

Second: can you apply the attached patch to instrument the attempts to
read the video bios with debug prints, and try again?

Attachment: radeon.patch

Index: sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c
===================================================================
RCS file: /cvsroot/src/sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c,v
retrieving revision 1.4
diff -p -u -r1.4 radeon_bios.c
--- sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c	24 Jun 2015 18:23:23 -0000	1.4
+++ sys/external/bsd/drm2/dist/drm/radeon/radeon_bios.c	13 Oct 2015 00:20:09 -0000
@@ -56,9 +56,13 @@ static bool igp_read_bios_from_vram(stru
 	resource_size_t size = 256 * 1024; /* ??? */
 #endif
 
+	DRM_ERROR("\n");
+
 	if (!(rdev->flags & RADEON_IS_IGP))
-		if (!radeon_card_posted(rdev))
+		if (!radeon_card_posted(rdev)) {
+			DRM_ERROR("card not posted\n");
 			return false;
+		}
 
 	rdev->bios = NULL;
 #ifdef __NetBSD__
@@ -66,13 +70,16 @@ static bool igp_read_bios_from_vram(stru
 		/* XXX Dunno what type to expect here; fill me in... */
 		pci_mapreg_type(rdev->pdev->pd_pa.pa_pc,
 		    rdev->pdev->pd_pa.pa_tag, PCI_BAR(0)),
-		0, &bst, &bsh, NULL, &size))
+		0, &bst, &bsh, NULL, &size)) {
+		DRM_ERROR("failed to map PCI BAR 0\n");
 		return false;
+	}
 	if ((size == 0) ||
 	    (size < 256 * 1024) ||
 	    (bus_space_read_1(bst, bsh, 0) != 0x55) ||
 	    (bus_space_read_1(bst, bsh, 1) != 0xaa) ||
 	    ((rdev->bios = kmalloc(size, GFP_KERNEL)) == NULL)) {
+		DRM_ERROR("bad-looking vbios or allocation failed\n");
 		bus_space_unmap(bst, bsh, size);
 		return false;
 	}
@@ -113,6 +120,7 @@ static bool radeon_read_bios(struct rade
 	/* XXX: some cards may return 0 for rom size? ddx has a workaround */
 	bios = pci_map_rom(rdev->pdev, &size);
 	if (!bios) {
+		DRM_ERROR("pci_map_rom failed\n");
 		return false;
 	}
 
@@ -130,11 +138,13 @@ static bool radeon_read_bios(struct rade
 	if (size == 0 ||
 	    bus_space_read_1(bst, bsh, 0) != 0x55 ||
 	    bus_space_read_1(bst, bsh, 1) != 0xaa) {
+		DRM_ERROR("bad-looking vbios\n");
 		pci_unmap_rom(rdev->pdev, bios);
 		return false;
 	}
 	rdev->bios = kmalloc(size, GFP_KERNEL);
 	if (rdev->bios == NULL) {
+		DRM_ERROR("allocation failed\n");
 		pci_unmap_rom(rdev->pdev, bios);
 		return false;
 	}
r***@NetBSD.org
2015-10-13 01:07:46 UTC
Synopsis: DRM/KMS panic with Radeon hardware

State-Changed-From-To: open->feedback
State-Changed-By: ***@NetBSD.org
State-Changed-When: Tue, 13 Oct 2015 01:07:36 +0000
State-Changed-Why:
feedback requested
c***@SDF.ORG
2015-10-17 11:03:46 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: ***@SDF.ORG
To: gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964: DRM/KMS panic with Radeon hardware
Date: Fri, 16 Oct 2015 23:49:53 +0000

Apologies for the late response and some differences - I had compiled with a slightly different kernel (a slightly
newer -current). I don't think it matters, but if something is confusing, that is why.

Thank you.



Output from kernel panic:

drm: initializing kernel modesetting (CEDAR 0x68E1 0x1028:0x466).
drm: register mmio base: 0xfbe20000
drm: register mmio size: 131072
DRM error in igp_read_bios_from_vram:
DRM error in igp_read_bios_from_vram: bad-looking vbios or allocation failed
DRM error in radeon_read_bios: pci_map_rom failed
DRM error in radeon_read_bios: pci_map_rom failed
DRM error in radeon_get_bios: Unable to locate a BIOS ROM
: error: Fatal error during GPU init
radeon0: unable to attach drm: 22
panic: cnopen: no console device
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80281dc5 cs 8 rflags 246 cr2 7f7ff6c0cc00 ilevel
0 rsp fffffe8045a0ead8
curlwp 0xfffffe8045932a80 pid 2.1 lowest kstack 0xfffffe8045a0c2c0
Stopped in pid 2.1 (init) at netbsd:breakpoint+0x5: leave



This is the output of pcictl pci0 dump -b 001 -d 00 -f 0:

PCI configuration registers:
Common header:
0x00: 0x68e11002 0x00100007 0x03000000 0x00000010

Vendor Name: ATI Technologies (0x1002)
Device ID: 0x68e1
Command register: 0x0007
I/O space accesses: on
Memory space accesses: on
Bus mastering: on
Special cycles: off
MWI transactions: off
Palette snooping: off
Parity error checking: off
Address/data stepping: off
System error (SERR): off
Fast back-to-back transactions: off
Interrupt disable: off
Status register: 0x0010
Interrupt status: inactive
Capability List support: on
66 MHz capable: off
User Definable Features (UDF) support: off
Fast back-to-back capable: off
Data parity error detected: off
DEVSEL timing: fast (0x0)
Slave signaled Target Abort: off
Master received Target Abort: off
Master received Master Abort: off
Asserted System Error (SERR): off
Parity error detected: off
Class Name: display (0x03)
Subclass Name: VGA (0x00)
Interface: 0x00
Revision ID: 0x00
BIST: 0x00
Header Type: 0x00 (0x00)
Latency Timer: 0x00
Cache Line Size: 64bytes (0x10)

Type 0 ("normal" device) header:
0x10: 0xc000000c 0x00000000 0xfbe20004 0x00000000
0x20: 0x0000e001 0x00000000 0x00000000 0x04661028
0x30: 0xfbe00000 0x00000050 0x00000000 0x0000010b

Base address register at 0x10
type: 64-bit prefetchable memory
base: 0x00000000c0000000, not sized
Base address register at 0x18
type: 64-bit nonprefetchable memory
base: 0x00000000fbe20000, not sized
Base address register at 0x20
type: i/o
base: 0x0000e000, not sized
Base address register at 0x24
not implemented(?)
Cardbus CIS Pointer: 0x00000000
Subsystem vendor ID: 0x1028
Subsystem ID: 0x0466
Expansion ROM Base Address: 0xfbe00000
Capability list pointer: 0x50
Reserved @ 0x38: 0x00000000
Maximum Latency: 0x00
Minimum Grant: 0x00
Interrupt pin: 0x01 (pin A)
Interrupt line: 0x0b

Capability register at 0x50
type: 0x01 (Power Management)
Capability register at 0x58
type: 0x10 (PCI Express)
Capability register at 0xa0
type: 0x05 (MSI)

PCI Power Management Capabilities Register
Capabilities register: 0x0603
Version: 1.2
PME# clock: off
Device specific initialization: off
3.3V auxiliary current: self-powered
D1 power management state support: on
D2 power management state support: on
PME# support D0: off
PME# support D1: off
PME# support D2: off
PME# support D3 hot: off
PME# support D3 cold: off
Control/status register: 0x0000
Power state: D0
PCI Express reserved: off
No soft reset: off
PME# assertion: disabled
PME# status: off
Bridge Support Extensions register: 0x00
B2/B3 support: off
Bus Power/Clock Control Enable: off
Data register: 0x00

PCI Message Signaled Interrupt
Message Control register: 0x0080
MSI Enabled: off
Multiple Message Capable: no (1 vector)
Multiple Message Enabled: off (1 vector)
64 Bit Address Capable: on
Per-Vector Masking Capable: off
Message Address (lower) register: 0x00000000
Message Address (upper) register: 0x00000000
Message Data register: 0x00000000

PCI Express Capabilities Register
Capability register: 0012
Capability version: 2
Device type: Legacy PCI Express Endpoint device
Slot implemented: off
Interrupt Message Number: 0
Device Capabilities Register: 0x00008fa1
Max Payload Size Supported: 256 bytes max
Phantom Functions Supported: not available
Extended Tag Field Supported: 8bit
Endpoint L0 Acceptable Latency: 2us - 4us
Endpoint L1 Acceptable Latency: More than 64us
Attention Button Present: off
Attention Indicator Present: off
Power Indicator Present: off
Role-Based Error Report: on
Captured Slot Power Limit Value: 0
Captured Slot Power Limit Scale: 0
Function-Level Reset Capability: off
Device Control Register: 0x0800
Correctable Error Reporting Enable: off
Non Fatal Error Reporting Enable: off
Fatal Error Reporting Enable: off
Unsupported Request Reporting Enable: off
Enable Relaxed Ordering: off
Max Payload Size: 128 byte
Extended Tag Field Enable: off
Phantom Functions Enable: off
Aux Power PM Enable: off
Enable No Snoop: on
Max Read Request Size: 128 byte
Device Status Register: 0x0000
Correctable Error Detected: off
Non Fatal Error Detected: off
Fatal Error Detected: off
Unsupported Request Detected: off
Aux Power Detected: off
Transaction Pending: off
Link Capabilities Register: 0x00000d01
Maximum Link Speed: 2.5GT/s
Maximum Link Width: x16 lanes
Active State PM Support: L0s and L1 supported
L0 Exit Latency: Less than 64ns
L1 Exit Latency: Less than 1us
Port Number: 0
Clock Power Management: off
Surprise Down Error Report: off
Data Link Layer Link Active: off
Link BW Notification Capable: off
ASPM Optionally Compliance: off
Link Control Register: 0x0043
Active State PM Control: L0s and L1 Entry Enabled
Read Completion Boundary Control: 64bytes
Link Disable: off
Retrain Link: off
Common Clock Configuration: on
Extended Synch: off
Enable Clock Power Management: off
Hardware Autonomous Width Disable: off
Link Bandwidth Management Interrupt Enable: off
Link Autonomous Bandwidth Interrupt Enable: off
Link Status Register: 0x1101
Negotiated Link Speed: 2.5GT/s
Negotiated Link Width: x16 lanes
Training Error: off
Link Training: off
Slot Clock Configuration: on
Data Link Layer Link Active: off
Link Bandwidth Management Status: off
Link Autonomous Bandwidth Status: off
Device Capabilities 2: 0x00000000
Completion Timeout Ranges Supported: 0
Completion Timeout Disable Supported: off
ARI Forwarding Supported: off
AtomicOp Routing Supported: off
32bit AtomicOp Completer Supported: off
64bit AtomicOp Completer Supported: off
128-bit CAS Completer Supported: off
No RO-enabled PR-PR passing: off
LTR Mechanism Supported: off
TPH Completer Supported: 0
OBFF Supported: Not supported
Extended Fmt Field Supported: off
End-End TLP Prefix Supported: off
Max End-End TLP Prefixes: 0
Device Control 2: 0x0000
Completion Timeout Value: 50us to 50ms
Completion Timeout Disabled: off
ARI Forwarding Enabled: off
AtomicOp Rquester Enabled: off
AtomicOp Egress Blocking: off
IDO Request Enabled: off
IDO Completion Enabled: off
LTR Mechanism Enabled: off
OBFF: Disabled
End-End TLP Prefix Blocking on: off
Link Capabilities 2: 0x00000000
Supported Link Speed Vector:
Crosslink Supported: off
Link Control 2: 0x0001
Target Link Speed: 2.5GT/s
Enter Compliance Enabled: off
HW Autonomous Speed Disabled: off
Selectable De-emphasis: off
Transmit Margin: 0
Enter Modified Compliance: off
Compliance SOS: off
Compliance Present/De-emphasis: 0
Link Status 2: 0x0000
Current De-emphasis Level: off
Equalization Complete: off
Equalization Phase 1 Successful: off
Equalization Phase 2 Successful: off
Equalization Phase 3 Successful: off
Link Equalization Request: off

Device-dependent header:
0x40: 0x00000000 0x00000000 0x00000000 0x04661028
0x50: 0x06035801 0x00000000 0x0012a010 0x00008fa1
0x60: 0x00000800 0x00000d01 0x11010043 0x00000000
0x70: 0x00000000 0x00000000 0x00000000 0x00000000
0x80: 0x00000000 0x00000000 0x00000001 0x00000000
0x90: 0x00000000 0x00000000 0x00000000 0x00000000
0xa0: 0x00800005 0x00000000 0x00000000 0x00000000
0xb0: 0x00000000 0x00000000 0x00000000 0x00000000
0xc0: 0x00000000 0x00000000 0x00000000 0x00000000
0xd0: 0x00000000 0x00000000 0x00000000 0x00000000
0xe0: 0x00000000 0x00000000 0x00000000 0x00000000
0xf0: 0x00000000 0x00000000 0x00000000 0x00000000
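
For anyone reading the raw words in the dump above by hand, here is a minimal sketch of how the low bits of a BAR and of the Expansion ROM Base Address register are laid out per the PCI specification. The function and enum names are made up for illustration and are not NetBSD or pcictl APIs. Fed the values from this dump, it reproduces what pcictl printed, and it also shows that the expansion ROM word 0xfbe00000 has its enable bit (bit 0) clear. Whether that matters for this bug is not established by the dump alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum bar_kind { BAR_IO, BAR_MEM32, BAR_MEM64, BAR_MEM_OTHER };

/* Bit 0 set means I/O space; otherwise bits 2:1 give the memory type. */
enum bar_kind bar_classify(uint32_t bar)
{
	if (bar & 0x1)
		return BAR_IO;
	switch ((bar >> 1) & 0x3) {
	case 0x0:
		return BAR_MEM32;
	case 0x2:
		return BAR_MEM64;
	default:
		return BAR_MEM_OTHER;	/* legacy or reserved encodings */
	}
}

/* Bit 3 of a memory BAR is the prefetchable flag. */
bool bar_prefetchable(uint32_t bar)
{
	assert(bar_classify(bar) != BAR_IO);
	return (bar & 0x8) != 0;
}

/* Low 32 bits of the base: mask off 2 flag bits for I/O, 4 for memory. */
uint32_t bar_base_low(uint32_t bar)
{
	if (bar & 0x1)
		return bar & ~UINT32_C(0x3);
	return bar & ~UINT32_C(0xf);
}

/* Expansion ROM register (offset 0x30): bit 0 enables the ROM decoder. */
bool rom_bar_enabled(uint32_t rom)
{
	return (rom & 0x1) != 0;
}

/* Expansion ROM register: bits 31:11 hold the base address. */
uint32_t rom_bar_base(uint32_t rom)
{
	return rom & ~UINT32_C(0x7ff);
}
```

For example, bar_classify(0xc000000c) yields BAR_MEM64 with the prefetchable bit set, matching the "64-bit prefetchable memory" line for the register at 0x10.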
Taylor R Campbell
2015-10-22 23:10:15 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: Taylor R Campbell <***@NetBSD.org>
To: gnats-***@NetBSD.org, netbsd-***@NetBSD.org, gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964: DRM/KMS panic on Radeon hardware
Date: Thu, 22 Oct 2015 23:08:30 +0000

A few iterations of debugging prints later, it looks like

(a) the machine has a good-looking vbios at [0xc0000, 0xdffff], shown
by `dd if=/dev/mem of=/tmp/vbios bs=1 iseek=786432 count=131072', but

(b) for some reason, pci_find_rom reads 0x00 0x00 from 0xc0000 0xc0001
instead of the expected magic numbers 0x55 0xaa, when called via
pci_map_rom/pci_map_rom_md in sys/external/bsd/drm2/include/linux/pci.h.

I'm at a loss about why this might be.

I don't think it should be necessary to enable the ROM address decoder
in the PCI BAR in order to read from 0xc0000 -- that should work even
with pre-PCI code, I think.
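
As a cross-check of the numbers in (a) and (b), here is a small sketch, assuming nothing beyond what the message states: the dd offsets are the decimal forms of 0xc0000 and 0x20000, and a ROM image is considered plausible when its first two bytes are 0x55 0xaa. The macro and function names are hypothetical and do not come from the NetBSD sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy video BIOS window used in the dd command (hypothetical names). */
#define LEGACY_VBIOS_BASE 0xc0000u	/* iseek=786432 */
#define LEGACY_VBIOS_SIZE 0x20000u	/* count=131072 */

/* Last byte address covered by the window, inclusive. */
#define LEGACY_VBIOS_LAST (LEGACY_VBIOS_BASE + LEGACY_VBIOS_SIZE - 1)

/*
 * Return true if the buffer starts with the option ROM magic bytes
 * 0x55 0xaa, the same check the instrumented driver code performs.
 */
bool rom_signature_ok(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	return len >= 2 && buf[0] == 0x55 && buf[1] == 0xaa;
}
```

Running rom_signature_ok over the first bytes of /tmp/vbios from (a) would be expected to succeed, while the 0x00 0x00 pair reported in (b) fails the same test.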
Taylor R Campbell
2016-01-17 01:40:09 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: Taylor R Campbell <***@NetBSD.org>
To: gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964: DRM/KMS panic with Radeon hardware
Date: Sun, 17 Jan 2016 01:36:22 +0000

Some idiot, it seems, didn't consider the possibility that the PCI ROM
BAR might be populated with an address that does not actually point to
a valid VBIOS, and wrote 0xc0000 fallback logic only in the case of an
unpopulated ROM BAR.

That idiot, of course, is me! Fix incoming.
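
The difference between the old and new behaviour described above can be sketched roughly as follows. This is an illustrative model only, not the code in sys/external/bsd/drm2/include/linux/pci.h: struct rom_image, rom_valid, pick_rom_before and pick_rom_after are hypothetical names, and a NULL data pointer stands in for "ROM BAR not populated".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a candidate ROM; data == NULL means unavailable. */
struct rom_image {
	const uint8_t *data;
	size_t len;
};

/* A candidate is valid if it is present and starts with 0x55 0xaa. */
bool rom_valid(const struct rom_image *r)
{
	assert(r != NULL);
	return r->data != NULL && r->len >= 2 &&
	    r->data[0] == 0x55 && r->data[1] == 0xaa;
}

/* Old logic: fall back to 0xc0000 only if the ROM BAR is unpopulated. */
const struct rom_image *
pick_rom_before(const struct rom_image *bar, const struct rom_image *legacy)
{
	if (bar->data == NULL)
		return rom_valid(legacy) ? legacy : NULL;
	return rom_valid(bar) ? bar : NULL;	/* bogus BAR ROM: give up */
}

/* New logic: also fall back when the BAR points at an invalid ROM. */
const struct rom_image *
pick_rom_after(const struct rom_image *bar, const struct rom_image *legacy)
{
	if (rom_valid(bar))
		return bar;
	return rom_valid(legacy) ? legacy : NULL;
}
```

With a populated but bogus BAR ROM and a good image at 0xc0000, pick_rom_before returns NULL (the "Unable to locate a BIOS ROM" case), while pick_rom_after returns the legacy image.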
Taylor R Campbell
2016-01-17 01:45:07 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: "Taylor R Campbell" <***@netbsd.org>
To: gnats-***@gnats.NetBSD.org
Cc:
Subject: PR/49964 CVS commit: src/sys/external/bsd/drm2/include/linux
Date: Sun, 17 Jan 2016 01:40:39 +0000

Module Name: src
Committed By: riastradh
Date: Sun Jan 17 01:40:39 UTC 2016

Modified Files:
src/sys/external/bsd/drm2/include/linux: pci.h

Log Message:
Use PCI ROM MD fallback if PCI ROM BAR points to invalid ROM.

We previously applied the PCI ROM MD fallback only if the PCI ROM BAR
was altogether unpopulated. Some Radeon devices seem to have a
populated PCI ROM BAR pointing at a bogus ROM, while 0xc0000 works
fine.

Fixes at least one manifestation of PR kern/49964.


To generate a diff of this commit:
cvs rdiff -u -r1.21 -r1.22 src/sys/external/bsd/drm2/include/linux/pci.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Aleksej Saushev
2016-01-20 21:45:07 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: Aleksej Saushev <***@inbox.ru>
To: gnats-***@NetBSD.org
Cc:
Subject: Re: kern/49964 (DRM/KMS panic with Radeon hardware)
Date: Thu, 21 Jan 2016 00:39:36 +0300

After merging revision 1.22 of sys/external/bsd/drm2/include/linux/pci.h,
the NetBSD 7.0_STABLE kernel works on my system. DRM seems to work too.
Soren Jacobsen
2016-01-27 00:05:08 UTC
The following reply was made to PR kern/49964; it has been noted by GNATS.

From: "Soren Jacobsen" <***@netbsd.org>
To: gnats-***@gnats.NetBSD.org
Cc:
Subject: PR/49964 CVS commit: [netbsd-7] src/sys/external/bsd/drm2/include/linux
Date: Wed, 27 Jan 2016 00:01:07 +0000

Module Name: src
Committed By: snj
Date: Wed Jan 27 00:01:07 UTC 2016

Modified Files:
src/sys/external/bsd/drm2/include/linux [netbsd-7]: pci.h

Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1077):
sys/external/bsd/drm2/include/linux/pci.h: revision 1.22
Use PCI ROM MD fallback if PCI ROM BAR points to invalid ROM.
We previously applied the PCI ROM MD fallback only if the PCI ROM BAR
was altogether unpopulated. Some Radeon devices seem to have a
populated PCI ROM BAR pointing at a bogus ROM, while 0xc0000 works
fine.
Fixes at least one manifestation of PR kern/49964.


To generate a diff of this commit:
cvs rdiff -u -r1.7.2.7 -r1.7.2.8 \
src/sys/external/bsd/drm2/include/linux/pci.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
m***@NetBSD.org
2016-09-01 13:13:55 UTC
Synopsis: DRM/KMS panic with Radeon hardware

State-Changed-From-To: feedback->closed
State-Changed-By: ***@NetBSD.org
State-Changed-When: Thu, 01 Sep 2016 13:13:49 +0000
State-Changed-Why:
Fixed in netbsd-7 and netbsd-current.
