Discussion:
iwm performance (was: Re: how would you troubleshoot your wifi?)
Stefan Sperling
2016-07-22 09:36:46 UTC
sorry, my response was not precise - the "fatal" error is gone now but the
observed performance problems are still there.
I've already been told about iwm performance regressions compared to 5.9,
so I'd like to make a statement (not just directed at you, Andreas, but
at everyone).

Recently, I've been focusing on improving wireless stability after many
reports of lag, dropped links, and similar problems ever since 11n support
was introduced. This effort is still on-going, since I am still unable to
reproduce some of the reported issues. If such fixes end up decreasing
performance in some use cases then I'm entirely fine with that.

One possibility is that perceived performance drops are a side effect of
frame protection we've enabled. This may show up as a performance drop for
users who are alone with their AP and never see interference (so frame
protection doesn't buy them anything, it just adds overhead).
Many users are not alone with their AP but share a channel with a dozen
other APs or so and frame protection _really_ helps them. In the most
extreme cases (which I've reproduced with help from phessler@) these
users cannot use wifi at all without frame protection (TCP stalls).
To get an idea about the overhead added by RTS/CTS, see
http://www.testequipmentdepot.com/flukenetworks/pdf/802.11n-compatibility.pdf
(When reading this, keep in mind we send at MCS 7 max, without aggregation.)
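
For a rough feel of what the RTS/CTS handshake costs per frame at MCS 7
without aggregation, here is a back-of-the-envelope sketch in Python. All
timing constants (SIFS, DIFS, preamble lengths, the 24 Mbit/s control rate)
are assumed ballpark values for 2.4 GHz, not numbers taken from the driver
or from the paper above, and random backoff and MAC header bytes are ignored.

```python
# Rough airtime estimate for one data frame, with and without RTS/CTS.
# Every constant below is an assumption chosen for illustration only.

SIFS_US = 10.0             # assumed short interframe space at 2.4 GHz
DIFS_US = 28.0             # assumed SIFS + 2 slots of 9 us (short slot time)
LEGACY_PREAMBLE_US = 20.0  # assumed OFDM preamble + SIGNAL for control frames
HT_PREAMBLE_US = 36.0      # assumed HT mixed-mode preamble, one spatial stream

RTS_BYTES = 20             # RTS frame length including FCS
CTS_BYTES = 14             # CTS frame length including FCS
ACK_BYTES = 14             # ACK frame length including FCS


def frame_airtime_us(nbytes, rate_mbps, preamble_us):
    """Preamble plus payload time; 1 Mbit/s equals 1 bit per microsecond."""
    return preamble_us + nbytes * 8 / rate_mbps


def exchange_airtime_us(payload_bytes, rts_cts,
                        data_rate_mbps=65.0, ctl_rate_mbps=24.0):
    """Airtime of DIFS + [RTS, CTS] + DATA + ACK for a single frame.

    65 Mbit/s is the MCS 7 rate on a 20 MHz channel with long guard interval.
    """
    total = DIFS_US
    if rts_cts:
        total += frame_airtime_us(RTS_BYTES, ctl_rate_mbps, LEGACY_PREAMBLE_US)
        total += SIFS_US
        total += frame_airtime_us(CTS_BYTES, ctl_rate_mbps, LEGACY_PREAMBLE_US)
        total += SIFS_US
    total += frame_airtime_us(payload_bytes, data_rate_mbps, HT_PREAMBLE_US)
    total += SIFS_US
    total += frame_airtime_us(ACK_BYTES, ctl_rate_mbps, LEGACY_PREAMBLE_US)
    return total


if __name__ == "__main__":
    plain = exchange_airtime_us(1500, rts_cts=False)
    protected = exchange_airtime_us(1500, rts_cts=True)
    print(f"without RTS/CTS: {plain:.1f} us")
    print(f"with RTS/CTS:    {protected:.1f} us")
    print(f"extra airtime:   {100 * (protected - plain) / plain:.0f}%")
```

Under these assumptions the handshake adds a fixed cost per frame, so the
relative overhead grows as frames get smaller or data rates get higher.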

In the best iwm performance regression report I've received so far, the
reporter tracked the regression down to a particular commit (r1.86 if_iwm.c).
Backing out that commit restores performance to 5.9 levels for this user.
But this commit fixed an unrelated problem, which was that IPv6 autoconf and
ARP briefly stopped working in -current after we upgraded iwm's firmware.
I don't understand how this relates. It may involve invisible details handled
within the magic firmware, or it may be a driver bug, or prior performance
levels may have been a side effect of a real stability problem. In any case,
I won't back out this commit to restore performance for one user if backing
out that commit means that other known bugs will come back.

More generally speaking, given that our 11n implementation is still in its
infancy, and doesn't yet use any of the new features which are supposed to
vastly increase throughput, it is premature to complain about performance.
For now, stability gets priority.
David Dahlberg
2016-07-22 13:18:28 UTC
Post by Stefan Sperling
I've already been told about iwm performance regressions compared to
5.9,
so I'd like to make a statement (not just directed at you, Andreas,
but
at everyone).
JFYI: A temporary workaround which works for me (on a X1C3) is disabling
802.11n with "ifconfig mode".
Andreas Bartelt
2016-07-24 11:09:26 UTC
Post by Stefan Sperling
sorry, my response was not precise - the "fatal" error is gone now but the
observed performance problems are still there.
...
Post by Stefan Sperling
In the best iwm performance regression report I've received so far, the
reporter tracked the regression down to a particular commit (r1.86 if_iwm.c).
Backing out that commit restores performance to 5.9 levels for this user.
But this commit fixed an unrelated problem, which was that IPv6 autoconf and
ARP briefly stopped working in -current after we upgraded iwm's firmware.
I don't understand how this relates. It may involve invisible details handled
within the magic firmware, or it may be a driver bug, or prior performance
levels may have been a side effect of a real stability problem. In any case,
I won't back out this commit to restore performance for one user if backing
out that commit means that other known bugs will come back.
More generally speaking, given that our 11n implementation is still in its
infancy, and doesn't yet use any of the new features which are supposed to
vastly increase throughput, it is premature to complain about performance.
For now, stability gets priority.
Please don't get me wrong, my mail was not meant to be a complaint at
all. While tracking current (I think it was shortly before or after 5.9
had been released) I've been observing some serious stability problems
with regard to wireless for some time -- not only regarding iwm(4) on a
Lenovo x250 as wireless client but also on the hostap side (ral(4)
obviously in 11g mode and also running on current). The hostap box
crashed multiple times a day and had to be rebooted. These problems are
gone now, i.e., both sides don't crash anymore.

However, the wireless link via iwm(4) is currently almost unusable.
Overall throughput for multiple TCP connections is typically between 0 and
1 Mbit/s, but mostly at the lower end, i.e., 0.

The "fatal firmware error" problem doesn't seem to be resolved - it just
doesn't occur at every boot (see attached dmesg from yesterday's
current). The bad throughput seems to be independent from this error
message.

I could verify that an old x61s with wpi(4) currently performs
considerably better (throughput in the same setting is more stable at
about 175 Kbits/s). Consequently, the ral(4) interface in 11g hostap
mode at least doesn't seem to be the primary problem. Nevertheless, I've
also observed that the presence of the x250 laptop sometimes also kills
throughput of other wireless clients (iphone, ipad etc) - so I'm not
100% sure, i.e., the problems could at least be partially related to
ral(4) in 11g hostap mode.

Btw, can you recommend a (commercial or open source) wireless access
point which is known to work well with iwm(4) in 11n mode on current?

Best regards
Andreas
OpenBSD 6.0 (GENERIC.MP) #0: Sat Jul 23 09:03:23 CEST 2016
***@obsd.bartelt.name:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8277159936 (7893MB)
avail mem = 8021778432 (7650MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xccbfd000 (64 entries)
bios0: vendor LENOVO version "N10ET38W (1.17 )" date 08/20/2015
bios0: LENOVO 20CMCTO1WW
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP ASF! HPET ECDT APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpiec0 at acpi0
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 798.28 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 798.15 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 798.15 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 798.15 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,RDSEED,ADX,SMAP,PT,SENSOR,ARAT
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 40 pins
acpimcfg0 at acpi0 addr 0xf8000000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus -1 (EXP3)
acpicpu0 at acpi0: C3(***@233 ***@0x40), C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu1 at acpi0: C3(***@233 ***@0x40), C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu2 at acpi0: C3(***@233 ***@0x40), C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpicpu3 at acpi0: C3(***@233 ***@0x40), C2(***@148 ***@0x33), C1(***@1 mwait.1), PSS
acpipwrres0 at acpi0: PUBS, resource for XHCI, EHC1
acpipwrres1 at acpi0: NVP3, resource for PEG_
acpipwrres2 at acpi0: NVP2, resource for PEG_
acpitz0 at acpi0: critical temperature is 128 degC
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
"LEN0071" at acpi0 not configured
"LEN0046" at acpi0 not configured
acpibat0 at acpi0: BAT0 model "45N1113" serial 473 type LION oem "LGC"
acpibat1 at acpi0: BAT1 model "45N1738" serial 1842 type LION oem "LGC"
acpiac0 at acpi0: AC unit offline
acpithinkpad0 at acpi0
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT340F" at acpi0 not configured
acpivideo0 at acpi0: VID_
acpivout at acpivideo0 not configured
acpivideo1 at acpi0: VID_
cpu0: Enhanced SpeedStep 798 MHz: speeds: 2601, 2600, 2500, 2300, 2100, 2000, 1800, 1700, 1500, 1400, 1200, 1100, 900, 800, 600, 500 MHz
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 5G Host" rev 0x09
inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 5500" rev 0x09
drm0 at inteldrm0
inteldrm0: msi
inteldrm0: 1920x1080
wsdisplay0 at inteldrm0 mux 1: console (std, vt100 emulation)
wsdisplay0: screen 1-5 added (std, vt100 emulation)
azalia0 at pci0 dev 3 function 0 "Intel Core 5G HD Audio" rev 0x09: msi
xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x03: msi
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 "Intel xHCI root hub" rev 3.00/1.00 addr 1
"Intel 9 Series MEI" rev 0x03 at pci0 dev 22 function 0 not configured
em0 at pci0 dev 25 function 0 "Intel I218-LM" rev 0x03: msi, address 50:7b:9d:90:08:06
azalia1 at pci0 dev 27 function 0 "Intel 9 Series HD Audio" rev 0x03: msi
azalia1: codecs: Realtek ALC292
audio0 at azalia1
ppb0 at pci0 dev 28 function 0 "Intel 9 Series PCIE" rev 0xe3: msi
pci1 at ppb0 bus 2
rtsx0 at pci1 dev 0 function 0 "Realtek RTS5227 Card Reader" rev 0x01: msi
sdmmc0 at rtsx0: 4-bit
ppb1 at pci0 dev 28 function 1 "Intel 9 Series PCIE" rev 0xe3: msi
pci2 at ppb1 bus 3
iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x99, msi
ehci0 at pci0 dev 29 function 0 "Intel 9 Series USB" rev 0x03: apic 2 int 23
usb1 at ehci0: USB revision 2.0
uhub1 at usb1 "Intel EHCI root hub" rev 2.00/1.00 addr 1
pcib0 at pci0 dev 31 function 0 "Intel 9 Series LPC" rev 0x03
ahci0 at pci0 dev 31 function 2 "Intel 9 Series AHCI" rev 0x03: msi, AHCI 1.3
ahci0: port 0: 6.0Gb/s
scsibus1 at ahci0: 32 targets
sd0 at scsibus1 targ 0 lun 0: <ATA, SAMSUNG MZ7LN512, EMT0> SCSI3 0/direct fixed naa.5002538d00000000
sd0: 488386MB, 512 bytes/sector, 1000215216 sectors, thin
ichiic0 at pci0 dev 31 function 3 "Intel 9 Series SMBus" rev 0x03: apic 2 int 18
iic0 at ichiic0
pchtemp0 at pci0 dev 31 function 6 "Intel 9 Series Thermal" rev 0x03
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
wsmouse1 at pms0 mux 0
pms0: Synaptics clickpad, firmware 8.1
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
uhub2 at uhub1 port 1 "Intel Rate Matching Hub" rev 2.00/0.03 addr 2
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
sd1 at scsibus3 targ 1 lun 0: <OPENBSD, SR CRYPTO, 006> SCSI2 0/direct fixed
sd1: 458984MB, 512 bytes/sector, 939999472 sectors
root on sd1a (75b3938373e8ce78.a) swap on sd1b dump on sd1b
iwm0: hw rev 0x210, fw ver 16.242414.0, address 18:5e:0f:80:80:09
iwm0: could not initiate scan
iwm0: could not initiate scan
iwm0: could not initiate scan
iwm0: fatal firmware error
Stefan Sperling
2016-07-24 13:28:26 UTC
Post by Andreas Bartelt
However, the wireless link via iwm(4) is currently almost unusable.
Overall throughput for multiple tcp connections typically between 0 and
1 Mbit/s but mostly on the lower end, i.e., 0.
Looking at the wifi environment you're testing this in is very important.

Does this happen consistently, and everywhere?
Or only at your home, with something like 20 other wifi networks on
the same channel?
Andreas Bartelt
2016-07-24 15:54:21 UTC
Post by Stefan Sperling
Post by Andreas Bartelt
However, the wireless link via iwm(4) is currently almost unusable.
Overall throughput for multiple tcp connections typically between 0 and
1 Mbit/s but mostly on the lower end, i.e., 0.
Looking at the wifi environment you're testing this in is very important.
Does this happen consistently, and everywhere?
Or only at your home, with something like 20 other wifi networks on
the same channel?
I've attached some scans of the WiFi networks in my vicinity. I also
did some measurements for iwn(4) regarding throughput at different
locations. Performance was particularly bad after suspend/resume -- do
you think this might be related?

Best regards
Andreas
WiFi networks from scans taken very near my access point (nwids sanitized) [iwm(4) with observed throughput between 1 and 11 Mbit/s; after suspend/resume, throughput was only between 1 and 5 Mbit/s]:
nwid mywlan chan 8 bssid 74:de:2b:3b:02:65 79% 54M privacy,short_preamble,short_slottime,wpa2
nwid aaaaaa chan 11 bssid 08:95:2a:87:f0:75 25% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid bbbbbb chan 11 bssid 0a:95:2a:87:f0:77 22% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2,802.1x
nwid cccccc chan 11 bssid 8c:04:ff:b9:29:02 22% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid dddddd chan 11 bssid 00:03:c9:8b:ee:10 17% 54M privacy,short_slottime,wep
nwid eeeeee chan 2 bssid 00:12:bf:6d:51:3c 16% 54M privacy,short_preamble,short_slottime,wpa2
nwid ffffff chan 4 bssid 90:f6:52:2b:4a:54 14% HT-MCS15 privacy,short_preamble,short_slottime,wpa2
nwid gggggg chan 11 bssid f4:06:8d:84:8a:31 13% HT-MCS7 privacy,short_slottime,wpa2
nwid hhhhhh chan 1 bssid 88:03:55:be:42:2e 11% HT-MCS32 privacy,short_preamble,short_slottime,wpa2
nwid iiiiii chan 1 bssid 2c:59:e5:ef:f3:fa 11% 54M privacy,short_preamble,short_slottime,wpa2
nwid jjjjjj chan 2 bssid 18:83:bf:7d:61:4b 10% HT-MCS32 privacy,short_slottime,wpa2
nwid kkkkkk chan 11 bssid 24:65:11:2b:35:e6 10% HT-MCS15 privacy,short_preamble,short_slottime,wpa2
nwid oooooo chan 1 bssid 84:9c:a6:3a:e2:46 10% HT-MCS15 privacy,short_slottime,wpa2
nwid llllll chan 1 bssid 24:65:11:04:68:62 7% HT-MCS15 privacy,short_preamble,short_slottime,wpa2
nwid mmmmmm chan 6 bssid 7c:4f:b5:97:0f:22 5% HT-MCS32 privacy,short_slottime,wpa2
nwid nnnnnn chan 6 bssid 54:67:51:03:45:df 5% HT-MCS15 privacy,short_slottime,wpa2

From the room where I typically observe the reported throughput problems:
[in the best case I've observed relatively stable throughput around 9 Mbit/s; after suspend/resume, throughput was only between 0 and 1 Mbit/s]
nwid mywlan chan 8 bssid 74:de:2b:3b:02:65 46% 54M privacy,short_preamble,short_slottime,wpa2
nwid aaaaaa chan 11 bssid 08:95:2a:87:f0:75 41% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid bbbbbb chan 11 bssid 0a:95:2a:87:f0:77 38% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2,802.1x
nwid eeeeee chan 2 bssid 00:12:bf:6d:51:3c 23% 54M privacy,short_preamble,short_slottime,wpa2
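
To see at a glance how crowded each channel is, scan lines in the format
shown above can be tallied per channel. The following Python sketch assumes
one network per line, laid out as "nwid NAME chan N bssid MAC P% ..."; the
regular expression and the function name networks_per_channel are
illustrative choices, and other ifconfig output variants (for example signal
strength in dBm) would not match.

```python
import re
from collections import Counter

# Matches lines such as:
#   nwid mywlan chan 8 bssid 74:de:2b:3b:02:65 79% 54M privacy,...
# The layout is assumed from the scan output in this thread.
SCAN_LINE = re.compile(
    r"^\s*nwid\s+(?P<nwid>.+?)\s+chan\s+(?P<chan>\d+)"
    r"\s+bssid\s+(?P<bssid>\S+)\s+(?P<signal>\d+)%"
)


def networks_per_channel(scan_text):
    """Count how many scanned networks were seen on each channel."""
    counts = Counter()
    for line in scan_text.splitlines():
        match = SCAN_LINE.match(line)
        if match:
            counts[int(match.group("chan"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
nwid mywlan chan 8 bssid 74:de:2b:3b:02:65 79% 54M privacy,short_preamble,short_slottime,wpa2
nwid aaaaaa chan 11 bssid 08:95:2a:87:f0:75 25% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid bbbbbb chan 11 bssid 0a:95:2a:87:f0:77 22% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2,802.1x
"""
    for chan, count in sorted(networks_per_channel(sample).items()):
        print(f"chan {chan}: {count} network(s)")
```

A count alone says nothing about how busy those networks are, so this only
narrows down which channels are worth a closer look.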
Stefan Sperling
2016-07-25 09:59:44 UTC
Post by Stefan Sperling
Post by Andreas Bartelt
However, the wireless link via iwm(4) is currently almost unusable.
Overall throughput for multiple tcp connections typically between 0 and
1 Mbit/s but mostly on the lower end, i.e., 0.
Looking at the wifi environment you're testing this in is very important.
Does this happen consistently, and everywhere?
Or only at your home, with something like 20 other wifi networks on
the same channel?
I've attached some scans regarding WiFi networks at my vicinity. I also did
some measurements for iwn(4) regarding throughput at different locations.
Performance was particularly bad after suspend/resume -- do you think this
might be related?
Assuming there is a suspend/resume bug where the HW doesn't get initialized
properly after resume, then yes, that could explain the problem.
I haven't noticed such a problem myself yet, nor gotten any such reports.
Can you gather more evidence somehow?
nwid mywlan chan 8 bssid 74:de:2b:3b:02:65 79% 54M privacy,short_preamble,short_slottime,wpa2
nwid aaaaaa chan 11 bssid 08:95:2a:87:f0:75 25% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid bbbbbb chan 11 bssid 0a:95:2a:87:f0:77 22% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2,802.1x
nwid cccccc chan 11 bssid 8c:04:ff:b9:29:02 22% HT-MCS15 privacy,short_slottime,radio_measurement,wpa2
nwid dddddd chan 11 bssid 00:03:c9:8b:ee:10 17% 54M privacy,short_slottime,wep
Channels 8 and 11 do have some overlap, see
https://en.wikipedia.org/wiki/File:2.4_GHz_Wi-Fi_channels_%28802.11b,g_WLAN%29.svg
But not enough to qualify as a primary reason for your problem, I guess.
You might also have to take the actual load on these other networks into
account. An iwn(4) device in monitor mode in combination with tcpdump
will show you what's going on in the air (including control frames like
RTS/CTS frames):
ifconfig iwn0 down
ifconfig iwn0 -nwid -bssid -wpakey -nwkey -chan
ifconfig iwn0 mediaopt monitor
ifconfig iwn0 chan 8
ifconfig iwn0 up
tcpdump -n -i iwn0 -y IEEE802_11_RADIO
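
The channel 8 versus channel 11 overlap mentioned above can be checked with
a little arithmetic. In the 2.4 GHz band, channels 1 to 13 have center
frequencies of 2407 + 5 * channel MHz. The sketch below treats each channel
as a flat block of a given width (22 MHz as in the 802.11b figure linked
above, or 20 MHz for OFDM), which is a simplification since real spectral
masks roll off gradually. The function names are illustrative.

```python
def center_mhz(chan):
    """Center frequency of a 2.4 GHz channel (valid for channels 1-13)."""
    if not 1 <= chan <= 13:
        raise ValueError("only channels 1-13 follow the 5 MHz spacing")
    return 2407 + 5 * chan


def overlap_mhz(chan_a, chan_b, width_mhz=22.0):
    """Overlap of two channels modelled as flat blocks of width_mhz."""
    separation = abs(center_mhz(chan_a) - center_mhz(chan_b))
    return max(0.0, width_mhz - separation)


if __name__ == "__main__":
    for a, b in [(8, 11), (1, 6), (8, 8)]:
        print(f"chan {a} vs chan {b}: "
              f"{overlap_mhz(a, b):.0f} MHz overlap at 22 MHz width, "
              f"{overlap_mhz(a, b, 20.0):.0f} MHz at 20 MHz width")
```

With this simple model, channels 8 and 11 are 15 MHz apart and share only the
outer edge of each other's spectrum, which fits the guess that the overlap
alone is not the primary cause.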


Earlier in this thread, you mentioned that your AP is running OpenBSD
with the ral(4) driver. The RTS threshold fix I committed last week will
likely affect its behaviour. Did you upgrade your AP to -current?
I'd be very interested to know how this AP behaves after an upgrade.