Info about major changes and problems on KTH network (KTHLAN)
150727 During the last weeks we have migrated all connections from router VSS1 to VSS3.
140807 "Kemi" and "Bergs" down 6-8, caused by lightning. Some APs down for the same reason.
Everything restored.
140801 New supervisor card for VSS1-sw2 (second half), redundancy restored.
Hopefully normal operations resumed.
140801 A short problem after reconnect of a datacentral switch
affected login.kth.se and UG fileservers at around 11.00
140731 Broken central router VSS1, this affected parts of KTH
network. The router is now up, with one card disconnected - a
number of connections moved. Redundancy reduced while we wait
for fix from Cisco. RT#1929551
140409 The broken half of VSS2 has now been restored. Full redundancy now
works. We still need to upgrade the system because it has a
bug. For the moment we are running with a workaround, and we are
consulting with the manufacturer about which version we can safely
upgrade to.
140407 We have problems with central router VSS2. One half is
broken. All traffic works but redundancy is lost. We had some
short interruptions of network service for parts of KTH network
around 21. We are waiting for support from ATEA/Cisco to
hopefully fix this on Wednesday.
130716 KTH's border routers CN6 and EA6 both got software
problems within an hour. Most traffic was redundant, including the
connection to the internet, but there was a short disturbance of
resolver traffic and some wireless clients may have
experienced interruptions while their APs changed to a redundant
controller.
120816 Update of status of the problem with overloaded central router(s):
During Wednesday + Thursday the router-pair (VSS)
seems to have handled the load ok, so we seem to have a somewhat
better prognosis for survival...
Hopefully our actions were sufficient and the problem was
caused by a combination of the hardware error, software bugs
and traffic. Normally we can identify a single error rather
quickly. The big challenge is two or three combined errors,
which seems probable in this case.
In addition to the actions yesterday we have also moved
wireless IPv6 traffic off the main router and tuned some other
settings.
IPv6 traffic has increased a lot during the summer as we have
activated IPv6 on Eduroam. This is combined with the fact that
major web sites like Google, Facebook and Yahoo have also enabled
IPv6 since June 6th.
120815 Some progress:
We have during Tuesday late evening replaced the failed
10-GigabitEthernet card and upgraded software on both halves of
the redundant router pair that much of KTH network traffic
passes through.
This corrects one big hardware problem and a number of software bugs.
Hopefully this will help against the problem.
120814 Again bad problems with an overloaded main router-sw VSS1 around
lunch. Some parts of the network were hard to reach. One
16-port 10-GigabitEthernet card broke completely and we are
waiting for a replacement (4 hours). As long as we are not
overloaded all traffic should still work as all connections are
redundant (i.e. the other half of the router-pair handles all
traffic).
120813 We had bad problems with an overloaded main router-sw VSS1
around 13.35 to 14.05. Some parts of the network were hard to reach.
120221 KTH border router CN6 had problems 01-05. Traffic was rerouted at 02.xx
Supervisor in CN6 replaced.
120216 10-15 minute interruption of central KTH networks at around
14. Cause: STP error in Kista. Better STP filters installed.
110718 We are upgrading all APs to new version 7.0.116.0, this will
cause some downtime in the evenings for wireless users.
101129 We had bad problems with an overloaded main router-sw VSS1
around 20.00-20.30. Some parts of the network were hard to
reach.
100705 EDUROAM - new certificate installed in KTH's authentication
servers for eduroam. See
http://www.lan.kth.se/eduroam
for more information.
100616 19.53-21.00 Both links to SUNET were down for about 1.15
hours. KTH had no internet. Both SUNET switches hung at almost the
same time; a restart worked. NUNOC is investigating how this is
possible (SUNETTICKET-732)
1005xx KTHOPEN/EDUROAM - we had a problem with the DHCP server on
Friday afternoon. We have fixed the DHCP server with better
redundancy.
090806 Net 32 will be upgraded starting 18:00
18:08 starting in DH3
18:29 DH3 up on 1 link in the etherchannel
22:22 starting in DHQ
22:32 done in DHQ
23:08 everything upgraded but one of the redundant links to DH3 fails to light up. We will check more tomorrow.
090716 The central router VSS1 was upgraded to a new version. There
was a break of a few minutes for some parts of the network.
09071x The whole wireless net upgraded to new version.
090609 Today we will change the login system for KTHOPEN to a new one (you now have to use the same "network secret" as for EDUROAM).
The old system will be available for a while as KTHOPEN-OLD. Info in Swedish here
0904xx During the spring many buildings have been moved to the VSS router.
080915 Today we had problems with kthopen around noon. This was started by downtime for the central login function "login.kth.se".
080625 We have had problems with the modempools; this should now be fixed. Note that the pools use old equipment.
080612 This morning SUNET had problems with the network in Stockholm.
It seems to be solved now although the reason is still unknown.
See ticket SUNET-2847 (relevant parts included)
Problem Description:
* 20080612 07:51 UTC, [email protected]
There is a major disturbance in SUNETs IP-network.
:
:
Affected:
* 20080612 07:51 UTC, [email protected]
Most of the Stockholm customers circuits are down.
We are trying to establish the full extent of this
disturbance.
:
:
* 20080612 08:21 UTC, [email protected]
The problem should now be solved, but we are still
investigating why this happened.
080611 This evening (around 20:00) we had some major problems with network connectivity to SUNET.
Traffic (multicast) from France overloaded parts of the SUNET network and caused connectivity problems for
many universities including KTH.
We, the other universities and NUNOC are investigating what happened.
080423 The nameserver res1 was replaced by new hardware and OS.
080421 We had problems with running out of IP-addresses on the wireless system. We now use addresses 5+6 and 250+251 for wireless at campus.
080420 Strange problems with the controllers of the wireless system. Service interrupted.
070528 We had a short disturbance in the SUNET connections when PDC connected directly to SUNET and sent us invalid routes for the SUNET members.
070331 We are now only using Optosunet for external traffic.
070208 We are trying to use new OPTOsunet for external traffic. This might cause problems.
061223 Problems with router kth1 when Optosunet was tested.
060212 The kthopen net 6 was down during morning. Problems with the login server.
051103 Problems with routers EA4 and CN4 caused problems with parts of the net during the night and
early morning. Affected: Modempool, Studsvik-library, ICS (Net 44), Flemingsberg
050915 An ifc card in router CN6 breaks. Reduced redundancy for some departments
and B.NS.KTH.SE down. Also reduced redundancy for resolvers at kth.
UPDATE: 050916 - we have got a replacement card from Ementor and restored operation.
050914 A problem with our resolver (DNS) 130.237.72.200 caused some disturbances at around
16.30-16.40 for many computers.
050830 Quick upgrade of CN5-GW around 23:20 because of some IPv6 bugs.
Upgrade normally takes about 3-4 minutes. This time it failed. Redundancy
reduced, net-6 and stacken disconnected. Time to fix: about 40 minutes.
050829 We have some problems with IPv6 resolving at res2.ns.kth.se. IPv4 works all right.
We are waiting for an updated kernel to fix IPv6 bugs.
050825 Work with KTH nameserver B.NS.KTH.SE and resolver RES2.NS.KTH.SE could lead to some
problems. Started at around 17:30
050824 Change in modempool logins. As we are moving to a more standardized login-server
a side effect is that you now have to always specify the domain in UPPER CASE after
the username.
Example for kth.se logins: [email protected]
Example for subdomains: [email protected], [email protected].
The username should still be in lower case!
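The rule above (username in lower case, domain in upper case) can be sketched in Python. This is only an illustration and not the login server's code: the helper name format_modempool_login and the user "alice" are hypothetical, and the "@" separator is an assumption based on the example format.

```python
def format_modempool_login(username: str, domain: str) -> str:
    """Build a modempool login string.

    Hypothetical helper: lower-case username, upper-case domain.
    The "@" separator is an assumption, not confirmed by the log.
    """
    username = username.strip()
    domain = domain.strip()
    if not username or not domain:
        raise ValueError("both username and domain are required")
    return f"{username.lower()}@{domain.upper()}"


# Example with a made-up user name
print(format_modempool_login("Alice", "kth.se"))
```

Normalising both parts in one place avoids failed logins caused by a mistyped case in either half.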
050803 We have some problems with the upgraded Cisco Callmanager and are
trying to fix it.
Some disturbances of Cisco IP phones in Kista and at S3 today.
050801 Upgrade of routers CN5, EA5 and KE5 with security update. Around 19.00
050727 Upgrade of Cisco Callmanager whole day. All IP-phones in Kista and S3 affected.
050721 Problem with router KE5 disturbs some nets in Kista.
050713 Router JA-GW has some problems with an upgrade, disturbing the DKV student rooms.
060609 Major problem with Cisco Callmanager disturbs telephony for about 45 minutes.
Calls outside Callmanager fail. Time: around 16-16:45
050504 A router is totally overloaded, disturbing other routers in the backbone.
This affects most of the KTH network. Time: around 10.30-11.00
050429 Problems with "Cisco Call Manager" redundancy cause short breaks in
phone connection from IP phones in Kista during the afternoon.
050426 A router (EA4) had serious problems causing it to stop forwarding packets.
This disturbed other routers on the net for a short while.
Time around 22-22.30.
050404 16.51 Big break in STOKAB fiber: Haninge, Telge down.
Redundancy to SUNET reduced. UPDATE: stokab fixed the fiber at 23:59.
050323 Heavy duty load tests caused outside dialling with cisco ip phones to
fail temporarily at around 20.00.
050311 Filterbox for network 6 (kthopen) exchanged for new hardware and software.
It is now possible to log in at the central admin login page (KTH username),
and also from other universities (CWAA).
050218 Border router kth1 hung 21.00-21.30, causing half of KTH's subnets to be without
external connection.
040805 STOKAB cable from KTH broken by NCC digging. Affected: both Kista links +
Wallqvistska. Stokab had to repatch as the damage was too big to repair.
Down 11:20 -> 17:50
040514 Upgrade of KTH telephone exchange (MD110) Friday at 18.00
KTH's telephone exchange (PBX, "telefonväxel"...) will be upgraded by DOTCOM
AB (who is responsible for the maintenance) starting Friday 14/5 at 18.00.
The telephone system can be unreliable during the weekend, but
should hopefully be working Monday morning.
040127 Planned power break in Q building lasted longer than planned and our UPSs didn't
survive that long. So we had two breaks, at 19 and 22.
040126 Trouble with modempool, old connection cut too early. New connections do not
yet support ISDN calls.
040113 21:00 Cisco routers in Kista upgraded. KE4-GW.
040112 Kista fiber endpoints moved within KTH campus
0401xx The KTH internal MODEM POOL will be upgraded and moved between
jan 13 and approx jan 20. The modempool might be unavailable or
behave strangely during this time. The reason for the long
timespan is that the supplier of the ISDN lines can't give an
exact date and we only have one line to test on. The new
modempool will only have 30 lines (instead of
the current 60). The new pool should have faster connect times
and support more modems though. We will also be removing ISDN
callback as this costs money for us but there are too few users
to justify any billing system.
0311xx A large number of fibers are being reconnected because of
"Försvarshögskolan", which is moving into a number of former KTH
buildings, one of which is a central fiber connection point.
030912 The main netlogon router on net 5 has been changed to new hardware and
software. Usage should be as before. Please report any problems.
030827 Replacement of hardware and os for the DNS-server KTH.SE. This should be
transparent to users. Any mail still left using kth.se will break though.
030821 Problems with EA2-GW, haninge, telge, arch affected. 13-13.30
030801 Replacement of hardware and os for the DNS-server NS.KTH.SE. This should be
transparent to users.
030718 We will upgrade all kthlan backbone routers today due to the CERT advisory
about denial of service bugs in IOS.
030710 Eterra changes SRP-cards on kth1&kth2. This should not cause any disturbance
030505 NS.KTH.SE (nameserver), exchanged for a new machine.
030203 Power outage on main campus during night; 00:20 - 01:14
All central equipment survived but some small switches still got hurt.
03012x We have problems with a few SQL-slammer disturbing routers for some subnets.
021203 UPDATE on broken router EA4:
We got a new chassis and supervisor (central CPU card) from our service
provider yesterday and have upgraded it and installed it today.
(It is very uncommon for both the backplane and cpu to be bad at the same
time)
We are gradually adding back the redundancy for the networks affected.
We are under the impression that all networks have been working during
the failure with two exceptions:
- 1) Outdated config of dhcp forwarding on the backup router caused DHCP
to stop working on two subnets.
- 2) The upgraded CN4 only sent RIP (routing info proto)
version 2. Some older hosts have been listening to RIP v1 and thus
lost their routing.
021203 UPDATE: we got the new chassis+supervisor yesterday afternoon, and
we are working to put it into place again.
021202 UPDATE: Eterra has been here with a supervisor, but the problem
seems to be the chassis; we are expecting new HW during the day.
021129 We have a MAJOR hardware error on backbone router EA4. All connections should be
routed through backup routers. There could be small things that do not work.
021128 18-21 Network is interrupted during normal service window for installation of new hardware
in GigaSUNET routers and KTH routers.
021120 Major power outage in Kista.
021022 We have problems with IP phones and dialling to foreign countries in the new version of IP call
manager. All users needing this function are temporarily moved to the older version server.
[FIXED 021101, problems caused by bad line on MD110]
021010 Disturbances on router CN4. High cpu load and errors logged. Caused by faulty hardware
switching. We have problems with routing to kista.
CN4 restarted and some workaround put in place.
021004 Big problems with router EA4 at around 15.00. This was caused by intense multicast
from ARCH.
021003 Problems with IP telephones. Outward dialling to MD110 PRI was down. Logical line
reset.
021001 Problems with router CN4 for about 5 minutes because of error on S3 subnet.
Disturbing E and MATH mainly.
Sept 02 We have problems reaching www.meriam-webster.com (m-w.com). Please use
www.britannica.com (which includes m-w)
020916 Major problems with KTH MD110 telephone exchange. Also disturbing IP
telephony because the LIMs that connect the IP telephony were affected by
the problems.
020725 Gigasunet connection activated after long delay caused by faulty
Stockholm SRP-ring.
Spring 2002 Tests of new GIGASUNET connections can disturb connectivity during spring.
We have everything connected but the new SRP technology is not stable yet.
Sometimes we test the new links during the night.
020404 One central nameserver overloaded, causing problems for some users.
0204xx Some network disturbances caused by external computers, causing internal
timeouts in KTH traffic.
0203xx Some routers updated to newer versions. No known disturbance to the net.
020109 Problems with telge connection. (Cause not identified).
010920 Changes of routing within kth. Connection 1 to SUNET moved to new
router. (during thursday service window).
010911 Some problems with KTH's routers during the night cause disturbances.
010906 Planned break for upgrading routers. One router (ea2) had major problems
after upgrade, disturbing Telge, Indek, Mech. (around 22.00)
0108xx We had some problems with fiber equipment disturbing connections to
subnet 32, affected: web.kth.se
010109 MAJOR power outage in Kista city.
0101xx We have problems with the GB-link to Kista, the fiber has changed somewhat.
The backup works at a slower speed during the intermittent errors.
00092x We have a problem where the router/sw ea-gw/sw resets itself every third
day. This takes about 3 minutes before everything is up again.
[fixed]
000918 Central routing updated, causing some planned disturbances 21-24. A
converter caused trouble towards DSV for about 2 hours.
000826 Stockholm networks (130.237.x below 64) had problems reaching
most of US during the weekend because of bad routing from
sprintlink for 130.237.0.0/18 (ie subnets 0-63).
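That 130.237.0.0/18 corresponds to subnets 0-63 can be checked with Python's standard ipaddress module. This is a small sketch; the helper names third_octets and in_affected_block are made up for the illustration.

```python
import ipaddress

# The prefix named in the entry above
block = ipaddress.ip_network("130.237.0.0/18")


def third_octets(network):
    """Return the third octet of every /24 inside the given IPv4 network."""
    return [sub.network_address.packed[2] for sub in network.subnets(new_prefix=24)]


def in_affected_block(address):
    """True if the address lies inside 130.237.0.0/18."""
    return ipaddress.ip_address(address) in block


octets = third_octets(block)
print(len(octets), octets[0], octets[-1])  # prints: 64 0 63
```

A /18 leaves 14 host bits, so it spans 64 consecutive /24 networks, which is why only the subnets numbered below 64 were hit.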
000726 Problems with a NADA router disturb KTH's old FDDI ring, affecting
connections to KISTA.
990622 Some central routers upgraded to a test version due to special demands
in communication with SSVL.
991207 Half of KTH had bad outside connectivity because one of SUNET's
routers, stockholm-2, was hung from 0500 to 0800 Tuesday
morning.
991204 All main routers upgraded and a new gigabit router installed
on Saturday. On Sunday the link to telge was down 4 hrs because
stokab installed a new fiber to replace the broken one.
991203 The admin servernet was rebuilt in the ongoing effort to
modernize all nets.
991121 Problems with link to telge. EA-GW reloaded, but it seems to be something
with the fiber.
July 99 We have now ensured that we have extra backup links to sunet by
resurrecting the backup agreement with PDC so we can use our
respective lines to SUNET in case of failures.
June 99 We have had a major break of our SUNET connection because of the new
connection to SUNET combined with vacations and somewhat
unsatisfactory "human communication" between KTHLAN and
KTHNOC. This has all been settled peacefully now.
June 99 We have been connected to the new SUNET-155 for a while now. The old
link is still available. The new connection has introduced some
problems with multicast for KTH.
990502 We have problems with EA2-GW. All connected departments on this
router have a backup connection so this *should* not disturb normal
traffic. (990503 this has been FIXED by new spare card from our reseller).
990401, 990327 Some routers rebooted and upgraded to new software to further
the installation of the Gigabit Ethernet backbone. This can cause problems
if the new software has undetected bugs.
990323 CN3 rebooted at 23.xx due to bad multicast routing tables.
(reboot takes about 4 minutes)
990113 Fiber to telge started working, after some initial problems with
the equipment. Fiber distance is 60 km and we run 100 Mbit/s Ethernet
on the link.
990112 Problems with memory of CN3-GW required reboot (after 339 days).
Downtime was 2 min 36 secs
981119 We have serious problems with fiber-link to haninge. The vendor
of fiber equipment is investigating. (this is fixed now)
981027 cn2-gw completely removed. addresses 130.237.210.1 130.237.1.1
981007 Fiber connection running 100 Mbps ethernet to Haninge installed.
Distance is 40 km.
9809 During september we have moved many departments to
100 Mbit and new routers. Only one dept network is still
connected to cn2-gw.
980813 USA line down since 09.11 today. Came up at around 18.00
9807 During summer we have moved a lot of interfaces, upgraded to 100
Mbit etc. A new gateway, cnbup-gw, is going to replace cn and cn2
for backup networks. cnn and ea are the new main routers.
980609 US 155 MBit line was down.
Problem started 980608 21.30, ended 0609 11.00
9805 Various departments moved to 100 Mbit. New fibers are being deployed.
New UPS and cooling in commrooms CN and EA.
980428 21.50 CN and CN2 rebooted due to move to new UPS.
During April net 41, 31, 24, and some others have moved to
100 Mbit.
980311 CN2-GW hangs at 12.30-12.50
980206 All routers in CN (cn, cn2, cn3 and cnn) go down 12.48 due to bad
UPS. Reboot takes 2 to 5 minutes.
980203 All routers in CN (cn, cn2, cn3 and cnn) go down 07.48 due to bad
UPS. Reboot takes 2 to 5 minutes.
980127 Problems with new gw cnn-gw. Net 72 was hard to reach.
980122 Net 72 reconnected to 100 Mbps ethernet on CNN-GW.
980116 cn-gw and cn2-gw hang due to problems with overload. Probably
caused by a "smurf" attack.
971212 ea-gw had problems during the night. We still have not
got a spare CPU-board from upnet but we have cannibalized our
extra router to exchange the CPU-board.
971208 We have hw problems with ea-gw. This affects vov, met and tpf.
Down 7 minutes at 12.25.
971201 cn2-gw hangs for around 20 minutes at 15.00.
971111 Net 60 and 63 (VOV) connected to a new router. Some machines
on VOV net may have problems reaching other ip hosts. A clearing of
the arp cache or a reboot fixes this.
971103 CN3 gets a mem parity error and is down for 2 minutes at 9.13
971016 KTHNOC router "sthlm2-gw" went down due to hardware problems.
"tor 16 okt 1997 19.36.50 down 25 minutes, 36 seconds"
This affects our connection to SUNET (and the world).
971007 KTH connection to sunet is reconnected to a new router at KTHNOC
971001 Router cn2-gw hangs at 8.00-8.30
970930 New cisco router module (RSM) for 5500 switch installed on backbone.
We are testing its functioning.
970901 Cisco admits that they have a major software problem.
All versions of their 8.2 software are on hold. This puts
a delay on our new backbone switch-routers.
970825 cn3-gw gets mysterious problems, rebooting every 5 minutes.
A switch to an older software version corrects this.
970721 Net 16 (MET) connected to cisco 5500.
9707 New cisco 5500 router switch connected for tests.
VOV (net 60) added to this node.
970627 Problems with AC in the CN router room. CN2-GW hangs 05.00-09.00
970522 Cisco router cn-gw hangs 20-21.
970519 Connection to södertelje is down during the morning.
970518 Cisco router cn2-gw hangs 11:12.
970513 Cisco router CN3-GW has problem with ipx traffic (out of memory
after 26 weeks uptime) Fixed around 11.00
970425 Connection to BB (liljanshuset) moved to fiber.
970412 CN2-GW was hung again around 13.00. The remaining connections
will be moved to other routers.
970307 We have problems with the MIDGARD connection. The cisco and the
midgård switch suddenly fail. Resetting the switch helps.
This will be investigated.
970306 A component (fiber converter) had failed for LANTM (net 64). Fixed.
970305 The CN2-GW router hung for a few minutes. This router will be
replaced when we have a fully functional new router.