Info about major changes and problems on KTH network (KTHLAN)

 
150727 During the last few weeks we have migrated all connections from router VSS1 to VSS3.

140807 "Kemi" and "Bergs" down 6-8, caused by lightning. Some APs were down for
       the same reason. Everything restored.

140801 New supervisor card for VSS1-sw2 (second half); redundancy restored.
       Hopefully normal operation has resumed.

140801 A short problem after reconnecting a data center switch
       affected login.kth.se and the UG file servers at around 11.00.

140731 Broken central router VSS1; this affected parts of the KTH
       network. The router is now up with one card disconnected, and a
       number of connections have been moved. Redundancy is reduced while
       we wait for a fix from Cisco. RT#1929551

140409 The broken half of VSS2 has now been restored, and full redundancy
       works again. We still need to upgrade the system because of a
       bug. For the moment we are running with a workaround, but we are
       consulting with the manufacturer about which version we can
       safely upgrade to.

140407 We have problems with central router VSS2. One half is
       broken. All traffic works, but redundancy is lost. We had some
       short interruptions of network service for parts of the KTH network
       around 21. We are waiting for support from ATEA/Cisco and
       hope to fix this on Wednesday.

130716 Problems with KTH's border routers: CN6 and EA6 both got software
       problems within an hour. Most traffic was redundant, including the
       connection to the internet, but there was a short disturbance of
       resolver traffic, and some wireless clients may have
       experienced interruptions while their APs changed to a redundant
       controller.

120816 Update on the status of the problem with the overloaded central router(s):
       During Wednesday and Thursday the router pair (VSS)
       seems to have handled the load OK, so we have a somewhat
       better prognosis for survival...

       Hopefully our actions were sufficient and the problem was
       caused by a combination of the hardware error, software bugs
       and traffic. Normally we can identify a single error rather
       quickly. The big challenge is two or three combined errors,
       which seems probable in this case.

       In addition to yesterday's actions, we have also moved
       wireless IPv6 traffic off the main router and tuned some other
       settings.

       IPv6 traffic has increased a lot during the summer since we
       activated IPv6 on Eduroam. On top of that, major websites such
       as Google, Facebook and Yahoo have also had IPv6 enabled
       since June 6th.

120815 Some progress:

       Late Tuesday evening we replaced the failed
       10-Gigabit Ethernet card and upgraded the software on both halves
       of the redundant router pair that much of the KTH network traffic
       passes through.

       This corrects one big hardware problem and a number of software bugs.

       Hopefully this will help against the problem.

120814 Again bad problems with an overloaded main router-sw VSS1 around
       lunch. Some parts of the network were hard to reach. One
       16-port 10-Gigabit Ethernet card broke completely and we are
       waiting for a replacement (4 hours). As long as we are not
       overloaded, all traffic should still work since all connections
       are redundant (i.e. the other half of the router pair handles all
       traffic).

120813 We had bad problems with an overloaded main router-sw VSS1
       around 13.35 to 14.05. Some parts of the network were hard to reach.

120221 KTH border router CN6 had problems 01-05. Traffic was rerouted at 02.xx.
       The supervisor in CN6 was replaced.

120216 10-15 minute interruption of central KTH networks at around
       14. Cause: STP error in Kista. Better STP filters installed.

110718 We are upgrading all APs to the new version 7.0.116.0; this will
       cause some downtime in the evenings for wireless users.

101129 We had bad problems with an overloaded main router-sw VSS1
       around 20.00-20.30. Some parts of the network were hard to
       reach.

100705 EDUROAM - new certificate installed in KTH's authentication
       servers for eduroam. See 
       http://www.lan.kth.se/eduroam
       for more information.

100616 19.53-21.00 Both links to SUNET were down for just over an
       hour. KTH had no internet. Both SUNET switches hung at almost the
       same time; a restart worked. NUNOC is investigating how this is
       possible (SUNETTICKET-732).

1005xx KTHOPEN/EDUROAM: we had a problem with the DHCP server on
       Friday afternoon. We have fixed it and given the DHCP server
       better redundancy.

090806 Net 32 will be upgraded starting 18:00
	18:08 starting in DH3
	18:29 DH3 up on 1 link in the etherchannel
	22:22 starting in DHQ
	22:32 done in DHQ
	23:08 everything upgraded except one of the redundant links to DH3, which fails to light up. We will check more tomorrow.
	
090716 The central router VSS1 was upgraded to a new version. There
       was a break of a few minutes for some parts of the network.

09071x The whole wireless net upgraded to a new version.

090609 Today we will change the login system for KTHOPEN to a new one (you now have to use the same "network secret" as for EDUROAM).
       The old system will be available for a while as KTHOPEN-OLD. Info in Swedish here

0904xx During the spring many buildings have been moved to the VSS router.

080915 Today we had problems with kthopen around noon. This was triggered by downtime for the central login function "login.kth.se".

080625 We have had problems with the modempools; it should now be fixed. Note that the pools use old equipment.

080612 This morning SUNET had problems with the network in Stockholm.
       It seems to be solved now, although the reason is still unknown.

       See ticket SUNET-2847 (relevant parts included)

       Problem Description:
         * 20080612 07:51 UTC, [email protected]
         There is a major disturbance in SUNETs IP-network.
	 :
	 :
       Affected:
	 * 20080612 07:51 UTC, [email protected]
         Most of the Stockholm customers circuits are down.
         We are trying to establish the full extent of this
         disturbance.
	 :
	 :
         * 20080612 08:21 UTC, [email protected]
         The problem should now be solved, but we are still
         investigating why this happened.

080611 This evening (around 20:00) we had some major problems with network connectivity to SUNET.

       Traffic (multicast) from France overloaded parts of the SUNET network and caused connectivity problems for
       many universities, including KTH.

       We and the other universities and NUNOC are investigating what happened.


080423 The nameserver res1 was replaced by new hardware and OS. 

080421  We had problems with running out of IP addresses on the wireless system. We now use addresses 5+6 and 250+251 for wireless on campus.

080420	Strange problems with the controllers of the wireless system. Service interrupted.

070528  We had a short disturbance in the sunet connections when PDC connected directly to sunet and sent us invalid routes for the sunet members.

070331  We are now only using Optosunet for external traffic.

070208  We are trying to use new OPTOsunet for external traffic. This might cause problems.

061223	Problems with router kth1 when Optosunet was tested.

060212  The kthopen net 6 was down during morning. Problems with the login server.

051103  Problems with routers EA4 and CN4 caused problems with parts of the net during the night and 
        early morning. Affected: Modempool, Studsvik-library, ICS (Net 44), Flemingsberg

050915  An interface card in router CN6 broke. Reduced redundancy for some departments,
        and B.NS.KTH.SE down. Also reduced redundancy for resolvers at KTH.
        UPDATE: 050916 - we have got a replacement card from Ementor and restored operation.

050914  A problem with our resolver (DNS) 130.237.72.200 caused some disturbances at around 
        16.30-16.40 for many computers.

050830  Quick upgrade of CN5-GW around 23:20 because of some IPv6 bugs.
        An upgrade normally takes about 3-4 minutes. This time it failed. Redundancy
	reduced, net-6 and stacken disconnected. Time to fix: about 40 minutes.

050829  We have some problems with IPv6 resolving at res2.ns.kth.se. IPv4 works fine.
        We are waiting for an updated kernel to fix the IPv6 bugs.

050825  Work on KTH nameserver B.NS.KTH.SE and resolver RES2.NS.KTH.SE could lead to some
        problems. Started at around 17:30.

050824  Change in modempool logins. As we are moving to a more standardized login server,
        a side effect is that you now always have to specify the domain in UPPER CASE after
	the username.
	Example for kth.se logins: [email protected] 
	Example for subdomains: [email protected], [email protected]. 
	The username should still be in lower case!

050803  We have some problems with the upgraded Cisco CallManager and are
        trying to fix them.
        Some disturbances of Cisco IP phones in Kista and at S3 today.

050801  Upgrade of routers CN5, EA5 and KE5 with security update. Around 19.00

050727  Upgrade of Cisco Callmanager whole day. All IP-phones in Kista and S3 affected.

050721  Problem with router KE5 disturbs some nets in Kista.

050713	Router JA-GW had some problems with an upgrade, disturbing the DKV student rooms.

060609  Major problem with Cisco CallManager disturbs telephony for about 45 minutes.
        Calls outside CallManager fail. Time: around 16-16:45

050504  A router is totally overloaded, disturbing other routers in the backbone.
        This affects most of the KTH network. Time: around 10.30-11.00

050429  Problems with "Cisco Call Manager" redundancy causes short breaks in 
        phone connection from ip phones in kista during afternoon.

050426  A router (EA4) had serious problems causing it to stop forwarding packets.
        This disturbed other routers on the net for a short while.
        Time around 22-22.30.

050404  16.51 Big break in STOKAB fiber: Haninge and Telge down.
        Redundancy to SUNET reduced. UPDATE: Stokab fixed the fiber at 23:59.

050323  Heavy duty load tests caused outside dialling with cisco ip phones to
        fail temporarily at around 20.00.

050311	Filterbox for network 6 (kthopen) exchanged for new hardware and software.
        It is now possible to log in at the central admin login page (KTH username),
	and also from other universities (CWAA).

050218  Border router kth1 hung 21.00-21.30, leaving half of KTH's subnets without
        external connection.

040805  STOKAB cable from KTH broken by NCC digging. Affected: both Kista links +
        Wallqvistska. Stokab had to repatch as the damage was too big to repair.
	Down 11:20 -> 17:50

040514  Upgrade of KTH telephone exchange (MD110) Friday at 18.00
        KTH's telephone exchange (PBX, "telefonväxel"...) will be upgraded by DOTCOM
        AB (which is responsible for the maintenance) starting Friday 14/5 at 18.00.

        The telephone system may be unreliable during the weekend, but
        should hopefully be working Monday morning.

040127  A planned power break in the Q building lasted longer than planned and our UPSs didn't
        survive that long, so we had two breaks, at 19 and 22.

040126  Trouble with the modempool: the old connection was cut too early. New connections do not
        yet support ISDN calls.

040113  21:00 Cisco routers in Kista upgraded. KE4-GW.

040112	Kista fiber endpoints moved within KTH campus

0401xx  The KTH internal MODEM POOL will be upgraded and moved between
        Jan 13 and approx. Jan 20. The modempool might be unavailable or
	behave strangely during this time. The reason for the long
	timespan is that the supplier of the ISDN lines can't give an
	exact date and we only have one line to test on. The new
        modempool will only have 30 lines (instead of
	the current 60). The new pool should have faster connect times
        and support more modems, though. We will also be removing ISDN
	callback, as this costs us money but there are too few users
	to justify any billing system.

0311xx  A large number of fibers are being reconnected because
        Försvarshögskolan (the Swedish National Defence College) is moving into a number of former KTH
        buildings, one of which is a central fiber connection point.

030912  The main netlogon router on net 5 has been changed to new hardware and
        software. Usage should be as before. Please report any problems.

030827  Replacement of hardware and os for the DNS-server KTH.SE. This should be 
        transparent to users. Any mail still left using kth.se will break though.

030821  Problems with EA2-GW, haninge, telge, arch affected. 13-13.30

030801  Replacement of hardware and os for the DNS-server NS.KTH.SE. This should be 
        transparent to users. 

030718  We will upgrade all kthlan backbone routers today due to the CERT advisory
        about denial of service bugs in IOS.

030710  Eterra changes SRP-cards on kth1&kth2. This should not cause any disturbance

030505  NS.KTH.SE (nameserver), exchanged for a new machine.

030203  Power outage on main campus during night; 00:20 - 01:14 
        All central equipment survived but some small switches still got hurt.
	
03012x  We have problems with a few SQL Slammer infections disturbing routers for some subnets.

021203  UPDATE on broken router EA4:
	We got a new chassis and supervisor (central CPU card) from our service
	provider yesterday, and have upgraded and installed them today.
	(It is very uncommon for both the backplane and the CPU to be bad at the same
	time.)
	We are gradually adding back the redundancy for the affected networks.

	We are under the impression that all networks have been working during
	the failure, with two exceptions:

	- 1) An outdated DHCP forwarding config on the backup router caused DHCP
	to stop working on two subnets.
	- 2) The upgraded CN4 only sent RIP (Routing Information Protocol)
	version 2. Some older hosts have been listening to RIP v1 and thus
	lost their routing.

021203  UPDATE: we got the new chassis and supervisor yesterday afternoon, and
        we are working to put it into place.

021202  UPDATE: Eterra has been here with a supervisor, but the problem 
        seems to be the chassis, we are expecting new HW during the day.

021129  We have a MAJOR hardware error on backbone router EA4. All connections should be
        routed through backup routers. There could be small things that do not work.
 
021128  18-21 Network is interrupted during normal service window for installation of new hardware
        in GigaSUNET routers and KTH routers.
	
021120  Major power outage in Kista.

021022  We have problems with IP phones and dialling to foreign countries in the new version of IP Call
        Manager. All users needing this function are temporarily moved to the older version server.
	[FIXED 021101, problems caused by a bad line on MD110]

021010  Disturbances on router CN4. High cpu load and errors logged. Caused by faulty hardware 
        switching. We have problems with routing to kista.  
	CN4 restarted and some workaround put in place.

021004  Big problems with router EA4 at around 15.00. This was caused by intense multicast 
        from ARCH.

021003	Problems with Ip-telephones. Outward dialling to MD110 PRI was down. Logical line
        reset. 

021001	Problems with router CN4 for about 5 minutes because of error on S3 subnet. 
        Disturbing E and MATH mainly.

Sept 02 We have problems reaching www.meriam-webster.com (m-w.com). Please use
        www.britannica.com (which includes m-w)

020916  Major problems with KTH MD110 telephone exchange. Also disturbing IP
        telephony because the LIMs that connect the IP phones were affected by
	the problems.

020725  GigaSUNET connection activated after a long delay caused by a faulty
        Stockholm SRP ring.

Spring  Tests of new GIGASUNET connections may disturb connectivity during spring.
2002    We have everything connected, but the new SRP technology is not stable yet.
	Sometimes during the night we test the new links.

020404 One central nameserver overloaded, causing problems for some users.

0204xx  Some network disturbances caused by external computers, causing internal
        timeouts in KTH traffic.

0203xx	Some routers updated to newer versions. No known disturbance to the net.

020109  Problems with telge connection. (Cause not identified).

010920  Changes of routing within kth. Connection 1 to SUNET moved to new
        router. (during thursday service window).

010911  Some problems with KTH's routers during the night cause disturbances.

010906  Planned break for upgrading routers. One router (ea2) had major problems 
        after upgrade, disturbing Telge, Indek, Mech. (around 22.00)

0108xx	We had some problems with fiber equipment disturbing connections to 
        subnet 32, affected: web.kth.se

010109  MAJOR power outage in kista city. 

0101xx  We have problems with the GB link to Kista; the fiber has changed somewhat.
        The backup works, at a slower speed, during the intermittent errors.

00092x  We have problems with the router/sw ea-gw/sw resetting itself every third
        day. It takes about 3 minutes before everything is up again.
	[fixed]

000918  Central routing updated, causing some planned disturbances 21-24. A
        converter caused trouble towards DSV for about 2 hours.

000826  Stockholm networks (130.237.x below 64) had problems reaching
        most of US during the weekend because of bad routing from
        sprintlink for 130.237.0.0/18 (ie subnets 0-63).

000726	Problems with a NADA router disturbs KTHs old FDDI ring, affecting 
        connections to KISTA.

990622	Some central routers upgraded to a testversion due to special demands 
        in communication with SSVL.

991207  Half of KTH had bad outside connectivity because one of SUNET's
	routers, stockholm-2, was hung from 0500 to 0800 Tuesday
	morning.

991204	All main routers upgraded and a new gigabit router installed
	on Saturday. On Sunday the link to Telge was down 4 hrs because
	Stokab installed a new fiber to replace the broken one.

991203	The admin servernet was rebuilt in the ongoing effort to
	modernize all nets.

991121	Problems with link to telge. EA-GW reloaded, but it seems to be something 
	with the fiber.

July 99	We have now ensured that we have extra backup links to sunet by
        resurrecting the backup agreement with PDC so we can use our
	respective lines to SUNET in case of failures.

June 99 We have had a major break of our SUNET connection because of the new
	connection to SUNET combined with vacations and somewhat
	unsatisfactory "human communication" between KTHLAN and
	KTHNOC. This has all been settled peacefully now.

June 99	We have been connected to the new SUNET-155 for a while now. The old
	link is still available. The new connection has introduced some
	problems with multicast for KTH.

990502	We have problems with EA2-GW. All departments connected to this
	router have a backup connection, so this *should* not disturb normal
	traffic. (990503: this has been FIXED with a new spare card from our reseller.)

990401	Some routers rebooted and upgraded to new software to further
990327	the installation of the Gigabit Ethernet backbone. This can cause problems
	if the new software has undetected bugs.

990323	CN3 rebooted at 23.xx due to bad multicast routing tables. 
	(reboot takes about 4 minutes)

990113	Fiber to telge started working, after some initial problems with
	the equipment. Fiber distance is 60 km and we run 100 mbits ethernet 
	on the link.

990112	Problems with memory of CN3-GW required reboot (after 339 days). 
	Downtime was 2 min 36 secs

981119	We have serious problems with fiber-link to haninge. The vendor
	of fiber equipment is investigating. (this is fixed now)

981027	cn2-gw completely removed. addresses 130.237.210.1 130.237.1.1

981007	Fiber connection running 100 Mbps ethernet to Haninge installed. 
	Distance is 40 km.

9809	During september we have moved many departments to 
	100 Mbit and new routers. Only one dept network is still
	connected to cn2-gw.

980813	USA line down since 09.11 today. Came up at around 18.00

9807	During summer we have moved a lot of interfaces, upgraded to 100
	Mbit etc. A new gateway, cnbup-gw, is going to replace cn and cn2
	for backup networks. cnn and ea are the new main routers.

980609  US 155 MBit line was down. 
	Problem started 980608 21.30, ended 0609 11.00

9805	Various departments moved to 100 Mbit. New fibers are being deployed.
	New UPS and cooling in commrooms CN and EA.

980428	21.50 CN and CN2 rebooted due to move to new UPS.
	During april net 41, 31, 24, and some others have moved to
	100 Mbit.

980311  CN2-GW hangs at 12.30-12.50

980206	All routers in CN (cn, cn2, cn3 and cnn) go down 12.48 due to bad 
        UPS. Reboot takes 2 to 5 minutes.

980203	All routers in CN (cn, cn2, cn3 and cnn) go down 07.48 due to bad 
        UPS.  Reboot takes 2 to 5 minutes. 

980127	Problems with new gw cnn-gw. Net 72 was hard to reach.

980122	Net 72 reconnected to 100 Mbps Ethernet on CNN-GW.

980116	cn-gw and cn2-gw hang due to overload problems, probably
	caused by a "smurf" attack.

971212	ea-gw had problems during the night. We still have not
	got a spare CPU board from upnet, but we have cannibalized our
	extra router to exchange the CPU board.

971208  We have hw problems with ea-gw. This affects vov, met and tpf.
	Down 7 minutes at 12.25.

971201	cn2-gw hangs for around 20 minutes at 15.00. 

971111	Net 60 and 63 (VOV) connected to a new router. Some machines
	on VOV net may have problems reaching other ip hosts. A clearing of 
	the arp cache or a reboot fixes this.

971103	CN3 gets a mem parity error and is down for 2 minutes at 9.13

971016	KTHNOC router "sthlm2-gw" went down due to hardware problems.
	"Thu 16 Oct 1997 19.36.50 down 25 minutes, 36 seconds"
	This affects our connection to SUNET (and the world).

971007	KTH connection to sunet is reconnected to a new router at KTHNOC 
	
971001	Router cn2-gw hangs at 8.00-8.30

970930  New cisco router module (RSM) for 5500 switch installed on backbone.
 	We are testing its functioning.

970901	Cisco admits that they have a major software problem.
	All versions of their 8.2 software are on hold. This puts
	a delay on our new backbone switch-routers.

970825	cn3-gw gets mysterious problems, rebooting every 5 minutes.
	A switch to an older software version corrects this.

970721	Net 16 (MET) connected to cisco 5500. 

9707	New cisco 5500 router switch connected for tests. 
	VOV (net 60) added to this node.

970627	Problems with AC in the CN router room. CN2-GW hangs 05.00-09.00

970522	Cisco router cn-gw hangs 20-21.

970519  Connection to södertelje is down during the morning.

970518	Cisco router cn2-gw hangs 11:12.

970513	Cisco router CN3-GW has problem with ipx traffic (out of memory 
	after 26 weeks uptime) Fixed around 11.00

970425	Connection to BB (liljanshuset) moved to fiber.

970412	CN2-GW was hung again around 13.00. The remaining connections
	will be moved to other routers.

970307	We have problems with the MIDGARD connection. The cisco and the 
	midgård switch suddenly fails. Resetting the switch helps.
	This will be investigated.

970306	A component (fiber converter) had failed for LANTM (net 64). Fixed.

970305	The CN2-GW router hung for a few minutes. This router will be
	replaced when we have a fully functional new router.
