Servers


We’ve recently changed the DNS Servers used by ResNet, to some new servers running on some much better hardware.

Most people’s computers have moved seamlessly over to the new DNS servers, but fewer than 20 computers are still trying to use the old ones. We’ve done all we can to encourage these computers to start using the new servers, but it looks like they’re not configured in our recommended manner, so they will need manual intervention to move.

The old servers have now been switched off, so things will break for that small handful of machines. If your ResNet Wired connection stops working, here’s what you need to do.

  1. Reboot your computer. In some cases this will cause it to pick up the new settings automatically
  2. Check your network settings, and make sure you’re set to “obtain DNS server addresses automatically”

If restarting doesn’t work, and your connection is already set to “obtain DNS server addresses automatically”, check to see which DNS servers you’ve been given.

They should start with “137.222.8.x” – if they start with “172.16.67.x” your computer hasn’t picked up the new servers properly. If this is the case, give the IT Service Desk a call, or take your computer along to the Laptop Clinic and we’ll try and help you work out why.
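
If you’d rather not eyeball the addresses, the check above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_dns_server` and the example addresses are made up, and the two prefixes are the ones quoted in this post.

```python
# Prefixes quoted in the post: new servers start 137.222.8., old ones 172.16.67.
NEW_PREFIX = "137.222.8."
OLD_PREFIX = "172.16.67."


def classify_dns_server(address: str) -> str:
    """Return 'new', 'old' or 'unknown' for a DNS server address (hypothetical helper)."""
    address = address.strip()
    if address.startswith(NEW_PREFIX):
        return "new"
    if address.startswith(OLD_PREFIX):
        return "old"
    return "unknown"


if __name__ == "__main__":
    # Example addresses are illustrative, not the real server addresses.
    for server in ["137.222.8.10", "172.16.67.5", "192.0.2.1"]:
        print(server, "->", classify_dns_server(server))
```

Anything reported as "old" means the computer hasn’t picked up the new servers, and "unknown" usually means the DNS servers have been set by hand.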

Part of the infrastructure which runs the ResNet back end has developed a hardware fault, and some of the services it’s running are currently in a degraded state. We need to move services off it as soon as possible.

The following services are affected:

– http://go.resnet.bris.ac.uk – will be unavailable for up to an hour while we move it
– http://my.resnet.bris.ac.uk – will be unavailable for up to an hour while we move it
– DNS (ResNet Wired) will be in a degraded state, and web browsing for 50% of currently connected ResNet customers will be slower than usual
– DHCP (ResNet Wired) will be working, in a non-resilient state
– The ResNet channel in portal.bris.ac.uk will be unavailable for up to an hour while we move it

Various other administrative tools (eg the troubleshooting system used by the Service Desk when troubleshooting ResNet connection problems) will also be affected.

Important: Wireless is not affected at all, so if you’re experiencing wired connection issues – try the wireless.

Updates will follow as we get them.

Update 24th March, 16:15
– http://go.resnet.bris.ac.uk – is now available again
– http://my.resnet.bris.ac.uk – is now available again
– The ResNet channel in portal.bris.ac.uk is now available again
– DNS/DHCP (ResNet Wired) – This is now back in service in a non-resilient manner (both the primary and secondary are on the same hypervisor) which will get us through tonight but isn’t ideal in the long term.

Update 24th March, 16:56
We’ve started to move people onto a new pair of DNS servers to improve resilience and reduce the load on the non-resilient hypervisor. Early indications are that the new servers are performing nicely and that people are moving across to them smoothly as their DHCP lease renews.

Update 25th March, 10:51
The temporary hardware we used to restore service yesterday is overcommitted and while it’s working, it’s at the limit of what it can handle. We will be moving services off it today onto a more permanent home, and we’ll try to minimise downtime while we do so.

Update 26th March, 09:28
Some services are taking substantially longer than expected to migrate. We kicked off the migration of 3 VMs last night, and they’ve still not fully completed. The following services are currently unavailable:
– http://go.resnet.bris.ac.uk – (still migrating)
– http://my.resnet.bris.ac.uk – (still migrating)
– http://www.resnet.bris.ac.uk redirects (the content is still available at http://www.bristol.ac.uk/it-services/advice/homeusers/resnet)

Update 26th March, 11:00
Everything is back up on new hardware.

Update 2014-01-24 10:50am
As of about 10:45am the replacement server is up and handling DNS queries so we’re back in business.

Sorry for the disruption!


We had some maintenance scheduled for 9am-10am this morning, which would have resulted in slow DNS performance for a small number of users for a comparatively short amount of time.

Unfortunately we hit some issues, the work over-ran, and then the last stage of the work failed completely, rendering a DNS server unbootable.

As a result, ResNet Wired is currently running on only one DNS server.

The server which is missing is the primary for 50% of ResNet, so half of our customers will be seeing slower than usual internet access as it’s taking longer than usual to resolve domain names.

We’re in the process of building a replacement server, and we hope to have normal service restored as soon as possible.

This issue does not affect the wireless network.

There will be a short break in service affecting all ResNet connections on Wednesday 27th June, between 9am and 9:30am while we make some changes to the network infrastructure.  Additionally the ResNet service should be considered at-risk until 10:30am.

The following services will be affected:

  • ResNet Wired – your computer will not be able to connect
  • ResNet Wireless – your computer will not be able to connect
  • My ResNet control panel will not be available
  • The ResNet Activation System will not be available

If you experience a problem with your ResNet connection during that period, please wait half an hour and then restart your computer before trying again.

We apologise for the short notice. Updates on our progress will be added to this blog post when we have them.

Update: 2012-06-27 09:30
As far as we can tell, everything is back up again.  The interruption for the majority of users will only have been a couple of minutes, but if you’re having problems try restarting your computer.  If that doesn’t help, contact the IT Service Desk.

Because of essential maintenance to one of the university database servers, the following services will be unavailable between 21:00 and 21:30, Monday 13th February. This work is to fix the underlying cause of the problems we experienced before Christmas.

For the duration of the work, the following ResNet services will be unavailable:
The helpdesk connection troubleshooting tools will also be unavailable, but as it’s out of Service Desk opening hours that’s not going to make much of a difference!

For more information about other services (including a number of University Websites) which will be affected by this maintenance work, see the IS website: http://www.bris.ac.uk/it-services/news/2012/diversp1feb.html

The database server used by a large number of ResNet systems is currently experiencing problems. The issues started at around 17:18 on Wednesday 14th December.

Until the issues can be resolved, the following ResNet services are unavailable:

  • The My ResNet Control Panel (http://my.resnet.bris.ac.uk) will be completely unavailable
  • The ResNet Activation System (http://go.resnet.bris.ac.uk) will be completely unavailable
  • All helpdesk troubleshooting tools will be unavailable
  • Various back-end scripts are failing, which may result in strange behaviour from the bandwidth monitoring systems (the ResNet Gadget/RSS feeds won’t be updating) and the network status monitoring; DHCP updates will not happen, etc.

Because ResNet don’t run the database server, it’s a little out of our hands at the moment. The issue has been reported to the database team, and we will update this page as and when we have more information.

Update: 2011-12-15 09:35
The issue seems to be resolved (for now) and our systems seem to be working again.  However, reports from the database team suggest an underlying problem which they are continuing to investigate.

About 25 of the servers which run ResNet are running *really* slowly at the moment (for various complicated and convoluted reasons). This means that customers may notice the following:

  • The ResNet DNS servers are slow and unresponsive, which effectively means that your connection may function slowly, or even not at all.
  • The My ResNet control panel (my.resnet.bris.ac.uk) is slow and unresponsive
  • The ResNet Activation System (go.resnet.bris.ac.uk) is slow and unresponsive
  • Most of  our backend troubleshooting tools are playing up as well, which makes troubleshooting individual problems difficult for the Service Desk
  • The Wireless ResNet authentication servers are caught up in this as well, so ResNet Wireless may also be unusable

Because of the widespread nature of the problem, we’re expecting it to affect all ResNet customers (Wired and Wireless)

We’re aware of the problem, and are doing everything we can to restore service as quickly as possible.  I’ve already started planning ways to stop the same issue hitting us again in the future.

Update: 21/11/2011 17:13
Good news and bad news.  The good news is that the load issues are beginning to resolve themselves and DNS seems to be back to normal.  The bad news is that somewhere in the last 2 hours we hit a (separate, but related) problem which means none of our web servers are currently running (so www.resnet/my.resnet/go.resnet etc are still dead)

We’ve got a lead on that problem though, and are working to get them back up again.

Update: 21/11/2011 17:50
And we’re back in business!   Sorry for the inconvenience everyone.

Because of essential maintenance to one of the university database servers, the following services will be unavailable between 8:30am and 12:30pm, Wednesday 2nd March.

The helpdesk connection troubleshooting tools will also be unavailable, so there will be a diminished support capacity for ResNet until about 1pm.

For more information about other services which will be affected by this maintenance work, see the IS website: http://www.bristol.ac.uk/is/news/2011/diversp17feb.html

It seems that one of our servers, which provides My ResNet, the in-room registration system and DNS for half of ResNet, fell over on Friday evening. Some people will have experienced slow connections as they failed over to the secondary DNS server.

As the office isn’t staffed at weekends, I only happened to notice it by chance.  I’ve restarted the server and normal service was restored at around 14:45 on Sunday.

Sorry for the inconvenience caused.

Information Services are carrying out some essential systems maintenance on the 11th August 2009, 10:00-10:30 which affects the following ResNet systems:

  • My ResNet
  • ResNet in-room Registration System
  • ResNet Helpdesk troubleshooting tools (including those used by the out of hours helpdesk)
  • ResNet Network Monitoring system (the traffic lights on the ResNet home page)

All fully registered ResNet connections will continue to function normally throughout this maintenance period, so most people won’t notice anything.

There are several other University systems which are also affected by this maintenance. For more details, see the IS News website: http://www.bris.ac.uk/is/news/2009/koi28july.html

One of the ResNet servers is feeling a little under the weather, so we’re going to have to take it offline first thing tomorrow to run some diagnostics to confirm that the problem is what we think it is before we can log a hardware fault report.

The following services will be unavailable from 8:30am-10:30am, Friday 17th April 2009.

Currently registered connections will continue to function normally.

We will update this post when the maintenance is complete

Update: 09:42
Maintenance is now complete, and service has been restored.

Information Services are carrying out some essential systems maintenance on the 16th April 2009, 08:00-12:00 which affects the following ResNet systems:

  • My ResNet
  • ResNet in-room Registration System
  • ResNet Helpdesk troubleshooting tools (including those used by the out of hours helpdesk)
  • ResNet Network Monitoring system (the traffic lights on the ResNet home page)

All fully registered ResNet connections will continue to function normally throughout this maintenance period, so most people won’t notice anything.

There are several other University systems which are also affected by this maintenance. For more details, see the IS News website: http://www.bris.ac.uk/is/news/2009/oraclepatch17march.html

In the early hours of Sunday morning, one of our database servers encountered a problem and stopped accepting connections.  This meant that the My ResNet and Registration systems were unavailable, and in some cases were displaying unhelpful error messages.

The database team sprang into action on Monday morning and fixed the problem, and everything is back up and working now.  Sorry for any confusion/concern that this caused!

One of our ResNet DNS servers has developed a disk fault.

The majority of ResNet users will not be affected, as our other DNS server is ticking along nicely – however some users will notice that web usage feels a little unresponsive as their computer tries the dead server before failing over to the live one.
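
To give a feel for why a dead primary makes browsing sluggish, here is a rough Python sketch of a computer trying its DNS servers in order. The timeout and answer times are assumed figures for illustration (real values depend on the operating system), and the server names are made up.

```python
from typing import Dict, List, Tuple

# Assumed figures for illustration only; real resolver timeouts vary.
TIMEOUT_SECONDS = 2.0   # how long we wait on a dead server before giving up
ANSWER_SECONDS = 0.02   # how long a healthy server takes to answer


def resolve_time(servers: List[str], alive: Dict[str, bool]) -> Tuple[str, float]:
    """Try each server in order; a dead server costs a full timeout before moving on."""
    elapsed = 0.0
    for server in servers:
        if alive.get(server, False):
            return server, elapsed + ANSWER_SECONDS
        elapsed += TIMEOUT_SECONDS
    raise RuntimeError("no DNS server answered")


if __name__ == "__main__":
    servers = ["dns-primary", "dns-secondary"]  # hypothetical names
    print(resolve_time(servers, {"dns-primary": True, "dns-secondary": True}))
    print(resolve_time(servers, {"dns-primary": False, "dns-secondary": True}))
```

With both servers up, the first one answers straight away. With the first one dead, every lookup pays the timeout before the second server gets asked, which is the "unresponsive" feeling described above.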

We swapped out the disk this morning at about 10am, in what should have been a transparent procedure with no expected downtime.  Things don’t appear to have gone to plan, and the service is taking longer than expected to restore (as we’ve had to take the server offline while it re-synchronises its disks)

I’ll update this post when we’re back in business.

Update: 12:31
The disks are still syncing – but thanks to some fancy footwork, we’ve managed to move the DNS service over on to a backup machine, so we’ve got two DNS servers again.

Update: Friday 24th Oct – 11:09
Normal service has been resumed – thanks to a significant amount of mucking about with disks, a long and tedious synchronisation process, and an issue with the disk controller drivers.  I do have some more minor work to do to this server to tidy things up – but it’s currently stable so I’ll leave it alone for the weekend.

It looks like the University single sign-on system isn’t working at the moment. I’m told that people are looking into it, but as far as I can tell it’s a certificate problem.

This means that the following ResNet services are unavailable as there is no way to log in at present.

  • My ResNet
  • The in room registration system
  • All our administration tools

I’ll update this post when the problem has been resolved.

Update: 2008-08-19 – 16:47
Problem is now resolved.  I don’t have many details as yet, but access to all the above tools has been restored.

There is some central database maintenance going on today, which means the following ResNet services are unavailable:

If your ResNet connection is working, it will continue to work unaffected. If you need to register your connection, move rooms, or connect a new computer to ResNet, you’ll need to wait until later this evening.

Other systems around the university may be affected by this maintenance period. For more information, see http://www.bris.ac.uk/is/news/2008/datahavenaugust4.html

Update: 2008-08-14 10:43
It seems that there were problems encountered during the upgrade which meant it took longer than expected.  Normal service was resumed at about 10am today (Thursday) and after a few minor wobbles we appear to be back in business.

It would seem that just after I left the office on Friday, the server which hosts both the registration process and the My ResNet service fell over. The out of hours helpdesk have probably noticed this, as it also hosts the system which they use to look at the health of people’s ResNet connections when they call up!

We don’t have access to the machine room on weekends, and I’m unable to reboot the server remotely as we’ve upgraded part of the equipment that does that – and my crib sheet of instructions is out of date! The staff that have the required knowledge to update my crib sheet aren’t available…

So it’ll have to wait until first thing Monday morning. I should be in the office around 8:30am on Monday, and I’ll reboot it first thing.

Sorry for any inconvenience caused by this outage.

Update: 2008-06-30
The server was rebooted at about 8am this morning.  Normal service has been resumed.

It looks as though one of our servers fell over at about 7am yesterday morning. We didn’t notice because we were all away for the bank holiday. Sorry about that.

The services affected include:

There are a number of other services which may have experienced problems, although those appear to have fallen over to backup servers.

We have a mechanism for rebooting servers from offsite, so I’m trying for a reboot now.

Update: Reboot complete, service should now be back to normal.

It seems we’ve been having problems with the ResNet Chat service all day.

It’s been doing some odd things, specifically with server2server chat (e.g. chatting to your mates on Google Talk).

The problems became a little worse this evening and I’ve spent the last three quarters of an hour or so trying to breathe some life back in to the service.

I’m not getting very far.

I’ve managed to get it back to a point where you can connect, and it appears that you can chat to other ResNet chat users – but talking to external users still isn’t working.

I’m going to keep looking into it, but I’m rapidly getting out of my depth – so I’ll be passing it on to a colleague in the morning.

Sorry for any inconvenience this causes!

Update 2008-03-04: We’ve spent all day looking at it, updating certificates, restarting this, fiddling with that – and it’s still broken. We’re going to try a cold reboot of the server tomorrow morning to see if that helps, but as far as we can tell it *should* be working. I will update again tomorrow morning.

One of the University’s central database servers will be unavailable between 8am and 10am on Thursday 6th March 2008.

What does this mean for ResNet? Most people won’t notice, it should have no impact on existing registered connections at all. However, the following services will be unavailable:

  • Registration System. This will be unavailable to new customers. Existing customers should be ok.
  • We don’t think that My ResNet will be affected, but it may be in some cases.
  • We will be unable to sell subscriptions during the maintenance period.

Some other university systems will be unavailable as well. For more information, see http://www.bris.ac.uk/is/news/2008/datahub6mar.html

The database server which runs various chunks of ResNet will be unavailable on Saturday morning for about 2 hours due to some essential systems maintenance.

Customers with existing connections will be unaffected – your connection will continue to function as normal

The maintenance will affect the following parts of ResNet

  • The in-room registration system for new customers will be unavailable
  • The My ResNet system will be unavailable
  • The slink url shortening service will be unavailable
  • All our back-end administration tools will be unavailable, but you’re unlikely to notice that anyway 🙂

Various other university services will be unavailable during the maintenance period, see the following IS News article for more details.

http://www.bris.ac.uk/is/news/2008/maint9feb.html

One of the University’s central database servers will be unavailable between 8am and 9am on Thursday 24th Jan for some essential systems maintenance.

This will mean that the following ResNet services will be unavailable:

  • The in-room registration service for new users
  • Some parts of My ResNet may also be unavailable

All existing ResNet connections should continue to work uninterrupted and most customers won’t notice anything.

There are a handful of other University services which will also be unavailable during the maintenance period. For more details, please see the IS News website: http://www.bris.ac.uk/is/news/2008/datahub24jan.html

There seems to be something funny going on with our blog templates. I’m investigating, so the styling of this page may change without warning as I try to debug the problem…

Update: All done, everything should be back to normal.

One of the University’s central database servers will be unavailable on Saturday 27th Oct for some essential systems maintenance. This will mean that the following ResNet services will be unavailable:

All existing ResNet connections should continue to work uninterrupted and most customers won’t notice anything.

There are a number of other University services which will also be unavailable during the maintenance period.  For more details, please see the IS News website: http://www.bris.ac.uk/is/news/2007/maint27oct.html

It’s one of *those* Fridays.

It seems we’ve been having problems inserting data into the main ResNet database.  I’m not sure how long it’s been a problem, but it looks like it may have been a couple of days now.  I’ve cleared the problem now – if you were having problems registering, please try again.

If your connection is fully working, there is no need to take any action.

Most people on ResNet won’t have noticed, but one of our servers dropped off the network at about 7pm last night due to a problem with one of its network interfaces. Things that will have been affected include:

  • Registration System
    The server that fell over provides the DNS for the registration network. It’s set up to redirect all requests for web pages to our registration system. Your connection will have failed over to the secondary DNS, but the secondary isn’t set up in the same way – so your web page requests won’t have been diverted. If you typed in the URL for the registration system directly, you will have been able to register OK.
  • Manage My ResNet
    Bandwidth reports for last night won’t have been updated.
  • Network Status Monitoring
    The traffic lights on the ResNet home page which monitor the status of the network won’t have been updated.
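
For the curious, the registration redirect described above behaves roughly like the Python sketch below. It is a toy model, not our actual DNS configuration: the addresses come from the reserved documentation ranges, and the function names and the lookup table are invented for illustration.

```python
REGISTRATION_IP = "192.0.2.10"  # hypothetical address for the registration system

# Hypothetical stand-in for what an ordinary DNS lookup would return.
REAL_RECORDS = {"www.example.org": "198.51.100.7"}


def primary_lookup(name: str) -> str:
    """Registration-network primary: every name points at the registration system."""
    return REGISTRATION_IP


def secondary_lookup(name: str) -> str:
    """Secondary: an ordinary lookup with no redirection."""
    return REAL_RECORDS[name]


def lookup(name: str, primary_up: bool) -> str:
    """Use the primary if it is up, otherwise fall back to the secondary."""
    return primary_lookup(name) if primary_up else secondary_lookup(name)


if __name__ == "__main__":
    print(lookup("www.example.org", primary_up=True))   # sent to the registration system
    print(lookup("www.example.org", primary_up=False))  # the site's real address
```

With the primary down, unregistered computers got real answers instead of being pointed at the registration system, which is why the redirect stopped working.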

Normal service was resumed at 9am this morning which was the first available opportunity for me to have physical access to the server.

One of the central University database servers is being restarted at 8am on Friday 27th July 2007. This is expected to take about half an hour.

This maintenance will cause problems with the following parts of ResNet:

  • Registration system
    New users will not be able to register for the duration of the maintenance period.
  • Manage My ResNet
    Some parts of Manage My ResNet will be unavailable for the duration of the maintenance period.

Existing ResNet connections will remain active, and most users won’t even notice. For more details about other University services which will be affected by this maintenance – see http://www.bris.ac.uk/is/news/2007/stellar27jul.html

I’ve just noticed that one of our servers fell over at 7am this morning. I’m not in Bristol so I can’t go in and switch it back on, and unless one of the other permanent staff happens to be in the area – the following services will be unavailable until Monday morning.

  • In room registration process. New users will not be able to register their connections.
  • Manage My ResNet – You won’t be able to get to this at all, so you won’t be able to see your bandwidth usage (although we’re still counting it!)
  • ResNet software archive (http://download.resnet.bris.ac.uk/pub) will be unavailable
  • The short version of the URL for the ResNet website (http://www.resnet.bristol.ac.uk) won’t work, although the long form will (http://www.bris.ac.uk/is/computing/advice/homeusers/resnet/)
  • There may be a handful of other services that don’t work as expected, which I’ve forgotten. But it’s Saturday morning and I haven’t had breakfast yet.

All existing ResNet connections should continue to function as normal.

Update: I have just kicked the box back into life – it is back up and all services are running as expected – sorry for any inconvenience caused. Mark.

We seem to be having some problems with Manage my ResNet – which may mean that people are unable to use it to check their bandwidth usage and/or sign up for the IPTV trial. I am investigating.

Update: The problem seems to be with our database link between the ResNet database and the main university database. Each database is working fine on its own; they are just not talking to each other. I’m just off downstairs to make the database admin team aware of the problem.

Update: Just as I hit save on that last update, the problem seems to have fixed itself. There should be no problems logging in to Manage my ResNet as of about 12:13pm today.

Update – 10:02am
It’s all fixed.  Sorry for any inconvenience caused.

It seems that we’re having database problems at the moment. We can’t look up anyone’s details in the central university database. From digging in the logs, it looks like the problem started at about 01:15 this morning. The following systems don’t work:

  • Manage My ResNet
  • ResNet in-room registration system

If you try to log in to either of these systems, you’re likely to get the following error message: “Sorry, we couldn’t find anyone in the University database with that username”

We’re working with the database administration team to try and work out what the problem is and to fix it asap.

The database server we use is due to be taken down for some routine maintenance on the 2nd January 2007. The vast majority of ResNet users won’t notice, although the following services will be unavailable during the maintenance.

The Registration System – people trying to set up ResNet for the first time, or with a new computer will not be able to register.

Manage My ResNet – you won’t be able to check your bandwidth usage, your account details, or move rooms.

ResNet status monitoring – you won’t be able to see if there are network problems on ResNet

That’s pretty much it.

Other systems around the University may be affected, please see http://www.bris.ac.uk/is/news/2006/sys2jan.html for more details.

For a period on 18th December between 8am and 10am the following ResNet services will be unavailable:

  • ResNet online registrations
  • Manage My ResNet
  • ResNet automatic status monitoring

This is due to an urgent need to move power supplies in several racks in the machine room – the server our database is hosted on is one of those systems affected.
Actual ResNet connections will be unaffected – you can keep on using ResNet.

The ResNet database has been switched off. This is so that the machine that it runs on can be attached to a new super quick network filesystem. Sorry we didn’t tell you earlier.

Luckily, this work will not affect most ResNet connections:

  • If you are a current ResNet user, the Manage my ResNet utility and the automatic status monitoring on our homepage will be unavailable for most of the day. Otherwise, your internet connection should be unaffected.
  • We will not be able to move your connection if you are moving room/hall until the database is back on.
  • If you are a new ResNet user and need to pay us, we will be able to take your money, but we will not be able to switch on your account until after the database has been reactivated. Usually this would be instant.

Update: this was restored by about 11.45am this morning.

On Sunday 12th November @ 22:30 one of the 4 proxy servers that ResNet users use to surf web sites crashed because its log file became too full. The initial question should be “how did we let the log file get so big?”. Well, the log file is able to get to 2GB in size before it fails, which under normal circumstances is more than enough, especially as we rotate the logs daily. However, one user’s machine was making about 100 requests per second (over 8 million per day) to this server which caused it to crash. The user in question has been disconnected from ResNet until we can find out what software was causing the problem.
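
As a quick sanity check on the "over 8 million per day" figure, the arithmetic looks like this in Python (the function name is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def requests_per_day(requests_per_second: int) -> int:
    """Scale a steady per-second request rate up to a full day."""
    return requests_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(requests_per_day(100))  # 8640000
```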

Most of you probably did not even notice the server fail as all its traffic was automatically moved onto one of the other three proxy servers with the failed server back up by 9am on Monday. One good thing to come out of this is that we have now changed the way we load balance our proxy servers in the event of an error. Instead of one proxy taking the load of the failed one, doubling its load, the traffic is spread evenly between all remaining servers so only increasing load to each by one third.
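
The difference between the old and new failover behaviour can be sketched as below. This assumes all four proxies normally carry an equal share of the traffic (a load of 1 each), which is a simplification; the function name is made up.

```python
from fractions import Fraction
from typing import List


def load_after_failure(servers: int, spread_evenly: bool) -> List[Fraction]:
    """Per-server load (1 = normal) on the survivors after one server fails.

    Assumes every server carried an equal load of 1 before the failure.
    """
    survivors = servers - 1
    if spread_evenly:
        # New behaviour: the failed server's load is shared by all survivors.
        return [Fraction(1) + Fraction(1, survivors)] * survivors
    # Old behaviour: one survivor takes all of the failed server's load.
    return [Fraction(2)] + [Fraction(1)] * (survivors - 1)


if __name__ == "__main__":
    print([str(x) for x in load_after_failure(4, spread_evenly=False)])  # ['2', '1', '1']
    print([str(x) for x in load_after_failure(4, spread_evenly=True)])   # ['4/3', '4/3', '4/3']
```

Either way the total traffic is the same; the new arrangement just means no single proxy ends up doing double duty.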

Another result of the 8 million connection attempts in one day was that the log analysis server crashed a day later because it ran out of disk space due to the larger logs that were copied to it 😉 It never rains but it pours!! This is being fixed by adding a larger disk. Well the original disk was only 18GB, it will soon be a whopping 36GB!

We now have four webcaches for ResNet users, including two brand new systems. We now have more caches than before, and they are running on faster boxes. Also they are split across two separate locations, so should be available even if there is a problem at our main site. Huge thanks to Martin, Squid guru of the PC Systems team for setting up the caches for us.

What do the webcaches do? When you request a webpage your web browser checks to see if the webcache has a copy, and if so gets it from a fast server here at Bristol. If it doesn’t, the webcache fetches the page from the original site and stores a copy for future use. There is 600GB of storage available across the four caches.
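
In very rough terms, a cache lookup works like the Python sketch below. This is a toy model rather than how Squid is implemented: it ignores expiry times, storage limits and everything else a real cache has to worry about, and the class and function names are invented for illustration.

```python
from typing import Callable, Dict


class WebCache:
    """Toy webcache: serve a stored copy if we have one, otherwise fetch and store it."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch_from_origin
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1          # fast path: copy already held locally
            return self._store[url]
        self.misses += 1            # slow path: go out to the original site
        body = self._fetch(url)
        self._store[url] = body     # keep a copy for the next person
        return body


if __name__ == "__main__":
    def fake_origin(url: str) -> bytes:
        # Stand-in for a real HTTP request to the original web server.
        return ("contents of " + url).encode()

    cache = WebCache(fake_origin)
    cache.get("http://www.example.org/")  # miss: fetched from the origin
    cache.get("http://www.example.org/")  # hit: served from the cache
    print(cache.hits, cache.misses)  # 1 1
```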

By using the webcaches we can also reserve network capacity (bandwidth) for web browsing. Along with email, web browsing is the most important and popular facility on the Internet. Other bandwidth on ResNet is shared between gaming, file transfers, radio, instant messaging, etc., and is limited both per person and for the whole of ResNet. With dedicated email servers and webcaches for the University, we hope to make sure you can always browse the web and send mail, even if you or your neighbours are hammering the network with huge transfers at the same time.

Tomorrow (19th September 2006) we will be moving the ResNet database to a much more powerful computer.
Cool … but why do I care? Well, that depends on who you are:

  • If you are a current ResNet user, the Manage my ResNet utility and the automatic status monitoring on our homepage will be unavailable for most of the day. Otherwise, your internet connection should be unaffected.
  • If you are a new ResNet user and need to pay us, we will be able to take your money, but we will not be able to switch on your account until after the database has been moved. Usually this would be instant.
  • If you are coming to Bristol for the new academic year, you will be able to register and setup your connection even faster. (At the start of the year, when 5000 people try to register their connection at the same time, the old database computer tended to get a bit stressed, hopefully the new one will not!).

1pm 19th Sept: the database upgrade has now been completed. The new system is much snappier than the old one, and should be good for the start of term. Thanks to the database guys for sorting this for us.