

A group of CSG-IPPM participants met in Chicago at 9am on Friday, 7-Feb-97. Engineers from eight different CSG universities were present.

Attending:
  Guy Almes, Advanced Network & Services, almes@advanced.org
  Tom Barron, Univ Minnesota, barron@nts.umn.edu
  John Carlson, Univ Washington, johnc@cac.washington.edu
  Bill Cerveny, Advanced Network & Services, cerveny@advanced.org
  Bill Jensen, Univ Wisconsin, bill.jensen@doit.wisc.edu
  Guy Jones, George Washington Univ, gjones@gwis2.circ.gwu.edu
  John Kalbach, Penn State Univ, kalbach@psu.edu
  Craig Labovitz, Univ Michigan / MERIT, labovitz@merit.edu
  Erikas Aras Napjus, Carnegie Mellon Univ, erikas@cmu.edu
  Bill Norton, MERIT, wbn@merit.edu
  David Wasley, Univ California, david.wasley@ucop.edu
  Matt White, Carnegie Mellon Univ, mwhite@cmu.edu
Why might this be of use to you?

David: From UC's point of view, how could we reasonably write service contracts?

Erikas: From CMU's point of view, similar. We'll need to shop for a provider in 1998. We also need to communicate with users who complain of performance problems.

JohnK: We also need to understand when remote users pull web content from their campus.

Tom: Many of the same problems come up again and again, and they'd like to automate responses to users. Uninformed people with powerful workstations might leave pings running all the time; this does not scale. The Administration also asks which sites our users are going to: how many users connect to consumer web sites vs. university sites? How can we justify our expensive connections?

BillN: When Merit oversaw the NSFnet, I built tools that ran pings and ftp transfers from ENSS to ENSS, noting variation, minimums, maximums, and packet loss. This was wide-open data. Now, providers are much more closed. Idea: run ongoing traceroutes from university to university, and let users and engineers view recent results via the web.

Craig: We've been working on routing stability and on ping/loss data using the route servers at the NAPs. Within the next 6 months, we hope to deploy platforms to measure certain providers, including at the public and private exchange points.

BillJ: There is a growing set of users who have high expectations and are willing to pay. At Wiscnet we have a broader set of users.

JohnC: Sharing of test/measurement machines, software, and databases would be valuable. Also, ping delays are not always accurate predictors of typical IP packet delay. This would help UW with its operations. UW/NWnet has T3 connections to MCI, Sprint, and UUnet; consequently, packet loss at exchange points is less of a problem.

GuyJ: We need to break the problem down at the cloud-by-cloud level. Congestion at exchange points and within other providers is probably a key source of performance problems.
Common points:
  - delay among the CSG sites
  - packet loss among the CSG sites
  - occasional flow capacity tests among CSG sites
  - all of the above at exchanges, if possible, to diagnose problems at the cloud level
  - nature of our campuses' usage
  - avoid on-demand user tests
  - allow campus engineers to be proactive
  - openly shared data
  - consistency among all the CSG schools
  - let users understand their problems through our efforts
  - intelligent buying of commodity services during 1998
  - all of the above with QoS in Internet2!
  - all of the above with multicast
GuyA then sketched the standard topology: measurement machines at both campuses and at or near exchanges, plus a database. Passive tests are *only* done on campuses, with campus engineers in control of, and responsible for, suppressing inappropriately detailed information. Active tests are done on all measurement machines.
David: The following passive tests were discussed:
  - QoS
  - Type of traffic
      web
      smtp
      ...
  - Horizon of traffic
      local to my campus
      local to my regional/gigapop
      local to my backbone
  - TLD of source/dest addresses
      .com
      .edu
      .org
We might institutionalize getting rid of the low-order byte of the IP address.
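The two privacy-preserving steps above (dropping the low-order byte of an address, and classifying traffic by TLD) can be sketched as follows. This is a minimal illustration assuming IPv4 addresses and hostnames already obtained elsewhere (e.g., by reverse DNS); the function names are hypothetical, not part of any agreed CSG-IPPM tool.

```python
import ipaddress


def drop_low_order_byte(addr: str) -> str:
    """Zero the last octet of an IPv4 address, keeping only the /24 prefix."""
    ip = ipaddress.IPv4Address(addr)
    # Mask off the host byte so individual machines are not identifiable.
    return str(ipaddress.IPv4Address(int(ip) & 0xFFFFFF00))


def tld_of(hostname: str) -> str:
    """Return the top-level domain of a hostname, e.g. 'www.example.com' -> '.com'."""
    # Strip a trailing root dot, then keep only the last label.
    last_label = hostname.rstrip(".").rsplit(".", 1)[-1]
    return "." + last_label.lower()


if __name__ == "__main__":
    print(drop_low_order_byte("192.0.2.77"))  # 192.0.2.0
    print(tld_of("WWW.Example.COM."))  # .com
```

Truncating at collection time, rather than at report time, is one way to make the rule "institutional": the full address never reaches the database.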
In response to GuyA's comment on the use of GPS, Craig indicated that you could achieve the same accuracy using a dedicated analog phone line to a time provider.
There was discussion of how important it is to place measurement machines at the exchange points. Craig pointed out that you may want to avoid placing machines at exchanges, concentrating on end-to-end performance measurements rather than on cloud-by-cloud diagnostics.
<after lunch>
David led a discussion of what passive tests should be done.
  - matrices of
      src IP address without low-order byte
      dst IP address without low-order byte
      the above two per-packet or per-byte
      type: IP protocol and (if TCP/UDP) src and dst port numbers
      packet sizes
      stream/flow sizes
      each of the above as a function of time chunks
      one k-dimensional matrix or k vectors?
      time granularity?
      QoS parameters for each packet (but this would defeat sampling)
      retransmitted packets (but this would defeat sampling)
      volume as a function of time
  - tools that would be adequate
      Cisco NetFlow
          runs on the route-switch processor
          exports data via UDP to another named machine
          we would write a CSG-IPPM profile for digesting these UDP data
      Argus could be adapted to generate these same UDP data
      RMON2 or RMON as an alternative?
      Argus sniffing as an alternative way of grabbing the data
      RTFM work, e.g., NeTraMet
      Statspy/nnstat
      BTNG is another public domain package
      BillN: consider using the NetScarf file system conventions
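The matrices listed above can be sketched as counters keyed by time chunk and truncated address pair. This assumes flow records have already been decoded from whatever exporter is used (decoding NetFlow or Argus UDP data is not shown); the `FlowRecord` shape, the function names, and the 300-second time chunk are all hypothetical choices, since the time granularity was left open.

```python
from collections import Counter
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical decoded flow record (not any exporter's real wire format)."""
    start: float   # flow start time, seconds since the epoch
    src: str       # IPv4 source address
    dst: str       # IPv4 destination address
    protocol: int  # IP protocol number (6 = TCP, 17 = UDP)
    src_port: int  # assumed 0 for protocols without ports
    dst_port: int
    packets: int
    octets: int


def prefix24(addr: str) -> str:
    """Address with the low-order byte dropped."""
    return str(ipaddress.IPv4Network(addr + "/24", strict=False).network_address)


def traffic_matrices(flows, bin_seconds=300):
    """Aggregate flows into per-time-chunk counters.

    Returns three Counters:
      packets[(chunk, src/24, dst/24)], octets[(chunk, src/24, dst/24)],
      service_octets[(chunk, protocol, dst_port)].
    """
    packets, octets, service_octets = Counter(), Counter(), Counter()
    for f in flows:
        chunk = int(f.start // bin_seconds) * bin_seconds  # start of the time chunk
        key = (chunk, prefix24(f.src), prefix24(f.dst))
        packets[key] += f.packets
        octets[key] += f.octets
        service_octets[(chunk, f.protocol, f.dst_port)] += f.octets
    return packets, octets, service_octets
```

Keeping k separate counters (rather than one k-dimensional matrix) is one answer to the open question above; it keeps storage proportional to the number of non-empty cells in each view.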
GuyA led a discussion of one-way delay and packet loss.
  - a lambda of 1 per 5 seconds on average (0.2 per second) seems OK
      we'll surely change this with time
  - keep the entire sample
  - using measurement machines at exchange points is needed for scalability
  - you might need a higher lambda for loss than for delay
  - you should put a sequence number and the previous timestamp in delay packets
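A sketch of the sending side follows, assuming (as the lambda notation suggests) Poisson sampling: gaps between probes are exponentially distributed with mean 1/lambda, so lambda = 0.2 per second gives one probe every 5 seconds on average. The 20-byte payload layout (sequence number, send timestamp, previous send timestamp) and all function names are hypothetical illustrations of the last point above, not an agreed packet format.

```python
import random
import struct

# Hypothetical probe payload, network byte order:
# 32-bit sequence number, this probe's send time, previous probe's send time.
PROBE_FORMAT = "!Idd"


def poisson_send_times(rate_per_sec, duration_sec, rng):
    """Send offsets (seconds) for a Poisson process with the given rate."""
    times = []
    t = rng.expovariate(rate_per_sec)  # exponential gap, mean 1/rate
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times


def encode_probe(seq, send_ts, prev_send_ts):
    return struct.pack(PROBE_FORMAT, seq, send_ts, prev_send_ts)


def decode_probe(payload):
    return struct.unpack(PROBE_FORMAT, payload)


def loss_fraction(received_seqs, sent_count):
    """Fraction of probes numbered 0..sent_count-1 that never arrived."""
    if sent_count == 0:
        return 0.0
    got = {s for s in received_seqs if 0 <= s < sent_count}
    return 1.0 - len(got) / sent_count
```

With a fixed seed, e.g. `poisson_send_times(0.2, 3600.0, random.Random(1))`, the schedule is reproducible; a separate, higher rate could be passed for loss probes.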
GuyA then led a discussion of occasional flow capacity tests.
  - questions to try to answer
      if I were to attempt a bulk data flow now, what would I be likely to achieve? ("wind chill factor"?)
      would a given application work *now*?
  - ttcp is commonly used, but it requires very good TCP implementations
  - treno is a good alternative, and it's being worked on in the IETF
  - also consider running Bob Carter's bprobe/cprobe tools
  - also consider running a bunch of packets of various sizes
      delay should be a linear function of packet size
      the intercept (delay at zero packet size) is an indication of fixed delay
      the slope (seconds per byte) is an indication of bandwidth, roughly its inverse
      but a full matrix might be needed, since the routes might not concatenate
  - one idea: do ongoing background tests for heavily used sites and on-demand tests for infrequently used sites
      this would optimize scaling
  - another idea: do traceroutes periodically (once per hour?) to validate that we're doing the 'right' path-segment background tests
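The packet-size idea can be made concrete with an ordinary least-squares fit of delay against size. This is a minimal sketch assuming one delay value per probe size (for example, the minimum observed, to reduce the effect of queueing); the function name is hypothetical. On a multi-hop path the slope is the sum of per-hop transmission times per byte, so 1/slope is only a rough bandwidth figure, which is one reason a full matrix might be needed.

```python
def fit_delay_vs_size(samples):
    """Least-squares fit of delay = intercept + slope * size.

    samples: list of (packet_size_bytes, delay_seconds) pairs.
    Returns (intercept_seconds, slope_seconds_per_byte, bytes_per_second),
    where bytes_per_second = 1 / slope.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct packet sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A non-positive slope means the data show no size dependence.
    bandwidth = 1.0 / slope if slope > 0 else float("inf")
    return intercept, slope, bandwidth
```

For example, delays generated as 10 ms plus size/125000 (a 1 Mbit/s link) fit back to an intercept of about 0.010 s and a bandwidth of about 125000 bytes/s.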
We'll also need to do a set of ongoing traceroutes.
  - optimize the case where the route hasn't changed since the previous one
  - BGP-based tools
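One reading of "optimize the case where the route hasn't changed" is to store a traceroute only when its hop list differs from the last stored one. The sketch below assumes hop lists have already been parsed from traceroute output into strings; the `RouteLog` class and its method are hypothetical, and this saves storage and reporting, not the cost of probing.

```python
from dataclasses import dataclass, field


@dataclass
class RouteLog:
    """Per-destination traceroute history, stored only when the route changes."""
    # destination -> list of (timestamp, tuple of hop addresses/names)
    history: dict[str, list[tuple[float, tuple[str, ...]]]] = field(default_factory=dict)

    def record(self, dest: str, when: float, hops) -> bool:
        """Store the route if it differs from the last one; return True if stored."""
        hops = tuple(hops)
        entries = self.history.setdefault(dest, [])
        if entries and entries[-1][1] == hops:
            return False  # unchanged route: nothing new to keep
        entries.append((when, hops))
        return True
```

Usage: calling `record` hourly with the same hop list returns False after the first call, so the web view only needs to show the (usually short) list of route changes.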
Platform issues
  - portability will be a major issue
  - consider FreeBSD or NetBSD
  - note that relying on special device drivers can make even BSD Unix non-portable unless you also port the device drivers
  - buy a sequential run of machines
  - GNU autoconf is good
Participation
  We need each participant to consider:
  - software resources
  - software development resources
  - analysis expertise
  - security expertise
  - GPS expertise
  - measurement methodology expertise
  - web/CGI/Java expertise
  David and I will be following up on this with each of you.
