OLSR-NG: Difference between revisions

From FunkFeuer Wiki

Revision as of 23 August 2007, 13:49



NEWS

We have released our first set of improvements to the olsrd SPF calculation module.


SPF implementation

On every iteration of the SPF calculation, the least-cost path has to be extracted and put on the result list. For that purpose olsrd-current keeps a linear list, which takes O(N) to traverse. Since every node needs to be visited, which is again O(N), the total behavior is O(N^2). This can eat a lot of CPU when N is large (for example when there are hundreds of olsrd nodes in a network).
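The quadratic behavior described above can be illustrated with a minimal Python sketch. This is not the olsrd code (which is written in C); the function name `spf_linear` and the toy topology are assumptions made for illustration only.

```python
import math

def spf_linear(graph, source):
    """Dijkstra with an unsorted candidate list; graph maps node -> {neighbor: cost}."""
    cost = {node: math.inf for node in graph}
    cost[source] = 0.0
    candidates = list(graph)              # linear list of unfinished nodes
    while candidates:
        # O(N) scan for the least-cost candidate ...
        best = min(candidates, key=lambda node: cost[node])
        candidates.remove(best)           # ... done once per node, so O(N^2) overall
        for neighbor, link_cost in graph[best].items():
            if cost[best] + link_cost < cost[neighbor]:
                cost[neighbor] = cost[best] + link_cost
    return cost

# Hypothetical three-node topology with symmetric link costs
topology = {
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 1.5},
    "c": {"a": 4.0, "b": 1.5},
}
print(spf_linear(topology, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 2.5}
```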

speed by efficient sorting

Modern SPF implementations use data structures that are efficient at keeping the preliminary path costs sorted, such as min-heaps or AVL trees. Since olsrd already had a nice and efficient AVL tree implementation, the two SPF-related data structures (the candidate tree and the path tree) are implemented as AVL trees with the path ETX metric as the key. Determining the minimal path cost in an AVL tree costs O(log(N)), which results in a total asymptotic complexity of O(N * log(N)) and scales much better in large networks.
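The effect of keeping candidates ordered by path cost can be sketched as follows. Note the substitution: olsrd uses its own AVL tree in C, whereas this sketch uses Python's standard `heapq` binary min-heap as a stand-in, since both give O(log(N)) extraction of the cheapest candidate. The function name `spf_heap` and the ring topology are assumptions.

```python
import heapq
import math

def spf_heap(graph, source):
    """Dijkstra with candidates kept in a min-heap keyed by preliminary path cost."""
    cost = {node: math.inf for node in graph}
    cost[source] = 0.0
    candidates = [(0.0, source)]
    done = set()
    while candidates:
        path_cost, node = heapq.heappop(candidates)   # O(log N) extraction
        if node in done:
            continue          # stale entry superseded by a cheaper path
        done.add(node)
        for neighbor, link_cost in graph[node].items():
            new_cost = path_cost + link_cost
            if new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                heapq.heappush(candidates, (new_cost, neighbor))
    return cost

# Hypothetical ring of four nodes with one expensive link (a-d)
ring = {
    "a": {"b": 1.0, "d": 5.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0, "a": 5.0},
}
print(spf_heap(ring, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 2.0, 'd': 3.0}
```

Instead of updating a key in place, the sketch pushes a new entry and skips stale ones on extraction; a tree keyed by cost would remove and re-insert the node instead.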

Results

In the funkfeuer.at network topology of 190 nodes, the cost of the raw SPF execution was reduced by 45%. Note that the raw SPF execution represents only about 20% of the CPU cost in a running olsrd. At funkfeuer.at we have observed an overall decrease in the CPU load of about 12% on the embedded routers.
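As a rough cross-check of these numbers, and assuming the 45% reduction applies uniformly to the roughly 20% of CPU cost spent in raw SPF execution, the saving that the SPF change alone can explain works out as below. The variable names are illustrative only.

```python
spf_share = 0.20       # share of olsrd CPU cost spent in raw SPF (from the text)
spf_reduction = 0.45   # reduction of the raw SPF cost (from the text)

# Saving attributable to the SPF change alone, as a fraction of total olsrd CPU cost
expected_saving = spf_share * spf_reduction
print(f"{expected_saving:.0%}")  # 9%
```

Under that assumption the SPF change accounts for about 9 percentage points; the figures given here do not break down the remainder of the observed decrease of about 12%.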

Outlook

A gain of 10-20% (depending on network size) in the route-handling module is admittedly not exciting. While refactoring the SPF implementation, the olsrd-ng development team has spotted further bottlenecks in the existing implementation. We are tackling these one by one and need the active participation of the wireless communities to test our improvements and to verify that we have not introduced any regressions. So stay tuned and report bugs to the olsr-dev mailing list.


please check out the patch

Made possible by a grant from IPA. Thanks, we really appreciate your help and your courage to support us!

main links

Main OLSR-NG project blog: http://olsr.funkfeuer.at

Slides from the OLSR-NG kickoff presentation: http://outpost.funkfeuer.at/~aaron/olsr-ng.pdf

We communicate on the olsr-dev mailing list: https://www.olsr.org/mailman/listinfo/olsr-dev . All commit messages can be seen on the olsr-cvs list.

Goals

  1. clean up the code of OLSR (http://www.olsr.org),
  2. improve the algorithms of OLSR and make it more scalable,
  3. furthermore, produce a new RFC for a (potential) new mesh routing protocol based on the experience gained from coding OLSR (at the moment the most promising candidate for this RFC is B.A.T.M.A.N).


OLSR-NG is an open source project, meaning everybody is invited to join in and help. We do have some bounties for the best solutions. If you want to participate, drop us an email: mailto:aaron@lo-res.org and mailto:bernd@firmix.at


One of the main goals is to make OLSR more scalable in practice.

Figure: Complexity for n=1000 nodes of different data structures in the Dijkstra shortest path (SPF) algorithm.

In this picture you can see the different complexity graphs for the SPF under the assumption that every node has 10 edges. As you can see, the red line has O(n^2) complexity. This corresponds to the current implementation of OLSR from www.olsr.org. OLSR-NG plans to reduce the complexity to the green or even the yellow level. This will allow the mesh network clouds to become larger by a factor of ~1000 (on the routing layer / layer 3).
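To give a feeling for the orders of magnitude involved, the following sketch evaluates three textbook Dijkstra bounds for n=1000 nodes with 10 edges each. Which of these curves the figure draws in which color is not stated here, so the mapping is left open; counting m = 10 * n edges and using log base 2 are assumptions.

```python
import math

n = 1000        # nodes, as in the figure
m = 10 * n      # assumption: 10 edges per node, each counted once per node

linear_list = n ** 2                           # O(n^2), unsorted candidate list
tree_or_binary_heap = (n + m) * math.log2(n)   # O((n + m) log n)
fibonacci_heap = m + n * math.log2(n)          # O(m + n log n), amortized

for name, steps in [
    ("linear list", linear_list),
    ("balanced tree / binary heap", tree_or_binary_heap),
    ("Fibonacci heap", fibonacci_heap),
]:
    print(f"{name:28s} {steps:>12,.0f}")
```

With these assumptions the three values come out at 1,000,000, roughly 110,000 and roughly 20,000 abstract steps. Constant factors are ignored, so these are not run-time predictions.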

Current Status

  • olsrd 0.5 was released! Thanks a lot, everybody!
  • UML test server is being worked on. This will allow the B.A.T.M.A.N team to test their protocol and us to test our scalability ideas with thousands of olsrd instances.
  • Ongoing code cleanups
  • AVL tree optimizations

UML test server

current load and statistics: http://texas.funkfeuer.at

Figure: our UML server

Figure: topo map of 1500 UML instances running in parallel. Note the packet loss! (Check out the TopologyPics archive also.)

We have already been running 2000 instances and there was still plenty of RAM left, so 1000 is a very safe bet. According to the UML documentation we can probably scale up much higher, because UML only takes the RAM that each instance actually needs. UML has other shortcomings, though: high CPU overhead and lots of context switches. We are trying to increase the performance at the moment...


current open todos UML server

Next important (*) things to do:

  • DONE(aka) update texas's BIOS - FIXED
  • add the packet loss tc rules (zethix already prepared it)
  • create random networks (easy)
  • create network topologies based on a power law distribution (a bit harder, but realistic for the internet)
  • DONE(zethix) create scripts to find out which olsrd instances crashed
  • create scripts to find out if a UML instance is not responsive anymore
  • find better measurement tools. Look into sar
  • DONE(aka) recompile host kernel and get rid of the "BUG: soft lockup detected on CPU#0!" messages
  • DONE(aka) recompile host kernel and enable the preemption patch
  • DONE(zethix,aka) make hostfs so that developers can easily upload a new olsrd version to all uml instances. They should see the difference easily. Look into hostfs
  • DONE(ake) increase performance of the UML simulator itself (decrease HZ, look into SKAS3 patch again, 32 bit recompile, talk with jeff etc)
  • find more meaningful topology visualization tools (http://www.caida.org)
  • add b.a.t.m.a.n to the root filesystem. (?)
  • compare the scheduling / scalability of the test with OpenVZ and olsr_switch
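The two topology items in the list above (random networks and power-law networks) could be prototyped along the following lines. This is only a sketch: the function names are hypothetical, the random variant is the classic "each link exists with probability p" model, and the power-law variant uses preferential attachment (in the style of the Barabási-Albert model), which is one common way to obtain a power-law degree distribution and not necessarily what the test server will use.

```python
import random

def random_network(n, p, rng):
    """Each of the n*(n-1)/2 possible links exists independently with probability p."""
    return {(a, b) for a in range(n) for b in range(a + 1, n) if rng.random() < p}

def power_law_network(n, links_per_node, rng):
    """Preferential attachment: new nodes prefer nodes that already have many links."""
    edges = set()
    endpoints = []                     # every node appears once per link it has
    core = links_per_node + 1          # start from a small fully connected core
    for a in range(core):
        for b in range(a + 1, core):
            edges.add((a, b))
            endpoints += [a, b]
    for new in range(core, n):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))   # chance proportional to degree
        for target in sorted(targets):
            edges.add((target, new))
            endpoints += [target, new]
    return edges

rng = random.Random(2007)              # fixed seed so that runs are repeatable
print(len(random_network(50, 0.1, rng)), "links in the random network")
print(len(power_law_network(50, 2, rng)), "links in the power-law network")
```

Each edge is a pair of node indices; turning these into UML instance addresses and uml_switch wiring is left out.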

User HOWTO

 NOTE! You are root on the system. Effectively we need lots of sudo privs. So... use it wisely.
  1. log in
  2. make clean
  3. edit common.sh and adapt the parameters to your needs
 #!/bin/sh
 #
 # VARS
 #
 MAX_INSTANCES=1500
 ROOT_FS=root_fs
 NICELVL="-n 5"
 u=$USER
 #SINGLE=1

We supply you with a good working root filesystem (root_fs), so there is no need to change that. The SINGLE parameter just says that you want to start a single instance and be logged in to it (needed for debugging purposes).

  4. the UML instance can read files and programs from
 $HOME/public_uml/share

This is where you can put your programs or your version of olsrd (and its libs) or the B.A.T.M.A.N. binaries.

 N.B. This directory is shared between all UML instances that you will 
 start in your simulation, so, they all have read-only access to it. 
 It will appear inside each UML as /mnt/share/. There is also another, 
 per-instance, read-write directory that you can use to save data for 
 later analysis (e.g. redirect olsrd stdout to a file and print some 
 debugging info there). This second directory will be under 
 $HOME/public_uml/exp/<UML IP> (where UML IP is the ip address of each 
 UML instance). It will also appear as /mnt/exp inside UML's environment.
  5. put your special rcS file into $HOME/public_uml/share/etc/init.d/ . This rcS file will be called from the UML instance's /etc/init.d/rcS startup script. Starting olsrd etc. must be done from this user-supplied rcS. If there is no user-supplied rcS, the standard olsrd with the standard settings of the root_fs (/etc/olsrd.conf) is started.
  6. make

This will start the simulation.

 N.B. When the simulation is started, an olsrd instance is started on 
 the host as well. You can use it if you need to interact with the 
 olsrd network - for instance, topology maps are generated through this
 instance (see below). 
  7. Issuing commands inside UML manually: the 'make' command creates a screen session for every UML process it creates and redirects its input and output there. You can use screen to attach to a particular session. Use
 screen -ls              (as root)

to list all available sessions, and

 screen -S blabla.10.0.x.y -d -RR

to attach to a session. This will give you shell access to the system.

 N.B. All modifications to the root filesystem will be preserved only 
 for the duration of the simulation! Once it is stopped, changes will 
 be lost!
  8. observe the success on http://texas.funkfeuer.at or create a new topo map via ( cd /var/www/topo; ./doit.sh ). If you see a complete graph, then your version has little packet loss!
  9. stop it via
 make clean 

or

 make stop

Please make sure (by looking at http://texas.funkfeuer.at) that you are the only person running a simulation at the moment!
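After a run, the per-instance directories under $HOME/public_uml/exp/<UML IP> described in the HOWTO can be scanned for instances that produced no output. The sketch below assumes that your rcS redirects olsrd's stdout to a file named olsrd.log inside /mnt/exp; that file name, like the function name, is hypothetical. A missing or empty log is only a first hint, not proof, that an instance did not start or crashed early.

```python
from pathlib import Path

def silent_instances(exp_root, log_name="olsrd.log"):
    """Return the names of per-instance directories whose log is missing or empty."""
    silent = []
    for instance_dir in sorted(Path(exp_root).iterdir()):
        if not instance_dir.is_dir():
            continue
        log = instance_dir / log_name          # hypothetical file written by your rcS
        if not log.is_file() or log.stat().st_size == 0:
            silent.append(instance_dir.name)
    return silent

if __name__ == "__main__":
    exp_root = Path.home() / "public_uml" / "exp"
    if exp_root.is_dir():
        for name in silent_instances(exp_root):
            print(name)
```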

Some things to note

  • the topology visualisation scripts run with nice level +5
  • the UML instances run with nice level +10 (see run.sh). Never ever give them a higher priority than nice level 0 (that is, never use a negative nice value), because then you will disturb the system monitoring (munin) tools and we will not be able to see what the simulation is doing.

Open questions/bug reports?

Who wants to contribute?

{|
! Who is willing to work on something
! Contact info
|-
| Aaron Kaplan
| mailto:aaron@lo-res.org
|-
| Roman Steiner
| mailto:roman.steiner@gmx.at
|-
| Bernd Petrovitsch
| mailto:bernd@firmix.at
|-
| Andrej Rursev (zethix)
| mailto:zethix@gmail.com
|-
| Hannes Gredler
| mailto:hannes@gredler.at
|}

Who is working on what?

{|
! Who
! What
! Status
|-
| Bernd Petrovitsch, Thomas Lopatic, Hannes Gredler
| release 0.5
| DONE
|-
| ???
| release 0.5: make packages for freifunk FW, DD-WRT, etc., Windows (XP, Vista), ... and test them
| OPEN
|-
| ???
| analyze IP autoconfig mechanisms and find the best one
| OPEN
|-
| Hannes Gredler
| tcpdump parses olsr packets
| DONE
|-
| Hannes Gredler
| SPF improvements
| DONE
|-
| Hannes Gredler
| reduce malloc thrashing during SPF computation
| WIP
|-
| Hannes Gredler
| improve post-SPF handling (route table conciliation, best path selection)
| DONE
|-
| Bernd Petrovitsch
| rework the logging system
| WIP
|-
| Bernd Petrovitsch
| rework the TC-LQ input parsing, avoiding malloc thrashing
| WIP
|-
| Hannes Gredler
| spurious neighbor loss on nodes with high neighbor count
| OPEN/investigating
|-
| Aaron Kaplan, Bernd Petrovitsch
| olsr-ng test server
| DONE
|-
| Aaron Kaplan
| theory, complexity analysis. Goal: find the best complexity on the algorithmic side.
| DONE
|-
| Zethix, Aaron Kaplan
| UML cluster setup
| WIP. Currently we can start around 2000 UML instances, but the uml_switch software still drops packets between virtual interfaces. http://www.openvz.org also seems like a promising solution.
|}



Contact mailto:aaron@lo-res.org or Bernd if you are interested in participating!

Next Steps

  • TU Wien lecture "Verteilte Systeme" (Distributed Systems), 20.4.2007: we will present our ideas about optimizing complexity. Aaron also wants to address more students from the TU to participate. DONE. Let's see if new participants want to join.
  • finalize the UML test server
  • try out the optimization ideas and document the speedup
  • more cleanups
    • olsrd is doing lots of malloc()s and free()s - use ltrace to see this.
      • review each malloc()/free() pair: is it superfluous, and can it be replaced with buffers on the stack or by just moving pointers around?
      • are there structs that are malloc()ed and free()d very frequently? Perhaps a free list can help to avoid lots of malloc()/free() handling.
    • we have several coding styles in there
    • add wrappers to hide type casts for Windows (and perhaps others). Reserve some prefix (e.g. x is often used for this, as in xmalloc(); olsr_ is IMHO quite long and there are too many olsr_ prefixed types and functions right now).
    • fixup error reporting/logging
    • add synchronization and make the daemon multi-threaded (e.g. the httpinfo plugin could benefit from such a thing)
    • make the parameter parsing of the plugins more consistent (some are case-sensitive, some are not, most do not check syntax errors). Work in progress
    • dependencies do not work - done
    • merge quagga-svn and svan-ola quagga-patch and test it.
    • ....
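The free-list idea from the malloc()/free() items above can be illustrated as follows. Python manages memory itself, so this only shows the bookkeeping: released objects are parked on a list and handed out again instead of being created anew. In C the same pattern would typically chain freed structs through a pointer. The class names `FreeList` and `Entry` are hypothetical and do not correspond to olsrd types.

```python
class FreeList:
    """Hands out recycled objects when available and creates new ones only when needed."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []          # objects that were released and can be reused
        self.allocations = 0     # how often a fresh object had to be created

    def get(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._factory()

    def put(self, obj):
        self._free.append(obj)


class Entry:
    """Hypothetical record that is created and dropped very frequently."""

    def __init__(self):
        self.addr = None
        self.etx = 0.0


pool = FreeList(Entry)
for _ in range(1000):
    entry = pool.get()                         # reuses the parked object after round one
    entry.addr, entry.etx = "10.0.0.1", 1.5    # caller must reinitialize every field
    pool.put(entry)
print(pool.allocations)  # 1
```

The trade-off is that recycled objects keep their old contents, so every field has to be reinitialized by the caller.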

Bounties

please take a look at the slides and get in contact with us directly at the moment!

Source code

  • CVS repos:
 (as user "ipo23")
    export CVS_RSH=ssh
    cvs -z3 -d:ext:ipo23@olsrd.cvs.sourceforge.net:/cvsroot/olsrd co -P olsrd-current
 (as anonymous user)
    cvs -d:pserver:anonymous@olsrd.cvs.sourceforge.net:/cvsroot/olsrd login
    cvs -z3 -d:pserver:anonymous@olsrd.cvs.sourceforge.net:/cvsroot/olsrd co -P olsrd-current

Theory section

data structures

  • Heap ... We need good heaps/priority queues for A*-Search / Dijkstra
  • especially the Fibonacci heap has, to my knowledge, the very best asymptotic complexity: O(1) for almost every operation.

Currently, as of 0.51pre, we use an AVL tree, which has complexity O(log(n)).

The following complexities<ref> Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest (1990): Introduction to algorithms. MIT Press / McGraw-Hill. </ref> are worst-case for binary and binomial heaps and amortized for the Fibonacci heap. O(f) gives an asymptotic upper bound and Θ(f) an asymptotically tight bound (see Wikipedia:Big O notation). Function names assume a min-heap.

{|
! Operation
! Binary
! Binomial
! Fibonacci
|-
| createHeap
| Θ(1)
| Θ(1)
| Θ(1)
|-
| findMin
| Θ(1)
| O(lg n) or Θ(1)
| Θ(1)
|-
| deleteMin
| Θ(lg n)
| Θ(lg n)
| O(lg n)
|-
| insert
| Θ(lg n)
| O(lg n)
| Θ(1)
|-
| decreaseKey
| Θ(lg n)
| Θ(lg n)
| Θ(1)
|-
| merge
| Θ(n)
| O(lg n)
| Θ(1)
|}
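To make the "Binary" column concrete, here is a minimal sketch of a binary min-heap that supports the operations named in the table, including decreaseKey. It is an illustration in Python, not code from olsrd; the class and method names are assumptions. A position map is kept so that decrease_key can find an item without searching.

```python
class MinHeap:
    """Binary min-heap with decrease_key; items must be hashable and unique."""

    def __init__(self):                 # createHeap
        self._heap = []                 # list of [key, item]
        self._pos = {}                  # item -> index in _heap

    def __len__(self):
        return len(self._heap)

    def _swap(self, i, j):
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._pos[self._heap[i][1]] = i
        self._pos[self._heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[parent][0] <= self._heap[i][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        size = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < size and self._heap[child][0] < self._heap[smallest][0]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def insert(self, key, item):        # one sift-up: at most lg n swaps
        self._heap.append([key, item])
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def find_min(self):                 # the root is always the minimum
        return tuple(self._heap[0])

    def delete_min(self):               # one sift-down: at most lg n swaps
        self._swap(0, len(self._heap) - 1)
        key, item = self._heap.pop()
        del self._pos[item]
        if self._heap:
            self._sift_down(0)
        return key, item

    def decrease_key(self, item, new_key):
        i = self._pos[item]
        if new_key > self._heap[i][0]:
            raise ValueError("new key must not be larger than the current key")
        self._heap[i][0] = new_key
        self._sift_up(i)


heap = MinHeap()
heap.insert(4.0, "c")
heap.insert(1.0, "b")
heap.insert(7.0, "d")
heap.decrease_key("d", 0.5)             # "d" found a cheaper path
print(heap.delete_min())                # (0.5, 'd')
print(heap.delete_min())                # (1.0, 'b')
```

In an SPF run, decrease_key corresponds to a candidate node whose preliminary path cost improves.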

other interesting data structures not directly related

See also

Notes

<references/>

Links

Papers, Theory

  • AdHocSys: a two-year European project to provide reliable broadband services in rural and mountain regions. This objective will be achieved by creating a wireless ad hoc broadband network with special enhancements to reliability and availability. The network consists of one or several gateways connecting to the global Internet and several intermediate nodes which provide multihop connections between the gateways and end users.
  • WOSPF-OR Uni Oslo Wireless OSPF with Overlapping Relays
  • W-OSPF INRIA/Boeing Wireless OSPF

misc