OLSR-NG
NEWS
We have released our first set of improvements to the olsrd SPF calculation module.
SPF implementation
On every iteration of the SPF calculation, the least-cost path needs to be extracted and put on the result list. For that purpose olsrd-current keeps a linear list, which has O(N) asymptotic complexity to traverse. Every node needs to be visited, which again has O(N) asymptotic complexity. This results in a total behavior of O(N^2), which can eat a lot of CPU when N is large (for example when there are hundreds of olsrd nodes in a network).
speed by efficient sorting
Modern SPF implementations use data structures which are efficient at sorting the preliminary path costs, like min-heaps or AVL trees. Since olsrd already had a nice and efficient AVL tree implementation, the two SPF-related data structures (the candidate tree and the path tree) are implemented using AVL trees with the path ETX metric as the key. Determining the minimal path cost in an AVL tree comes at a cost of O(log(N)), which results in a total asymptotic complexity of O(N * log(N)). This scales much more nicely in large networks.
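The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the olsrd code: it uses Python's `heapq` (a binary min-heap) in place of olsrd's AVL tree, and the toy graph, the link costs and the function names `spf_linear` and `spf_heap` are made up for the example.

```python
import heapq

# Toy topology (hypothetical): node -> list of (neighbour, link cost).
graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 1.5), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 1.5), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}

def spf_linear(graph, source):
    """Dijkstra with a linear scan for the cheapest candidate: O(N^2)."""
    dist = {source: 0.0}
    done = set()
    while len(done) < len(dist):
        # Scan every known, unfinished node to find the cheapest one.
        node = min((n for n in dist if n not in done), key=dist.get)
        done.add(node)
        for neigh, link in graph[node]:
            new = dist[node] + link
            if new < dist.get(neigh, float("inf")):
                dist[neigh] = new
    return dist

def spf_heap(graph, source):
    """Dijkstra with a min-heap as candidate set: O(log N) per extraction."""
    dist = {source: 0.0}
    done = set()
    candidates = [(0.0, source)]
    while candidates:
        cost, node = heapq.heappop(candidates)
        if node in done:
            continue  # stale entry left behind by a later, cheaper push
        done.add(node)
        for neigh, link in graph[node]:
            new = cost + link
            if new < dist.get(neigh, float("inf")):
                dist[neigh] = new
                heapq.heappush(candidates, (new, neigh))
    return dist

linear_result = spf_linear(graph, "a")
heap_result = spf_heap(graph, "a")
```

Both functions return the same path costs; only the way the cheapest candidate is found differs. Instead of a decrease-key operation, the heap variant pushes a new entry and skips stale ones when they are popped.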
Results
In the funkfeuer.at network topology of 190 nodes the raw SPF execution was reduced by 45%. Note that the raw SPF execution represents only about 20% of the CPU cost in a running olsrd. At funkfeuer.at we have observed an overall decrease in the CPU load of about 12% on the embedded routers.
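As a rough cross-check of these figures, one can multiply the SPF share of the CPU cost by the SPF reduction. The variable names are made up for the example, and it assumes that nothing outside the SPF code changed.

```python
spf_share = 0.20       # SPF share of the olsrd CPU cost (from the text above)
spf_reduction = 0.45   # reduction of the raw SPF execution (from the text above)

# Overall saving that these two figures alone would predict.
expected_overall = spf_share * spf_reduction
print(f"expected overall saving: {expected_overall:.0%}")
```

The two figures alone predict roughly 9%, which is in the same range as the observed 12%. The text does not explain the remaining difference.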
Outlook
10-20% (depending on network size) in the route-handling module is admittedly not exciting. While refactoring the SPF implementation, the olsrd-ng development team has spotted further bottlenecks in the existing implementation. We are tackling these one by one, and need active participation from the wireless communities to test our improvements and verify that we have not introduced any regressions. So stay tuned and report bugs to the olsrd-dev mailing list.
please check out the patch
sponsor
Made possible by a grant from IPA. Thanks, we really appreciate your help and your courage to support us!
main links
Main OLSR-NG project blog: http://olsr.funkfeuer.at
Slides from the OLSR-NG kickoff presentation: http://outpost.funkfeuer.at/~aaron/olsr-ng.pdf
We communicate on the olsr-dev mailing list: https://www.olsr.org/mailman/listinfo/olsr-dev . All commit messages can be seen on the olsr-cvs list.
Goals
- Clean up the code of OLSR (http://www.olsr.org),
- improve the algorithms of OLSR and make it more scalable.
- Furthermore, produce a new RFC for a (potential) new mesh routing protocol which is based on the experiences of OLSR coding (at the moment the most promising candidate for this RFC is B.A.T.M.A.N)
OLSR-NG is an open source project, meaning everybody is invited to join in and help.
We do have some bounties for the best solutions. If you want to participate, drop us an email: mailto:aaron@lo-res.org and mailto:bernd@firmix.at
One of the main goals is to make OLSR more scalable in practice.
Figure: Complexity for n=1000 nodes of different data structures in the Dijkstra shortest path (SPF) algorithm.
In this picture you can see the different complexity graphs for the SPF under the assumption that every node has 10 edges. As you can see, the red line has O(n^2) complexity. This corresponds to the current implementation of OLSR from www.olsr.org. OLSR-NG plans to reduce the complexity to the green or even the yellow level. This will allow the mesh network clouds to become larger by a factor ~ 1000 (on the routing layer / layer 3).
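To get a feeling for the orders of magnitude, the textbook bounds for Dijkstra can be evaluated for n=1000 and 10 edges per node. This is only a back-of-the-envelope sketch: constants are ignored, the variable names are made up, and the text does not say which data structure the green and yellow curves stand for.

```python
import math

n = 1000          # nodes, as in the figure
degree = 10       # edges per node, as assumed in the text
m = n * degree    # total number of edges

linear_list = n * n                      # O(n^2): linear list of candidates
balanced_tree = (n + m) * math.log2(n)   # O((n + m) log n): AVL tree or binary heap
fib_heap = m + n * math.log2(n)          # O(m + n log n): Fibonacci heap

for name, steps in (("linear list", linear_list),
                    ("balanced tree", balanced_tree),
                    ("Fibonacci heap", fib_heap)):
    print(f"{name:15s} ~ {steps:12.0f} steps")
```

These are asymptotic bounds rather than measured run times, so in practice a structure with a better bound can still be slower for a given network size.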
Current Status
- olsrd 0.5 was released! Thx everybody a lot!
- UML test server is being worked on. This will allow the B.A.T.M.A.N team to test their protocol and us to test our scalability ideas with thousands of olsrd instances.
- Ongoing code cleanups
- AVL tree optimizations
UML test server
current load and statistics: http://texas.funkfeuer.at
Topo map of 1500 UML instances running in parallel. Note the packet loss! (Check out the TopologyPics archive also.)
We have already been running 2000 instances and there was still plenty of RAM left, so 1000 is a very safe bet. According to the UML documentation we can probably scale up much higher, because UML only takes the RAM that each instance actually needs. UML does have other shortcomings, though: high CPU overhead and lots of context switches. We are trying to increase the performance at the moment...
current open todos UML server
Next important (*) things to do:
- DONE(aka) update texas's BIOS - FIXED
- add the packet loss tc rules (zethix already prepared it)
- create random networks (easy)
- create network topologies based on a power law distribution (a bit harder, but realistic for the internet)
- DONE(zethix) create scripts to find out which olsrd instances crashed
- create scripts to find out if a UML instance is not responsive anymore
- find better measurement tools. Look into sar
- DONE(aka) recompile host kernel and get rid of the "BUG: soft lockup detected on CPU#0!" messages
- DONE(aka) recompile host kernel and enable the preemption patch
- DONE(zethix,aka) make hostfs so that developers can easily upload a new olsrd version to all UML instances. They should see the difference easily. Look into hostfs
- DONE(aka) increase performance of the UML simulator itself (decrease HZ, look into SKAS3 patch again, 32 bit recompile, talk with jeff etc)
- find more meaningful topology visualization tools (http://www.caida.org)
- add b.a.t.m.a.n to the root filesystem. (?)
- compare the scheduling / scalability of the test with OpenVZ and olsr_switch
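For the power law topologies mentioned in the list above, one common approach is preferential attachment: each new node links to existing nodes with a probability proportional to their current number of links. The following is only a sketch of that idea, not one of the existing server scripts. The function name `power_law_topology` and its parameters are made up for the example.

```python
import random

def power_law_topology(num_nodes, links_per_node=2, seed=None):
    """Return a sorted list of (a, b) links with a < b.

    Assumes num_nodes > links_per_node. Nodes are numbered from 0.
    """
    rng = random.Random(seed)
    # Start from a small fully connected core.
    core = links_per_node + 1
    edges = [(a, b) for a in range(core) for b in range(a + 1, core)]
    # Every node is listed once per link end, so a uniform pick from
    # this list prefers nodes that already have many links.
    endpoints = [node for edge in edges for node in edge]
    for new in range(core, num_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((target, new))
            endpoints += [target, new]
    return sorted(edges)

topology = power_law_topology(50, links_per_node=2, seed=1)
```

The resulting link list could then be turned into virtual links between UML instances. How that mapping is done depends on the simulation scripts and is not covered here.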
User HOWTO
NOTE! You are root on the system. Effectively we need lots of sudo privs. So... use it wisely.
- log in
- make clean
- edit common.sh and adapt the parameters to your needs
#!/bin/sh
#
# VARS
#
MAX_INSTANCES=1500
ROOT_FS=root_fs
NICELVL="-n 5"
u=$USER
#SINGLE=1
We supply you with a good working root filesystem (root_fs) so no need to change that. The SINGLE parameter just says that you want to start a single instance and be logged in (needed for debugging purposes)
- the UML instance can read files and programs from
$HOME/public_uml/share
This is where you can put your programs or your version of olsrd (and its libs) or the B.A.T.M.A.N. binaries.
N.B. This directory is shared between all UML instances that you will start in your simulation, so, they all have read-only access to it. It will appear inside each UML as /mnt/share/. There is also another, per-instance, read-write directory that you can use to save data for later analysis (e.g. redirect olsrd stdout to a file and print some debugging info there). This second directory will be under $HOME/public_uml/exp/<UML IP> (where UML IP is the ip address of each UML instance). It will also appear as /mnt/exp inside UML's environment.
- put your special rcS file into $HOME/public_uml/share/etc/init.d/ . This rcS file will be called from the UML instance's /etc/init.d/rcS startup script. Starting olsrd etc. must be done from this user-supplied rcS. If there is no user-supplied rcS, the standard olsrd with the standard settings of the root_fs (/etc/olsrd.conf) is started.
- make
This will start the simulation.
N.B. When the simulation is started, an olsrd instance is started on the host as well. You can use it if you need to interact with the olsrd network - for instance, topology maps are generated through this instance (see below).
- Issuing commands inside UML manually - the 'make' command creates a screen session for every UML process it creates, and redirects its input and output there. You can use screen to attach to a particular session. Use
screen -ls (as root)
to list all available sessions, and
screen -S blabla.10.0.x.y -d -RR
to attach to a session. This will give you shell access to the system.
N.B. All modifications to the root filesystem will be preserved only for the duration of the simulation! Once it is stopped, changes will be lost!
- observe the success on http://texas.funkfeuer.at or create a new topo map via ( cd /var/www/topo; ./doit.sh ). If you see a complete graph, then your version has little packetloss!
- stop it via
make clean
or
make stop
Please make sure (by looking at http://texas.funkfeuer.at) that you are the only person running a simulation at the moment!
Some things to note
- the topology visualisation scripts run with nice level +5
- the UML instances run with nice level +10 (see run.sh). Never ever give them a higher priority than nice level 0 (that is, a negative nice value), because then you will disturb the system monitoring (munin) tools and we will not be able to see what the simulation is doing.
Open questions/bug reports?
Who wants to contribute?
Who is willing to work on something | Contact info |
---|---|
Aaron Kaplan | mailto:aaron@lo-res.org |
Roman Steiner | mailto:roman.steiner@gmx.at |
Bernd Petrovitsch | mailto:bernd@firmix.at |
Andrej Rursev (zethix) | mailto:zethix@gmail.com |
Hannes Gredler | mailto:hannes@gredler.at |
Who is working on what?
Who | What | Status | Comments |
---|---|---|---|
Bernd Petrovitsch, Thomas Lopatic, Hannes Gredler | release 0.5 | DONE | |
??? | release 0.5 make packages for freifunk FW, DD-WRT, etc, windows (XP, Vista), ... and test them | OPEN | freifunk FW is done by Sven-Ola Tücke, .rpm and .deb by various people on olsr-dev@lists.olsr.org, Windows: ??? |
??? | analyze IP autoconfig mechanisms and find the best one | OPEN | |
Hannes Gredler | tcpdump parses olsr packets, | DONE | |
Hannes Gredler | SPF improvements | DONE | |
Hannes Gredler | reduce malloc thrashing during SPF computation | WIP | |
Hannes Gredler | improve post-SPF handling (route table conciliation, best path selection) | DONE | |
Bernd Petrovitsch | rework the logging/tracing/error reporting | WIP | |
Bernd Petrovitsch | rework the LQ-TC and LQ-HELLO input parsing, avoiding malloc thrashing | DONE | The output side could also avoid malloc() and free(). Alas, the code is more complicated there. |
Hannes Gredler | spurious neighbor loss on nodes with high neighbor count | OPEN/investigating | |
Aaron Kaplan,Bernd Petrovitsch | olsr-ng test server | DONE | Well, the thing doesn't boot ATM. God knows why .... |
Aaron Kaplan | theory, complexity analysis. Goal: find the best complexity on the algorithmic side. | DONE | |
Zethix, Aaron Kaplan | UML cluster setup | WIP | Currently we can start around 2000 UML instances, but the uml_switch software still drops packets between virtual interfaces. http://www.openvz.org also seems like a promising solution. |
Bernd Petrovitsch | Various cleanup mini-projects | DONE/WIP | reworked floating point ops in src/mantissa.[ch] to minimize run-time impact, fixed dependencies |
contact mailto:aaron@lo-res.org or Bernd if you are interested in participating!
Next Steps
- TU Wien lecture "Verteilte Systeme" (distributed systems), 20.4.2007: we will present our ideas about optimizing complexity. Aaron also wants to address more students from the TU to participate. DONE. Let's see if new participants want to join.
- finalize the UML test server
- try out the optimization ideas and document the speedup
- more cleanups
- olsrd is doing lots of malloc()s and free()s - use ltrace to see this.
- review the malloc()/free() calls to see if they are superfluous and can be replaced with buffers on the stack or by just moving pointers around.
- are there structs that are malloc()ed and free()d very frequently? Perhaps a free list can help to avoid lots of malloc()/free() handling.
- we have several coding styles in there
- add wrappers to hide type casts for Windows (and perhaps others). Reserve some prefix (e.g. x is often used for this, as in xmalloc(); olsr_ is IMHO quite long and there are too many olsr_ prefixed types and functions right now).
- fixup error reporting/tracing/logging
- add synchronization and make the daemon multi-threaded (e.g. the bmf plugin uses threads, and the httpinfo plugin could benefit from such a thing)
- make the parameter parsing of the plugins more consistent (some are case-sensitive, some are not, most do not check syntax errors). Work in progress
- ....
Bounties
please take a look at the slides and get in contact with us directly at the moment!
Source code
- CVS repos:
(as user "ipo23")
export CVS_RSH=ssh
cvs -z3 -d:ext:ipo23@olsrd.cvs.sourceforge.net:/cvsroot/olsrd co -P olsrd-current
(as anonymous user)
cvs -d:pserver:anonymous@olsrd.cvs.sourceforge.net:/cvsroot/olsrd login
cvs -z3 -d:pserver:anonymous@olsrd.cvs.sourceforge.net:/cvsroot/olsrd co -P olsrd-current
Theory section
data structures
- Heap ... We need good heaps/priority queues for A*-Search / Dijkstra
- in particular, the Fibonacci heap has, to my knowledge, the best asymptotic complexity: amortized O(1) for almost every operation (deleteMin is the exception at O(log n)).
However, practice shows that... Currently, as of 0.51pre, we use an AVL tree, which has complexity O(log(n)). Hannes tested the fibheap package from gcc and found that in our networks (~ 200 nodes) the AVL tree implementation still clearly beats the Fibonacci heap: in the SPF stats below, the AVL figures are roughly 60% of the Fibonacci heap figures (about 143 versus about 238).
fibonacci heap:
--- SPF-stats for 203 nodes, 335 routes (total/init/run/route/kern/cleanup):,, 237,
--- SPF-stats for 203 nodes, 337 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 337 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 339 routes (total/init/run/route/kern/cleanup):,, 239,
--- SPF-stats for 203 nodes, 339 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 341 routes (total/init/run/route/kern/cleanup):,, 240,
--- SPF-stats for 203 nodes, 341 routes (total/init/run/route/kern/cleanup):,, 236,
--- SPF-stats for 203 nodes, 341 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 341 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 345 routes (total/init/run/route/kern/cleanup):,, 239,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 238,
--- SPF-stats for 203 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 238,
AVL heap:
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 143,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 142,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 142,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 144,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 145,
--- SPF-stats for 203 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 145,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 142,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 142,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 144,
--- SPF-stats for 203 nodes, 346 routes (total/init/run/route/kern/cleanup):,, 145,
--- SPF-stats for 203 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 145,
--- SPF-stats for 202 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 145,
--- SPF-stats for 202 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 142,
--- SPF-stats for 202 nodes, 347 routes (total/init/run/route/kern/cleanup):,, 146,
The following complexities<ref> Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest (1990): Introduction to algorithms. MIT Press / McGraw-Hill. </ref> are worst-case for binary and binomial heaps and amortized complexity for Fibonacci heap. O(f) gives asymptotic upper bound and Θ(f) is asymptotically tight bound (see Wikipedia:Big O notation). Function names assume a min-heap.
Operation | Binary | Binomial | Fibonacci |
---|---|---|---|
createHeap | Θ(1) | Θ(1) | Θ(1) |
findMin | Θ(1) | O(lg n) or Θ(1) | Θ(1) |
deleteMin | Θ(lg n) | Θ(lg n) | O(lg n) |
insert | Θ(lg n) | O(lg n) | Θ(1) |
decreaseKey | Θ(lg n) | Θ(lg n) | Θ(1) |
merge | Θ(n) | O(lg n) | Θ(1) |
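The "Binary" column can be tried out with Python's standard `heapq` module, which implements a binary min-heap on top of a plain list. The sketch below only maps the table's operation names onto `heapq` calls, and the cost values are made up. `heapq` has no public decreaseKey operation, so that row is left out.

```python
import heapq

heap = []                                # createHeap
for cost in (7.5, 2.0, 9.0, 4.5):
    heapq.heappush(heap, cost)           # insert

smallest = heap[0]                       # findMin: the root is the first list element
removed = heapq.heappop(heap)            # deleteMin

# merge: a binary heap has no cheap merge, so the combined list is
# rebuilt into a heap in linear time, matching the Θ(n) table entry.
other = [3.0, 1.0]
merged = heap + other
heapq.heapify(merged)
```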
- Wikipedia:Data_Structures general overview. Good entry point for trees: Wikipedia:Binary_tree
- NIST Directory of Data Structures has a very extensive overview
- succinct datastructures (trees)
- succinct datastructures overview
- Tries
- sparse matrices
- look at kazlib from IS-IS ??
See also
Notes
<references/>
Links
Papers, Theory
- RFC-3626: the "OLSR RFC"
- Workshop at Hipercom Oct 2006
- OLSR-v2 Draft 01 at hipercom
- http://www.adhocsys.org/
AdHocSys is a two-year European project to provide reliable broadband services in rural and mountain regions. This objective will be achieved by means of the creation of a wireless ad hoc broadband network, with special enhancements to reliability and availability. The network consists of one or several gateways connecting to the global Internet and several intermediate nodes which provide multihop connections between the gateways and end users.
misc
- Homepage: http://www.olsr.org/
- NATO C3 Agency (NC3A) Radio Protocols Lab https://elayne.nc3a.nato.int/
- commercial INRIA HIPERCOM spin-off http://www.luceor.com/
- commercial MIT Roofnet spin-off http://www.meraki.net/