not a beautiful or unique snowflake (nothings) wrote,
not a beautiful or unique snowflake

Technical geekiness for huskyscotsman about our ICFP entry.

The problem: drive a robot aroud a board dropping packages. Some squares on the board are impassable; some are water, which if you drive into, you die. There are several bases; packages start there. When you stand on a square on the board, you're told what packages are there. If a bot runs into another bot, it pushes the second one (including into water, but not into a wall) and it drops one package at random. You bid from a finite money supply to decide who goes first; a large positive bid makes you go first, and a "large" negative money bid makes you go last. Each turn, you find out how other bots moved, and what packages they picked up or dropped. When you drop a package on its destination, you score points and it disappears. In the server response, it is indistinguishable from an arbitrary drop.

Very early on, I suggested the strategy 'pick up a bunch of packages, drop them four squares away from the base, repeat,', with the idea being that other robots will assume they were delivered when they see them dropped, and we can thus go back and get them unmolested. Not only did we never implement this, but we didn't even have time to implement a defense against it.

Our robot was written in Ocaml and used just under 2000 lines of code. It consisted of three major systems, which ran in series each turn.

1. Determine the "optimal" move with the 'strategist'
2. Override local navigation with the 'pattern matcher'
3. Compute the amount to bid with the 'bidder'

Each of these systems worked totally independently; the strategist picks a goal, computes one or more moves to achieve that goal, and then reports just the first move as "what to do now". The pattern matcher looks at that one move, irrespective of what it is supposed to accomplish, and attempts to adjust it for local navigation and certain strategies. The bidder takes the suggested move, looks at the presence of other robots in the vicinity, and attempts to outbid those robots.

The strategist evaluates six or seven possible things to try to do, in a particular order. The strategist is stateless; it makes plans from scratch every turn, rather than trying to remember a plan from turn to turn. This is mostly because we didn't have the time to have it remember plans, but also because this makes it resilient to changes, like being pushed.

The first thing the strategist checks is if it is in position to dunk another robot. If it is, and if it isn't the case that if it misses because the other bot moves first, the other bot will be in position to push IT into water--if that isn't the case, then it goes for the kill.

The second thing the strategiest checks is if it has any packages that are at their destination; if so, it drops them.

The third thing the strategist checks is if there are any packages here it can pick up. If there are, it clumps the packages into groups by their destination, and then finds the group with the best ratio of value to distance-from-here. (We do a BFS across the entire map from the robot's location to allow distance reasoning and pathfinding. A* or something more clever for pathfinding is going to be slower in mazes, since it's more expensive inner loop, and won't allow distance reasoning.) If that pickup will still leave space free, it evaluates other packages in the same way, but now reasoning about distances from the distance to the last package destination considered.

The fourth thing the strategist checks is whether in the last five moves the bot has moved more than a square. (Back and forth in two squares is NOT moving more than a square.) If so, there is a 25% chance it doesn't move. This is to avoid "mexican standoffs", where bot A pushes bot B east, so B drops a package. bot A picks up the package, but bot B pushes A west, resulting in A picking up the package (or not, depending on bids) and then dropping it when pushed, and B ending up on top of it; the cycle repeats endlessly. It also avoids a problem with meeting in a two-way corridor and both bots try to sidestep to get around the other, and they repeat that endlessly.

The fifth thing the strategist checks is if it has any packages, then pick the nearest delivery location, compute a route to it, and take the first step on that route.

The sixth thing the strategist checks, since it must have no packages, is the nearest place-where-it-knows-there-are-packages or the nearest unexplored base. Then it takes a step to it.

Otherwise it sits in place.

The pattern matcher attempts to navigate around obstacles, set up kills, avoid putting itself in a place to get killed, and other stuff. It does this by using a collection of hand-built map templates that represent situations involving other bots and their predicted movements, relative to our robot trying to make a specific move. There's a template which represents our robot trying to go head-on into another robot, when there is also a row of empty spaces next to it--in which case we go around. There's a template which means 'this robot is waiting for us to move into a space between it and water; stand still and wait for it to move away'. There's a template which means "this robot is coming towards us down a 1-wide corridor that we're near the mouth of, so back out of the corridor to make way for it". (submitted patterns)

The bidder says: are we in position to be dunked, and we're moving? then bid 5% of our money. is our move an attempt to dunk someone else? then bid 1% of our money. is our bid potentially going to conflict with other bots? then bid X where X is our estimated amount that they're bidding, based on past situations where they seem to have outbid us (bids are secret).

There were infinite things we thought of but didn't have time to do. (This is mainly because we were using ocaml despite the fact that nobody on our team knew it well--dfan had a week of experience with it, I had pair-programmed for 4-8 hours with, but not in the driver's seat, and inky had done a little ML in a college class.) The above is a very utilitarian strategy, since there's no long-term planning. We don't go long distances to hunt someone down and dunk them; we only dunk opportunistically. (The pattern matcher encourages us to stop in opportune locations or even to move one square for them.) We never dunk predictively: guess that a robot is going to go to square X, and move to square X with a large negative bid. We didn't think to put in code to handle undeliverable packages; the existing code is moderately tolerant of them, but we could end up with an armful of undeliverables and sit in place the rest of the game. We compute a prediction for which ways each bot might go, but we don't use that in the bidding system for guessing whether we'll be in conflict. We assume dropped packages whose destinations we haven't learned (you only learn them by standing over them) were delivered. I thought up an easy to implement scheme where, while going from point A to point B, I can basically for free evaluate any point on the map and say "can we hit this point while going shortest path from A to B" (becuase there are a lot of such shortest paths, since diagonal moves aren't allowed); we could have used this for investigating unexplored bases or going over supposed delivery locations while going from A to B; but I never implemented it. I ddn't start writing the pattern matcher until nearly midnight, and didn't finish it until like 4am when the contest was at noon, so the patterns weren't as tuned as I'd have liked. Bots clog horribly in 1-wide corridors, and one of the test maps on the public server was semi-maze-like with one-wide corridors everywhere; I had some ideas for implementing smart 1-wide corridor logic, computing 'zones' which are continuous 1-wide regions, and then detecting there are bots in them and seeing whether they're emerging or going in, but I didn't have time for it (I implemented a very short-lookahead version of this in the pattern matcher). We don't keep any state from turn to turn, so we recompute all pathfinds every turn, which wastes computation. They say they'll disqualify bots that use more than 1 CPU second per turn on average, and they allowed for boards up to 1000x1000 squares. We have no idea how we'll perform on such boards, but clearly saving plans and paths from turn to turn would have improved it. We solve the integer packing and travelling salesmen problems in greedy fashion--pick the package group with the best score/distance ratio, then the next such group, repeat. But then when we pathfind, we just go to the nearest, not the best score/distance. The pattern matching stuff all only handles one bot in the vicinity. If there are two we just plow ahead with our plans. If we attempt to pickup or drop, we never check that we're in danger of being dunked, so we don't avoid being dunked. This can also occur if we select 'pickup nothing' as our 'don't do anything this turn' move, e.g. because our pathfind fails or because we decide we're stuck and do nothing due to the 25% chance of that.

Other team's strategies are here.
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.