Today I have done the first useful processing of the "transaction trace" I produced from the big MtGox show-of-bitcoins in December 2011. After the "show" the bitcoins were reportedly routed away out of sight in interesting ways (see my last-but-one post). I wanted to do some processing to confirm this.
So, I took my big trace file (around 70 MB of CSV, with the tracing program still running when I stopped it), then ran it through a Python script to select only transactions above 100 coins (just to get the data volume under control). That yielded a Gephi-compatible ASCII edge file with around 30k edges. I put that into Gephi and tried some layout options. Force Atlas 2 was the only layout that produced interesting results in less than a couple of hours. But HOW interesting!!!
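For anyone wanting to try the same kind of filtering, here is a minimal sketch of the thresholding step. It is not the actual script I ran. It assumes (hypothetically) that each trace row is just source address, target address, amount in coins, with no header line, and it writes the Source/Target/Weight edge table that Gephi's spreadsheet import accepts. The function names are made up for illustration.

```python
import csv
import io


def filter_edges(rows, min_coins):
    """Yield (source, target, amount) for rows whose amount exceeds min_coins.

    Assumes each row is (source, target, amount); this column layout
    is an assumption, not the real trace format.
    """
    for source, target, amount in rows:
        value = float(amount)
        if value > min_coins:
            yield source, target, value


def write_gephi_edges(edges, out):
    """Write edges as a CSV edge table with a Source,Target,Weight header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for edge in edges:
        writer.writerow(edge)


# Tiny made-up sample standing in for the real trace file.
sample = "addrA,addrB,5000\naddrB,addrC,50\naddrC,addrD,150.5\n"
result = io.StringIO()
write_gephi_edges(filter_edges(csv.reader(io.StringIO(sample)), 100), result)
print(result.getvalue())
```

With a real file you would pass open file handles instead of the StringIO objects. Because the rows are streamed one at a time, the whole trace never has to fit in memory.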
One thing the exercise showed was that I still have too much data. It pretty much overwhelms Gephi - which is the most powerful tool I know of for this kind of stuff. Reducing the data without losing valuable info will need some thought. But here are a couple of Gephi pics to illustrate things I found.
First off, the above image shows Gephi Force Atlas 2 beginning to unravel the structure of a reduced set of data - this one is limited to transactions over 4000 coins. It looks like a big can of worms (and maybe indeed that's what it is). Remember you're watching over a couple of million Euros worth of bitcoin moving there. The ribbons appear to be chains of transactions with one input and one output. It looks like coins were transferred through a really large number of addresses in a linear fashion without being dispersed much.
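One way to check the "ribbon" impression numerically, rather than by eye, would be to count the nodes that have exactly one incoming and one outgoing edge. This is only a sketch: it assumes the edge file has been read into (source, target) pairs, and chain_nodes is a hypothetical helper name, not something from Gephi.

```python
from collections import Counter


def chain_nodes(edges):
    """Return the set of nodes with exactly one incoming and one outgoing edge.

    edges is an iterable of (source, target) pairs. Such nodes are the
    links of a linear chain: coins arrive from one place and leave to one.
    """
    indeg = Counter()
    outdeg = Counter()
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    # Looking up a missing key in a Counter gives 0, so nodes that
    # never send coins onward are excluded automatically.
    return {node for node in indeg if indeg[node] == 1 and outdeg[node] == 1}


# Made-up example: a -> b -> c, then c splits to d and e.
example = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]
links = chain_nodes(example)
print(links)
```

If most nodes in the filtered graph turn out to be links like this, that would back up the idea that the ribbons are long one-in, one-out chains. It would also suggest that collapsing each chain into a single edge might be one way to shrink the data for Gephi.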
Now, above is the really interesting one. This is one detached part of the full graph view. It shows the period immediately after the coins leave the main MtGox address. This time the transaction threshold is set at 1000 coins to show finer detail. The trace starts at the small toasting-fork feature, off the right-hand branch (angled at about 4 o'clock). Just as described by the previous author, the coins get bucket-chained along for quite a while, then they are split repeatedly into smaller and smaller amounts, giving the tree structure. This matches the previous investigation. They seem to get split down below my transaction size threshold and disappear from the trace, only to show up later in the huge ribbon structure. There is also a big recirculating loop structure (upper left) with bigger arrows indicating multiple transactions. Is that a tumbler?
There's a lot more to be investigated here, clearly, but I'm at the limits of the techniques I know right now and I need some fresh ideas to deal with these massive data sets. Still, I'm pleased with these initial results.

