I'd have to load this stuff into an actual database to meaningfully explore it any further (for one, I'm really curious to see what's in there for the parts of St Petersburg I know well). Will try tomorrow.

Oh for fuck's sake. Why are there places that

Also, what's this thing with throwing exceptions, especially a RuntimeException, when a value is not present? Just return a null ffs. It's not like it's a catastrophic event that warrants an exception that you can't even meaningfully catch.

I feel like the more data there is in the table, the longer it takes to insert new rows into it.

And nope, that 10.5 GB estimate wasn't right, it's more like 20-something GB because of the compression. I misled myself by opening one of the files in a hex editor, seeing meaningful strings, and assuming that it must be uncompressed.
First observations:
- Object and category IDs are 12 bytes long, which is an odd size (and of course I'm not going to store them as the hex strings they come as)
- The categories are fine-grained enough that "restaurant" is further subdivided into cuisine-specific ones like "Italian restaurant" and "Asian restaurant"
- Regions and cities are stored as names (strings) and don't have IDs
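
For the ID point above, here is a minimal sketch of what packing those hex strings into 12 raw bytes could look like. It assumes the IDs arrive as 24-character hex strings (two hex digits per byte); the class name `PlaceIds`, both helper methods, and the sample ID are made up for illustration and are not from the dataset or any library.

```java
public class PlaceIds {
    // Hypothetical helper: converts a 24-character hex ID into its 12 raw bytes.
    static byte[] toBytes(String hexId) {
        if (hexId.length() != 24) {
            throw new IllegalArgumentException("expected 24 hex characters, got " + hexId.length());
        }
        byte[] out = new byte[12];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hexId.charAt(2 * i), 16);
            int lo = Character.digit(hexId.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + hexId);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Hypothetical helper: converts the raw bytes back into a lowercase hex string.
    static String toHex(byte[] id) {
        StringBuilder sb = new StringBuilder(id.length * 2);
        for (byte b : id) {
            int v = b & 0xff; // treat the byte as unsigned
            sb.append(Character.forDigit(v >>> 4, 16));
            sb.append(Character.forDigit(v & 0x0f, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hexId = "4b5a3c2df964a520e4b628e3"; // made-up sample ID, 24 hex characters
        byte[] raw = toBytes(hexId);
        System.out.println(raw.length + " bytes, round-trips to " + toHex(raw));
    }
}
```

Stored this way, each ID takes half the space of the hex text, and a fixed-length binary column would be enough to hold it.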