12 comments
Gregory

Finally. Had to add one more dependency

Gregory

First observations:
- Object and category IDs are 12 bytes long, which is odd (of course I'm not going to store them as these hex strings)
- The categories are fine-grained enough that "restaurant" is further subdivided into cuisine-specific ones like "Italian restaurant" and "Asian restaurant"
- Regions and cities are stored as names (strings) and don't have IDs
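
A rough sketch of the ID point above, assuming the IDs arrive as 24-character hex strings (12 bytes, two hex digits each). The function names and the sample ID are made up for illustration and are not taken from the dataset:

```python
def id_to_bytes(hex_id: str) -> bytes:
    """Pack a hex ID string into raw bytes, checking the expected length."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return raw


def id_to_hex(raw: bytes) -> str:
    """Turn the packed form back into the hex string for display."""
    return raw.hex()


# Made-up ID, not a real one from the dataset.
sample = "4b5c1d2e3f4a5b6c7d8e9f00"
packed = id_to_bytes(sample)
print(len(sample), "chars as text,", len(packed), "bytes packed")
```

Packed like this, the ID fits a fixed-width binary column at half the size of the text form.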

Gregory

I'd have to load this stuff into an actual database to meaningfully explore it any further (for one, I'm really curious to see what's in there in my familiar parts of St Petersburg). Will try tomorrow.

Gregory

Now I'll just have to wait while it does its thing.

Gregory

Oh for fuck's sake. Why are there places that
- don't have coordinates
- don't have an address
- don't have a country
- don't have a date when they were created?!
What's next, places that don't have a name?

Gregory

Also what's this thing with throwing exceptions, especially a RuntimeException, when a value is not present? Just return a null ffs. It's not like it's a catastrophic event that warrants an exception that you can't even meaningfully catch.
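
A minimal sketch of the workaround, written in Python as a stand-in since the library in question isn't named here. `get_required` is a hypothetical getter that mimics the throw-on-missing behaviour, and `value_or_none` is the wrapper that turns that into a plain missing value:

```python
from typing import Any, Callable, Optional


def get_required(record: dict, field: str) -> Any:
    """Hypothetical getter that raises when the value is absent."""
    if record.get(field) is None:
        raise RuntimeError(f"value for {field!r} is not present")
    return record[field]


def value_or_none(getter: Callable[[], Any]) -> Optional[Any]:
    """Call the getter and map its 'not present' error to None."""
    try:
        return getter()
    except RuntimeError:
        return None


# Made-up record with a missing coordinate.
place = {"name": "Cafe", "latitude": None}
name = value_or_none(lambda: get_required(place, "name"))
latitude = value_or_none(lambda: get_required(place, "latitude"))
print(name, latitude)
```

The catch is that a wrapper this broad also swallows unrelated runtime errors, which is part of why an exception is a poor fit for "no value here".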

Gregory

I feel like the more data there is in the table, the longer it takes to insert new rows into it. And nope, that 10.5 GB estimate wasn't right, it's more like 20-something because of the compression. I misled myself by opening one of the files in a hex editor, seeing meaningful strings, and assuming that it must be uncompressed.

Gregory

It's done now! The table is 26.67 GB, there are 106 620 768 rows.

dmitriid

@grishka

Fallacies programmers believe about places? :)

foxy

@grishka

First observations:
- moderation at 4sq didn't help
- you'll have to pay for the interesting data
- there are at least 1600 regions in Russia

Gregory

@foxy I'm dying
was that "... WHERE `name` LIKE '%хуй%'"?))
