Developer.
167 stories
·
57 followers

Time Slips Away, and Leaves You With Nothing, Mister

1 Comment

Microsoft, commemorating the 20th anniversary of the release of Windows 95:

On Aug. 24, 1995, Windows 95 arrived. And if you were around then, you may remember the song that accompanied the commercial introducing it: “Start Me Up” by the Rolling Stones.

To celebrate the 20th anniversary of this release, download the classic song for free until 11:59 p.m. PST from the Windows Store.

I humbly suggest [a more apt song to mark the occasion](https://www.youtube.com/watch?v=6vQpW9XRiyM).

Read the whole story
rafeco
5 days ago
reply
I thought the link would be to "If I Could Turn Back Time."

The thousands of bombs exploded on Earth

2 Comments

From Orbital Mechanics, a visualization of the 2153 nuclear weapons exploded on Earth since 1945.

2153! I had no idea there had been that much testing. According to Wikipedia, the number is 2119 tests, with most of those coming from the US (1032) and the USSR (727). The largest device ever detonated was Tsar Bomba, a 50-megaton hydrogen bomb set off in the atmosphere above an island in the Barents Sea in 1961. Tsar Bomba had more than three times the yield of the largest bomb tested by the US. The result was spectacular.

The fireball reached nearly as high as the altitude of the release plane and was visible at almost 1,000 kilometres (620 mi) away from where it ascended. The subsequent mushroom cloud was about 64 kilometres (40 mi) high (over seven times the height of Mount Everest), which meant that the cloud was above the stratosphere and well inside the mesosphere when it peaked. The cap of the mushroom cloud had a peak width of 95 kilometres (59 mi) and its base was 40 kilometres (25 mi) wide.

All buildings in the village of Severny (both wooden and brick), located 55 kilometres (34 mi) from ground zero within the Sukhoy Nos test range, were destroyed. In districts hundreds of kilometers from ground zero wooden houses were destroyed, stone ones lost their roofs, windows and doors; and radio communications were interrupted for almost one hour. One participant in the test saw a bright flash through dark goggles and felt the effects of a thermal pulse even at a distance of 270 kilometres (170 mi). The heat from the explosion could have caused third-degree burns 100 km (62 mi) away from ground zero. A shock wave was observed in the air at Dikson settlement 700 kilometres (430 mi) away; windowpanes were partially broken to distances of 900 kilometres (560 mi). Atmospheric focusing caused blast damage at even greater distances, breaking windows in Norway and Finland. The seismic shock created by the detonation was measurable even on its third passage around the Earth.

The Soviets did not give a fuck, man...what are a few thousand destroyed homes compared to scaring the shit out of the capitalist Amerikanskis with a comically large explosion? Speaking of bonkers Communist dictatorships, the last nuclear test conducted on Earth was in 2013, by North Korea.

Tags: atomic bomb   Cold War   infoviz   video   war
Read the whole story
rafeco
11 days ago
reply
Anyone betting on the long-term survival of the human race may want to reassess.
1 public comment
kraymer
12 days ago
reply
Vishnu wouldn't approve, for sure. https://www.youtube.com/watch?v=n8H7Jibx-c0

The increasingly long life span of Macs ↦

3 Comments

by Dan Moren

Over at Macworld last week, my Stay Foolish column tackled the idea of Macs lasting longer and longer these days:

This longevity goes hand in hand with the decline in specs that I wrote about last month. We’ll continue to use our devices as long as they accomplish what we want them to, not simply when specs suggest we “should” upgrade.

So I think of my dad, working away on his 2008 MacBook. Ultimately, if Apple continues at the pace that it’s at today, that computer’s useful life could extend to a decade from its purchase, which is a pretty outstanding return on investment. Granted, he mainly uses his machine for web browsing and email reading, tasks that aren’t exactly taxing even to a seven-year-old machine.

Read the rest at Macworld…

[Read on Six Colors.]

Read the whole story
rafeco
12 days ago
reply
We still have a late 2007 iMac that we use every day.
2 public comments
kazriko
12 days ago
reply
I still have a PC from 2006. It's not really great to use, but it is still usable. It's certainly not my primary. It uses Debian with xmonad.
Colorado Plateau
gradualepiphany
12 days ago
Yeah.. the thing is that the 2008 MP is still going pretty strong. It's got two 4 core 3.06ghz xeons & 32 gigs of ram, so it's even still pretty good at threaded renders & simulations. It's now got a few SSDs, an eSATA interface for the external RAID, and it's on its 4th or 5th graphics card... There's a LOT to be said for buying top-of-the-line workstation hardware stuck in a traditional desktop box. Just about none of that incremental upgrading is possible with the new Pros, which is a bleeding shame. Technically the CPU, RAM, and graphics cards *should* be upgradable, but in practice no one makes 3rd party GPUs with the custom board layout that's necessary, so.... useless.
kazriko
12 days ago
Well, my PC was a slow system when I bought it, Athlon64x2 at 2.6ghz dual-core. I guess our servers are still from 2008, and are still going strong with Citrix Xencenter here, those are pretty similar to your 2008 mac pro in specs. No SSDs though, but they do have 10k SAS drives. My current desktop system at work is a FX-8350 with 32gigs and SSD + SSHD, which chews through the compile workloads I give it.
superiphi
10 days ago
In truth, apart from memory amounts, you can probably go back to 2002-ish and still have computers you can use perfectly well to do day to day work. And many of these can be stretched enough in the memory department
trekkie
10 days ago
iMac late 2012 still going strong. I upgraded to it from a Mac Pro I got in 2006, really only because I couldn't move to 10.8. I like tons of storage and that was the last Mac that could do a ton internally.
gradualepiphany
12 days ago
reply
Yeah, and unfortunately my ancient 2008 Mac Pro is still in many ways superior to those stupid new trashcan ones.
Los Angeles, California, USA

Have a Theory

1 Share
There is a phrase I find myself employing pretty frequently at work, when discussing new features or products. While I am not a product manager, I am responsible for making sure that we implement features well, and thinking strategically about what we are spending our precious time implementing. So, when I am asked about my thoughts on a new product or feature, I usually have one and only one question:

"What is your theory?"

In this day and age we sometimes get lazy about thoughtfulness, and rely on data and experimentation to hill-climb our way through the world around us. Or at least we say that we rely on data and experimentation to drive our features. But the reality is that we're working in such complex multivariate environments that we cannot possibly test all permutations of even the simplest change. We do make choices about what features we build, and these choices are not entirely data-driven.

So, given that our choices cannot be entirely perfectly data-driven, how then do we decide what to build? The only way that we can make sane choices in a complex world is by actually being thoughtful about the choices we are making, creating a theory, and creating experiments that actually test that theory.

For example, in my current world of e-commerce, we often are faced with the mandate to implement a new feature that will make the customer feel better about the product in some nebulous way (it's cooler! it's more high fashion! millennials will love it! whatever). This feature, while it might not cause customers to immediately buy more up front, should cause them to be more loyal over time. Sometimes, this is the right instinct. But beware: if you're going to try to get second or third order effects from a feature, you'd better have a really solid theory of the chain of events that leads to those second or third order impacts. And you need to figure out what you can measure to validate the chain of events. Don't just look at the number of people buying the product and hope it goes up. What does making the look and feel "cooler" DO for your customer? Do they visit more often? Spend more time? Tell more friends? Have a theory!

Failing to have a theory, and a solid experimentation plan for proving that theory, leaves you open to all kinds of irrational outcomes. The worst of these is the "you just didn't implement it well enough" outcome. The original idea was good, but you implemented it poorly, and that's why it failed. And that could very well be true! But it's impossible to prove or disprove without anticipating the question ahead of time, thinking through the logical conclusions of the theory, and setting up a good test to understand its outcome.

So the next time you are building a feature, ask yourself: Do we have a theory? What is it? Are we measuring the immediate expected effects of the theory, or are we just measuring the same stuff we always measure and hoping that it changes?
Read the whole story
rafeco
30 days ago
reply

Search Architecture

1 Comment

Instagram is in the fortunate position to be a small company within the infrastructure of a much larger one. When it makes sense, we leverage resources to leapfrog into experiences that have taken Facebook ten years to build. Facebook’s search infrastructure, Unicorn, is a social-graph-aware search engine that has scaled to indexes containing trillions of documents. In early 2015, Instagram migrated all search infrastructure from Elasticsearch into Unicorn. In the same period, we saw a 65% increase in search traffic as a result of both user growth and a 12% jump in the number of people who are using search every time they use Instagram.

These gains have come in part from leveraging Unicorn’s ability to rank queries using social features and second-order connections. By indexing every part of the Instagram graph, we powered the ability to search for anything you want - people, places, hashtags, media - faster and more easily as part of the new Search and Explore experience in our 7.0 update. 

What Is Search?

Instagram’s search infrastructure consists of a denormalized store of all entities of interest: hashtags, locations, users and media. In typical search literature these are called documents. Documents are grouped together into sets which can be queried using extremely efficient set operations such as AND, OR and NOT. The results of these operations are efficiently ranked and trimmed to only the most relevant documents for a given query.  When an Instagram user enters a search query, our backend encodes it into set operations and then computes a ranked set of the best results. 
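As a toy illustration of this model, here is a minimal Python sketch (all terms, document ids and follower counts are invented): a reverse index maps terms to sets of document ids, queries compose with set intersection, union and difference, and the surviving candidates are ranked and trimmed.

```python
# Toy reverse index: term -> set of document ids (invented data).
index = {
    "name:justin":  {1, 2, 3},
    "verified:1":   {2, 4},
    "hashtag:#tbt": {3, 4, 5},
}

def postings(*terms):
    return [index.get(t, set()) for t in terms]

def AND(*terms):
    first, *rest = postings(*terms)
    return first.intersection(*rest)

def OR(*terms):
    return set().union(*postings(*terms))

def NOT(universe, term):
    return universe - index.get(term, set())

# "Verified users named justin" is an AND of two posting sets...
candidates = AND("name:justin", "verified:1")

# ...then ranked and trimmed; follower counts stand in for a real score.
followers = {1: 10, 2: 32_300_000, 3: 500, 4: 7, 5: 90}
ranked = sorted(candidates, key=followers.get, reverse=True)
```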

Getting Data In 

Instagram serves millions of requests per second. Many of these, such as signups, likes, and uploads, modify existing records and append new rows to our master PostgreSQL databases. To maintain the correct set of searchable documents, our search infrastructure needs to be notified of these changes. Furthermore, search typically needs more information than a single row in PostgreSQL — for example, the author’s account vintage is used as a search feature after a photo is uploaded.

To solve the problem of denormalization, we introduced a system called Slipstream where events on Instagram are encoded into a large Thrift structure containing more information than typical consumers would use. These events are binary-serialized and sent over an asynchronous pub/sub channel we call the Firehose. Consumers, such as search, subscribe to the Firehose, filter out irrelevant events and react to remaining events. The Firehose is implemented on top of Facebook's Scribe which makes the messaging process asynchronous. The figure below shows the  architecture:

[Figure: Slipstream/Firehose architecture]

Since Thrift is schematized, we re-use objects across requests and have consumers consume messages without the need for custom deserializers. A subset of our Slipstream schema, corresponding to a photo like, is shown below:

struct User {
  1: required i64 id;
  2: string username;
  3: string fullname;
  4: bool is_private;
  ...
}

struct Media {
  1: required i64 id;
  2: required i64 owner_id;
  3: required MediaContentType content_type;
  ...
}

struct LikeEvent {
  1: required i64 liker_id;
  2: required i64 media_id;
  3: required i64 media_owner_id;
  4: Media media;
  5: User liker;
  6: User media_owner;
  ...
  8: bool is_following_media_owner;
}

union InstagramEvent {
  ...
  2: LikeEvent like;
  ...
}

struct FirehoseEvent {
  1: required i64 server_time_millis;
  2: required InstagramEvent event;
}
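On the consuming side, a subscriber filters the stream down to the event types it indexes. A hedged sketch, with plain dicts standing in for the deserialized Thrift structs (the real Firehose is binary Thrift over Scribe, and the action tuples here are hypothetical):

```python
def consume(firehose_events):
    """Keep only the events search cares about and emit indexing actions."""
    actions = []
    for fe in firehose_events:            # each fe mirrors FirehoseEvent
        event = fe["event"]               # the InstagramEvent union
        if "like" in event:               # react to LikeEvents...
            actions.append(("index_like", event["like"]["media_id"]))
        # ...and silently drop event types this consumer doesn't index
    return actions

stream = [
    {"server_time_millis": 1, "event": {"like": {"liker_id": 7, "media_id": 42}}},
    {"server_time_millis": 2, "event": {"comment": {"media_id": 99}}},
]
```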

Firehose messages are treated as best-effort and a small percentage of data loss is expected in messaging. We establish eventual consistency in search by a process of reconciliation or a base build. Each night, we scrape a snapshot of all Instagram PostgreSQL databases to Hive for data archiving. Periodically, we query these Hive tables and construct all appropriate documents for each search vertical. The base build is merged against data derived from Slipstream to allow our systems to be eventually consistent even in the event of data loss.
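The merge itself can be pictured as last-write-wins keyed on timestamps; the post doesn't spell out the exact reconciliation policy, so this Python sketch is only one plausible reading:

```python
def reconcile(base_build, slipstream):
    """Merge two {doc_id: (timestamp, doc)} maps, newer version winning."""
    merged = dict(base_build)
    for doc_id, (ts, doc) in slipstream.items():
        if doc_id not in merged or ts > merged[doc_id][0]:
            merged[doc_id] = (ts, doc)
    return merged

# The nightly snapshot restores anything the lossy Firehose dropped, while
# fresher Slipstream updates override stale snapshot rows.
base   = {1: (100, "photo-v1"), 2: (100, "photo-v2")}
deltas = {2: (150, "photo-v2-liked"), 3: (160, "new-photo")}
merged = reconcile(base, deltas)
```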

Getting Data Out

Processing Queries

Assuming that we have ingested our data correctly, our search infrastructure enables an efficient path to extracting relevant documents given a constraint. We call this constraint a query, which is typically a derived form of user-supplied text (e.g. “Justin” with the intent of searching for Justin Bieber). Behind the scenes, queries to Unicorn are rewritten into S-expressions that express clear intent, for example:

(and
  user:maxime
  (apply followed_by: followed_by:me)
)

which translates to “people named maxime followed by people I follow”. Our search infrastructure proceeds in two (intermixed) steps:

  • Candidate generation: finding a set of documents that match a given query. Our backend dives into a structure called a reverse index, which finds sets of document ids indexed by a term. For example, we may find the set of users with the name “justin” in the “name:justin” term.
  • Ranking: choosing the best documents from all the candidates. After getting candidate documents, we look up features which encode metadata about a document. For example, one feature for the user justinbieber would be his number of followers (32.3MM). These features are used to compute a “goodness” score, which is used to order the candidates. The “goodness” score can be either machine learned or hand-tuned — in the machine learning case, we may engineer features that discriminate for clicks or follows to a given candidate.

The result of the two steps is an ordered list of the best documents for a given query.
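The two intermixed steps can be sketched end to end; the index contents, feature values and hand-tuned goodness score below are all invented for illustration:

```python
reverse_index = {"name:justin": {10, 11, 12}}   # term -> candidate doc ids
features = {                                     # doc id -> feature map
    10: {"followers": 250},
    11: {"followers": 32_300_000},               # e.g. a celebrity account
    12: {"followers": 9_000},
}

def candidates(term):
    """Candidate generation: look the term up in the reverse index."""
    return reverse_index.get(term, set())

def goodness(doc_id):
    """Ranking: a hand-tuned stand-in; production scores may be learned."""
    return features[doc_id]["followers"]

def search(term, limit=2):
    """Return the ordered list of the best documents for the query."""
    return sorted(candidates(term), key=goodness, reverse=True)[:limit]
```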

Graph-Aware Searches 

As part of our search improvements, Instagram now takes into account who you follow and who they follow in order to provide a more personalized set of results. This means that it is easier for you to find someone based on the people you follow.

Using Unicorn allowed us to index all the accounts, media, hashtags and places on Instagram and the various relationships between these entities. For example, by indexing a user’s followers, Unicorn can provide answers to questions such as:

“Which accounts does User X follow that are also followed by user Y?”

Equally, by indexing the locations tagged in media Unicorn can provide responses for questions such as:

“Media taken in New York City from accounts I follow”

Improving Account Search 

While utilizing the Instagram graph alone may provide signals that improve the search experience, it may not be sufficient to find the account you are looking for. The search ranking infrastructure of Unicorn had to be adapted to work well on Instagram.

One way we did this was to model existing connections within Instagram. On Facebook, the basic relationship between accounts is non-directional (friending is always reciprocal). On Instagram, people can follow each other without having to follow back. Our team had to adapt the search ranking algorithms used to store and retrieve accounts to Instagram’s follow graph. For Instagram, accounts are retrieved from Unicorn by going through different mixes of:

“people followed by people you follow”

and

“People followed by people who follow you”

In addition, on Instagram, people can follow each other for various reasons. It doesn’t necessarily mean that a user has the same amount of interest in all the accounts they follow. Our team built a model to rank the accounts followed by each user. This allows us to prioritize showing people followed by people that are more important to the searcher.

A Unified Search Box


Sometimes, the best answer for a search query can be a hashtag or a place. In the previous search experience, Instagram users had to explicitly choose between searching for accounts or hashtags. We made it easier to search for hashtags and places by removing the need to select between the different types of results. Instead, we built a ranking framework that predicts which type of result we think the user is looking for. We found in tests that blending hashtags with accounts was so much better an experience that clicks on hashtags went up by more than 20%. Fortunately, this increase didn’t come at the cost of significantly impacting account search.

Our classifiers are both personalized and machine-learned on the logs of searches that users are doing on Instagram. The query logs are aggregated per country to determine if a given search term such as “#tbt” would most likely result in a hashtag search or an account search. Those signals are combined with other signals, such as past searches by a given user and the quality of the results available to show, in order to produce a final blended list of results.

Media Search

Instagram’s search infrastructure is used to power discovery features far removed from user-input search. Our largest search vertical, media, contains the billions of posts on Instagram indexed by the trillions of likes. Unlike our other tiers, media search is purely infrastructure — users never enter any explicit media search queries in the app. Instead, we use it to power features that display media: explore, hashtags, locations and our newly launched editorial clusters.


Candidate Generation 

Lacking an explicit query, we get creative with our media reverse index terms to enable slicing along different axes. The table below shows a list of some term types currently supported in our media index:

[Table: term types supported in the media index]

Within each posting list, our media is ordered (“statically ranked”) reverse-chronologically to encourage a strong recency bias in results. For example, we can serve the Instagram profile page for @thomas with a single query: (term owner:181861901). Extending to hashtags, we can serve recent media from #hyperlapse through (term hashtag:#hyperlapse). Composing Unicorn’s operators enables us to find @thomas’ Hyperlapses by issuing (and hashtag:#hyperlapse owner:181861901).
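Because every posting list shares the same reverse-chronological static rank, an AND like the one above can be computed as a single linear merge whose output is itself newest-first. A sketch over toy (timestamp, media_id) pairs:

```python
def and_merge(a, b):
    """Intersect two posting lists sorted newest-first, preserving order."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] > b[j]:      # a's head is newer; it can't appear later in b
            i += 1
        else:
            j += 1
    return out

hyperlapse = [(9, 42), (7, 40), (5, 33), (2, 17)]   # hashtag:#hyperlapse
by_thomas  = [(9, 42), (6, 38), (2, 17)]            # owner:181861901
```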

Many of these terms exist to encourage diversity in our search results. For example, we may be interested in making sure that some #hyperlapse candidates are posted by verified accounts. Through the use of Unicorn’s WEAK AND operator we can guarantee that at least 30% of candidates come from verified accounts:

(wand
  (term hashtag:#hyperlapse)
  (term verified:1 :optional-weight 0.3)
)

We exploit diversity to serve better content in the “top” sections of hashtags and locations.
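One way to read that guarantee is as a slot quota applied while trimming candidates. This is an illustrative re-implementation, not Unicorn's actual weak-AND semantics:

```python
def diversify(candidates, is_verified, limit, verified_frac=0.3):
    """Fill up to `limit` slots, reserving a fraction for verified docs."""
    quota = max(1, int(limit * verified_frac))
    verified = [c for c in candidates if is_verified(c)]
    others   = [c for c in candidates if not is_verified(c)]
    picked = verified[:quota]
    picked += others[: limit - len(picked)]
    if len(picked) < limit:                # too few unverified: backfill
        picked += verified[quota:][: limit - len(picked)]
    return picked
```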

Features 

Although posting lists are ordered reverse-chronologically, we often want to surface the top media for a given query (hashtag, location, etc.). After candidate generation, we go through a process of ranking, which chooses the best media by assigning a score to each document. The scoring function consumes a list of features and outputs a score representing the “goodness” of a given document for our query.

Features in our index can be divided broadly into three categories:

  • Visual: features that look at the visual content of the image itself. Concretely, we run each of Instagram’s photos through a deep neural net (DNN) image classifier in an attempt to categorize the content of the photo. Afterwards, we perform face detection to determine the number and size of each of the faces in the photo.
  • Post metadata: features that look at non-visual content of a given post. Many Instagram posts contain captions, location tags, hashtags and/or mentions which aid in determining search relevancy. For example, the FEATURE_IG_MEDIA_IS_LOCATION_TAGGED is an indicator feature determining whether a post contains a location tag.
  • Author: features that look at the person who made a given post. Some of the richest information about a post is determined by the person that made it. For example, FEATURE_IG_MEDIA_AUTHOR_VERIFIED is an indicator feature determining whether the author of a post is verified.

Depending on the use case, we tune feature weights differently. On the “top” section of location pages we may wish to differentiate between photos of a location and photos in a location, and to down-rank photos containing large faces. Instagram uses a per-query-type ranking model that allows for modeling choices appropriate to a particular app view.
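A sketch of what per-query-type weighting might look like: the same feature vector scored under different weight sets depending on which surface issued the query. FEATURE_IG_MEDIA_IS_LOCATION_TAGGED and FEATURE_IG_MEDIA_AUTHOR_VERIFIED come from the post; FEATURE_IG_MEDIA_LARGE_FACE and every weight value are invented:

```python
WEIGHTS = {
    # Location "top" sections: reward location tags, down-rank big faces.
    "location_top": {"FEATURE_IG_MEDIA_IS_LOCATION_TAGGED": 2.0,
                     "FEATURE_IG_MEDIA_LARGE_FACE": -1.5,
                     "FEATURE_IG_MEDIA_AUTHOR_VERIFIED": 0.5},
    # Hashtag "top" sections: verification matters more, faces are neutral.
    "hashtag_top":  {"FEATURE_IG_MEDIA_IS_LOCATION_TAGGED": 0.2,
                     "FEATURE_IG_MEDIA_LARGE_FACE": 0.0,
                     "FEATURE_IG_MEDIA_AUTHOR_VERIFIED": 1.0},
}

def score(doc_features, query_type):
    """Linear 'goodness' score under the weights for this app view."""
    w = WEIGHTS[query_type]
    return sum(w.get(name, 0.0) * value for name, value in doc_features.items())

selfie_at_landmark = {"FEATURE_IG_MEDIA_IS_LOCATION_TAGGED": 1,
                      "FEATURE_IG_MEDIA_LARGE_FACE": 1,
                      "FEATURE_IG_MEDIA_AUTHOR_VERIFIED": 1}
```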

Case study: Explore 

Our media search infrastructure also extends into discovery, where we serve interesting content that users aren’t explicitly looking for. Instagram’s Explore Posts feature showcases interesting content from people near you in the Instagram graph. Concretely, one source of explore candidates is “photos liked by people whose photos you have liked”. We can encode this into a single Unicorn query with:

(apply liker:(extract owner: liker:<userid>))

This proceeds inwards-outwards by:

  1. liker:<userid>:  posts that you’ve liked
  2. (extract owner:...):  the owner of those posts
  3. (apply liker:...):  media liked by those owners

After this query generates candidates, we are able to leverage our existing ranking infrastructure to determine the top posts for you. Unlike top posts on hashtag and location pages, the scoring function for explore is machine-learned instead of hand tuned.
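The three steps can be traced over toy graph data; the users and media ids below are invented:

```python
# likes: user -> media they've liked; owner: media -> its author.
likes = {"me": {"m1", "m2"}, "alice": {"m3"}, "bob": {"m4", "m2"}}
owner = {"m1": "alice", "m2": "bob", "m3": "carol", "m4": "dave"}

def explore_candidates(userid):
    liked = likes[userid]                               # 1. liker:<userid>
    owners = {owner[m] for m in liked}                  # 2. extract owner:
    liked_by_owners = set().union(                      # 3. apply liker:
        *(likes.get(o, set()) for o in owners)
    )
    return liked_by_owners - liked   # don't resurface already-liked media
```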


Acknowledgements

By Maxime Boucher and Thomas Dimson

This project wouldn’t be possible without the contributions of Tom Jackson, Peter DeVries, Weiyi Liu, Lucas Ou-Yang, Felipe Sodre da Silva and Manoli Liodakis

Read the whole story
rafeco
43 days ago
reply
Really nice post on how search works on large scale sites.

More on Hacking Team

1 Comment and 4 Shares

Read this:

Hacking Team asked its customers to shut down operations, but according to one of the leaked files, as part of Hacking Team's "crisis procedure," it could have killed their operations remotely. The company, in fact, has "a backdoor" into every customer's software, giving it ability to suspend it or shut it down -- something that even customers aren't told about.

To make matters worse, every copy of Hacking Team's Galileo software is watermarked, according to the source, which means Hacking Team, and now everyone with access to this data dump, can find out who operates it and who they're targeting with it.

It's one thing to have dissatisfied customers. It's another to have dissatisfied customers with death squads. I don't think the company is going to survive this.

Read the whole story
acdha
53 days ago
reply
“It's one thing to have dissatisfied customers. It's another to have dissatisfied customers with death squads.”
Washington, DC
rafeco
48 days ago
reply