Protocol Buffer API Discussion

danielpclark · October 7, 2015, 5:31pm

I have seen several Ruby conference talks recommending Protocol Buffers as the best way to design an efficient, scalable API system. Looking into what’s available, I was wondering if people had suggestions, recommendations, or tips?

Looking at protobuf and its history, it looks as though it may carry technical debt from the way it has evolved. Also, its documentation isn’t very upfront about implementation. Beefcake and ruby-protocol-buffers both look good, but I wonder about Beefcake’s support and current activity.

The API framework that looks the best to me is gRPC. It looks to be well documented and supports many programming languages. As I will need the client to work on Android and iOS, this is a bonus. But the git repo is in beta and there are over 400 issues open.

Anyone have experience or anything they’d like to share on Protocol Buffers and implementation?

danielpclark · October 7, 2015, 8:44pm

There is no stable version of gRPC to install, it requires compiling C code, and the Ruby example segfaults. Also, the Google Protocol Buffers git repo has 200ish issues of its own. Not looking good.

If people keep recommending it (Protocol Buffers), then why does there seem to be so little information about it, and so little stability in its gems/libraries?

AstonJ · October 7, 2015, 8:46pm

If anyone’s wondering what Protocol buffers are (like I was!):

## What are protocol buffers?

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
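To make "serializing structured data" a bit more concrete, here is a minimal sketch of how two fields could be encoded by hand following the Protocol Buffers wire format (varints and length-delimited fields). This is not the generated code the quote refers to, and the helper names (`encode_varint`, `encode_int_field`, `encode_string_field`) are made up for illustration; a real project would use a library instead.

```ruby
# Hand-rolled sketch of the Protocol Buffers wire format (illustration only).
# All method names here are hypothetical helpers, not part of any gem.

WIRE_VARINT = 0            # wire type for int32/int64/bool/enum values
WIRE_LENGTH_DELIMITED = 2  # wire type for strings, bytes, nested messages

# Encode a non-negative integer as a base-128 varint,
# least significant 7-bit group first.
def encode_varint(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte          # last group: continuation bit clear
      break
    end
    bytes << (byte | 0x80)   # more groups follow: continuation bit set
  end
  bytes.pack("C*")
end

# A field key is the field number and the wire type packed into one varint.
def encode_tag(field_number, wire_type)
  encode_varint((field_number << 3) | wire_type)
end

def encode_int_field(field_number, value)
  encode_tag(field_number, WIRE_VARINT) + encode_varint(value)
end

def encode_string_field(field_number, text)
  data = text.b # work on raw bytes
  encode_tag(field_number, WIRE_LENGTH_DELIMITED) +
    encode_varint(data.bytesize) + data
end

# Field 1 = 150 (integer), field 2 = "testing" (string).
message = encode_int_field(1, 150) + encode_string_field(2, "testing")
puts message.unpack1("H*") # hex dump of the 12 encoded bytes
```

The field names never appear on the wire, only the field numbers, which is a large part of why the encoding is smaller than XML or JSON.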

danielpclark · October 7, 2015, 9:34pm

Yes. I’ve seen maybe three API conference talks that all recommend Protocol Buffers over the rest ;-).

Just tried the Beefcake example from their git page. It works well. I’ve also been learning quite a bit about writing a custom mime type incoming/response in Rails.

The default mime type seems to be “application/octet-stream”. This can be set up first and then used in the Rails router for both incoming requests and outgoing responses, although the info on how to do this is in bits and pieces all over.
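As a rough illustration of handling a binary mime type on both the way in and the way out, here is a sketch written as a plain Rack-style callable so it runs without Rails. The constant and lambda names (`BINARY_MIME`, `ECHO_APP`) are assumptions for this example. In a Rails app the custom type would normally be registered with `Mime::Type.register` in an initializer, which is not shown here.

```ruby
require "stringio"

# Mime type mentioned above; a dedicated custom type could be swapped in.
BINARY_MIME = "application/octet-stream"

# Rack-style application: takes an env Hash, returns [status, headers, body].
# It accepts only binary payloads and echoes the raw bytes back unchanged.
ECHO_APP = lambda do |env|
  if env["CONTENT_TYPE"] == BINARY_MIME
    payload = env["rack.input"].read # raw request body, no parsing
    headers = {
      "Content-Type" => BINARY_MIME,
      "Content-Length" => payload.bytesize.to_s
    }
    [200, headers, [payload]]
  else
    [415, { "Content-Type" => "text/plain" }, ["Unsupported Media Type"]]
  end
end

# Simulated request carrying three bytes of an encoded message.
env = {
  "REQUEST_METHOD" => "POST",
  "CONTENT_TYPE" => BINARY_MIME,
  "rack.input" => StringIO.new("\x08\x96\x01".b)
}
status, headers, body = ECHO_APP.call(env)
puts status
puts headers["Content-Type"]
```

A real endpoint would decode the payload into a message object instead of echoing it, but the content-type check and the binary response headers would look much the same.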

I’m about to dive into some experimentation to get it done. I’m also going to write Minitest tests for the Protocol Buffer interface.

AstonJ · October 7, 2015, 9:52pm

What are the current alternatives, Dan?

danielpclark · October 7, 2015, 11:08pm

For APIs or Mime Types?

The currently registered mime types in Rails 4.1:

Mime::EXTENSION_LOOKUP.map { |m|
  [m[0], m[1].instance_eval {@symbol}, m[1].instance_eval {@string}]
}.each {|i| p i}

["html", :html, "text/html"]
["xhtml", :html, "text/html"]
["text", :text, "text/plain"]
["txt", :text, "text/plain"]
["js", :js, "text/javascript"]
["css", :css, "text/css"]
["ics", :ics, "text/calendar"]
["csv", :csv, "text/csv"]
["vcf", :vcf, "text/vcard"]
["png", :png, "image/png"]
["jpeg", :jpeg, "image/jpeg"]
["jpg", :jpeg, "image/jpeg"]
["jpe", :jpeg, "image/jpeg"]
["pjpeg", :jpeg, "image/jpeg"]
["gif", :gif, "image/gif"]
["bmp", :bmp, "image/bmp"]
["tiff", :tiff, "image/tiff"]
["tif", :tiff, "image/tiff"]
["mpeg", :mpeg, "video/mpeg"]
["mpg", :mpeg, "video/mpeg"]
["mpe", :mpeg, "video/mpeg"]
["xml", :xml, "application/xml"]
["rss", :rss, "application/rss+xml"]
["atom", :atom, "application/atom+xml"]
["yaml", :yaml, "application/x-yaml"]
["multipart_form", :multipart_form, "multipart/form-data"]
["url_encoded_form", :url_encoded_form, "application/x-www-form-urlencoded"]
["json", :json, "application/json"]
["pdf", :pdf, "application/pdf"]
["zip", :zip, "application/zip"]

Although I do know that in the routes.rb file I can do a defaults: {format: :ujs} for UJS AJAX requests.

danielpclark · October 8, 2015, 8:19am

The API formats I’m aware of include JSON, XML, Thrift, and Protocol Buffers. Other than that, I know one could write their own.

shingara · October 12, 2015, 5:32am

Which conferences, please?

danielpclark · October 12, 2015, 5:13pm

Glad you asked. Now that I’m looking for them, they’re hard to find. I’ve seen two or three and I can’t recall what they were exactly. Here’s one that presents Thrift & Protocol Buffers:

RuPy 13: What comes after REST / Peter Neumark

The main point of these APIs is efficiency. Thrift and Protocol Buffers are ideal for private consumption, although you may choose to publish them for public use. My personal need is for iOS and Android apps and not for public consumption. It needs to be very scalable, so efficiency is key.

danielpclark · October 12, 2015, 5:47pm

I found one! This is very relevant.

Full Stack Fest 2015: Beyond JSON: Improving inter-app communication, by Aaron Quint

shingara · October 14, 2015, 5:54am

Thanks for the link. I already use MessagePack instead of JSON to limit bandwidth and improve speed without losing too much readability.
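For a sense of the bandwidth difference, here is a small sketch that hand-encodes a tiny subset of the MessagePack format and compares the size with JSON. The `msgpack_subset` method is a made-up helper covering only nil, booleans, integers 0..127, short strings, and small maps. A real application would use a MessagePack library rather than this.

```ruby
require "json"

# Hypothetical helper: encodes a small subset of the MessagePack format.
# Supported: nil, true/false, integers 0..127, strings up to 31 bytes,
# and maps with up to 15 entries.
def msgpack_subset(value)
  case value
  when nil   then [0xC0].pack("C")
  when false then [0xC2].pack("C")
  when true  then [0xC3].pack("C")
  when Integer
    raise ArgumentError, "only 0..127 supported in this sketch" unless (0..127).cover?(value)
    [value].pack("C") # positive fixint: the value is its own byte
  when String
    data = value.b
    raise ArgumentError, "string too long for this sketch" if data.bytesize > 31
    [0xA0 | data.bytesize].pack("C") + data # fixstr: length in the low 5 bits
  when Hash
    raise ArgumentError, "map too large for this sketch" if value.size > 15
    header = [0x80 | value.size].pack("C")  # fixmap: entry count in the low 4 bits
    value.reduce(header) do |out, (key, val)|
      out + msgpack_subset(key) + msgpack_subset(val)
    end
  else
    raise ArgumentError, "unsupported type: #{value.class}"
  end
end

doc = { "compact" => true, "schema" => 0 }
packed = msgpack_subset(doc)
json = JSON.generate(doc)

puts "MessagePack: #{packed.bytesize} bytes"
puts "JSON:        #{json.bytesize} bytes"
```

Unlike Protocol Buffers, the keys are still sent with every message, so the format stays self-describing at the cost of some size.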

danielpclark · December 1, 2015, 10:12pm

Google has made a better protocol now called FlatBuffers.

Why use FlatBuffers?

Access to serialized data without parsing/unpacking - What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
Memory efficiency and speed - The only memory needed to access your data is that of the buffer. It requires 0 additional allocations (in C++, other languages may vary). FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance sensitive applications. See the benchmarks for details.
Flexible - Optional fields means not only do you get great forwards and backwards compatibility (increasingly important for long-lived games: don’t have to update all data with each new version!). It also means you have a lot of choice in what data you write and what data you don’t, and how you design data structures.
Tiny code footprint - Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
Strongly typed - Errors happen at compile time rather than manually having to write repetitive and error prone run-time checks. Useful code can be generated for you.
Convenient to use - Generated C++ code allows for terse access & construction code. Then there’s optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers). Java and Go code supports object-reuse. C# has efficient struct based accessors.
Cross platform code with no dependencies - C++ code will work with any recent gcc/clang and VS2010. Comes with build files for the tests & samples (Android .mk files, and cmake for all other platforms).
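The idea of reading fields straight out of the buffer through one extra indirection can be sketched in a few lines. Note that the layout below is invented for this example and is much simpler than the real FlatBuffers format: a field count, a table of 16-bit offsets (0 meaning "field absent"), then 32-bit integer values. The names `build_buffer` and `read_field` are hypothetical.

```ruby
# Toy layout (NOT the real FlatBuffers format):
#   bytes 0..1           field count (uint16, little-endian)
#   next 2 * count bytes one uint16 offset per field, 0 = field absent
#   rest                 int32 little-endian values

# Build a buffer from an array of Integers; nil marks an absent optional field.
def build_buffer(fields)
  header_size = 2 + 2 * fields.size
  offsets = []
  data = "".b
  fields.each do |value|
    if value.nil?
      offsets << 0
    else
      offsets << header_size + data.bytesize
      data << [value].pack("l<")
    end
  end
  [fields.size].pack("v") + offsets.pack("v*") + data
end

# Read one field without decoding anything else: look up its offset,
# then read four bytes at that position. Absent fields yield the default.
def read_field(buffer, index, default = 0)
  count = buffer.byteslice(0, 2).unpack1("v")
  return default if index >= count # field unknown to the writer
  offset = buffer.byteslice(2 + 2 * index, 2).unpack1("v")
  return default if offset.zero?   # optional field left out
  buffer.byteslice(offset, 4).unpack1("l<")
end

buffer = build_buffer([42, nil, -7])
puts read_field(buffer, 0)      # present field
puts read_field(buffer, 1, 99)  # absent field falls back to the default
puts read_field(buffer, 2)      # present field stored after the gap
```

Because a reader asks for fields by index and tolerates missing ones, an older reader can ignore new fields and a newer reader can default fields an older writer never wrote, which is the compatibility property the list above describes.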

Why not use Protocol Buffers, or … ?

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation. The code is an order of magnitude bigger, too. Protocol Buffers has neither optional text import/export nor schema language features like unions.
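To show what the "parsing/unpacking step to a secondary representation" looks like in practice, here is a minimal sketch that walks a Protocol Buffers encoded byte string and builds a Ruby Hash from it. It handles only varint and length-delimited fields, and `read_varint` and `parse_message` are hypothetical names, not the API of any protobuf gem.

```ruby
# Read one base-128 varint starting at pos; returns [value, next_pos].
def read_varint(bytes, pos)
  result = 0
  shift = 0
  loop do
    byte = bytes.getbyte(pos)
    raise ArgumentError, "truncated varint" if byte.nil?
    pos += 1
    result |= (byte & 0x7F) << shift
    return [result, pos] if (byte & 0x80).zero? # continuation bit clear
    shift += 7
  end
end

# Walk the whole message and copy every field into a new Hash
# keyed by field number (the "secondary representation").
def parse_message(bytes)
  fields = {}
  pos = 0
  while pos < bytes.bytesize
    tag, pos = read_varint(bytes, pos)
    field_number = tag >> 3
    wire_type = tag & 0x07
    case wire_type
    when 0 # varint
      value, pos = read_varint(bytes, pos)
    when 2 # length-delimited (strings, bytes, nested messages)
      length, pos = read_varint(bytes, pos)
      value = bytes.byteslice(pos, length)
      raise ArgumentError, "truncated field" if value.nil? || value.bytesize < length
      pos += length
    else
      raise ArgumentError, "wire type #{wire_type} not handled in this sketch"
    end
    fields[field_number] = value
  end
  fields
end

# Field 1 = 150 (varint), field 2 = "testing" (length-delimited).
wire = ["089601120774657374696e67"].pack("H*")
parsed = parse_message(wire)
p parsed
```

Every field is visited and copied before any of them can be used, and each string becomes a freshly allocated object, which is the cost the quoted paragraph is contrasting with direct buffer access.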

But all the cool kids use JSON!

JSON is very readable (which is why we use it as our optional text format) and very convenient when used together with dynamically typed languages (such as JavaScript). When serializing data from statically typed languages, however, JSON not only has the obvious drawback of runtime inefficiency, but also forces you to write more code to access data (counterintuitively) due to its dynamic-typing serialization system. In this context, it is only a better choice for systems that have very little to no information ahead of time about what data needs to be stored.

Read more about the “why” of FlatBuffers in the white paper.