Default Threshold Adjustment for Go Maintainability Checks

In response to feedback from our Go community, we’ve increased the default thresholds for the following Go maintainability checks to reflect the language’s conventions and to eliminate false positives:

  • File lines
  • Method lines
  • Method complexity

As a result, you might see an unexpected, one-time improvement in the maintainability ratings for your Go repositories. Subsequent analyses should stabilize after this adjustment.

If you have questions about how our configurations work, please check out our documentation here.
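
If you would rather pin these thresholds yourself than track our defaults, you can override them per check in your .codeclimate.yml. Here is a minimal sketch (the threshold values below are placeholders; see the documentation for the current schema and defaults):

version: "2"
checks:
  file-lines:
    config:
      threshold: 400
  method-lines:
    config:
      threshold: 50
  method-complexity:
    config:
      threshold: 10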

Read more at the source

Swift has arrived!

We’re thrilled to announce that Swift is the 8th officially supported language on Code Climate!

Just like our other supported languages (Go, TypeScript, JavaScript, PHP, Python, Java, and Ruby), we now provide our out-of-the-box 10-point technical debt assessment and full support for tracking test coverage of Swift applications. Additionally, we’ve upgraded our Tailor plugin to the latest version.

Please add a Swift repo on your Code Climate account and give it a try! We would love to hear what you think.

Read more at the source

Speeding up Ruby with Shared Strings

It’s not often I am able to write a patch that not only reduces memory usage,
but increases speed as well. Usually I find myself trading memory for speed, so
it’s a real treat when I can improve both in one patch. Today I want to talk
about the patch I submitted to Ruby in this ticket.
It decreases “after boot” memory usage of a Rails application by 4% and speeds
up require by about 35%.

When I was writing this patch, I was actually focusing on trying to reduce
memory usage. It just happens that reducing memory usage also resulted in
faster runtime. So really I wanted to title this post “Reducing Memory Usage in
Ruby”, but I already made a post with that title.

Shared String Optimization

As I mentioned in previous posts, Ruby objects are limited to 40
bytes. But a string can be much longer than 40 bytes, so how are they
stored? If we look at the struct that represents strings, we’ll find there is a char * pointer:

struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};

The ptr field in the string struct points to a byte array which is our string.
So the actual memory usage of a string is approximately 40 bytes for the object,
plus however long the string is. If we were to visualize the layout, it would
look something like this:

RString pointing to char array

In this case, there are really two allocations: the RString object and the
“hello world” character array. The RString object is the 40 byte Ruby object
allocated using the GC, and the character array was allocated using the system’s
malloc implementation.

Side note: There is another optimization called “embedding”. Without getting
too far off track, “embedding” is just keeping strings that are “small enough”
stored directly inside the RString structure. We can talk about that in a
different post, but today pretend there are always two distinct allocations.
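
For the curious, embedding is easy to spot with the same ObjectSpace tool used below: a small string costs only the 40-byte object, while a longer one also pays for its malloc’ed buffer. A quick sketch (exact sizes vary by Ruby version and platform):

>> require 'objspace'
=> true
>> ObjectSpace.memsize_of "tiny"
=> 40
>> ObjectSpace.memsize_of("x" * 100)
=> 141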

We can take advantage of this character array and represent substrings by just
pointing at a different location. For example, we can have two Ruby objects,
one representing the string “hello world” and the other representing the string
“world” and only allocate one character array buffer:

RStrings sharing a char array

This example only has 3 allocations: 2 from the GC for the Ruby string objects,
and one malloc for the character array. Using ObjectSpace, we can actually
observe this optimization by measuring memory size of the objects after slicing
them:

>> require 'objspace'
=> true
>> str = "x" * 9000; nil
=> nil
>> ObjectSpace.memsize_of str
=> 9041
>> substr = str[30, str.length - 30]; nil
=> nil
>> str.length
=> 9000
>> substr.length
=> 8970
>> ObjectSpace.memsize_of substr
=> 40

The example above first allocates a string that is 9000 characters. Next we
measure the memory size of the string. The total size is 9000 for the
characters, plus some overhead for the Ruby object for a total of 9041. Next we
take a substring, slicing off the first 30 characters of the original. As
expected, the original string is 9000 characters, and the substring is 8970.
However, if we measure the size of the substring it is only 40 bytes! This is
because the new string only requires a new Ruby object to be allocated, and the
new object just points at a different location in the original string’s
character buffer, just like the graph above showed.

This optimization isn’t limited to strings; we can use it with arrays too:

>> list = ["x"] * 9000; nil
=> nil
>> ObjectSpace.memsize_of(list)
=> 72040
>> list2 = list[30, list.length - 30]; nil
=> nil
>> ObjectSpace.memsize_of(list2)
=> 40

In fact, functional languages, where data structures are immutable, can take
great advantage of this optimization. In languages that allow mutations, we
have to deal with the case that the original string might be mutated, whereas
languages with immutable data structures can be even more aggressive about
this optimization.
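
Here is a quick look at what CRuby has to guard against (a sketch of the behavior, not the internals): mutating the original after taking a slice must not change the slice, so the VM quietly gives the mutated string its own copy of the buffer at write time:

>> str = "x" * 9000; nil
=> nil
>> substr = str[30, str.length - 30]; nil
=> nil
>> str[100] = "y"
=> "y"
>> substr[70]
=> "x"

The write to str forced it to take its own buffer; the slice kept the original bytes.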

Limits of the Shared String Optimization

This shared string optimization isn’t without limits though. To take advantage
of this optimization, we have to always go to the end of the string. In other
words, we can’t take a slice from the middle of the string and get the
optimization. Let’s take our sample string and slice 15 characters off each side
and see what the memsize is:

>> str = "x" * 9000; nil
=> nil
>> str.length
=> 9000
>> substr = str[15, str.length - 30]; nil
=> nil
>> substr.length
=> 8970
>> ObjectSpace.memsize_of(substr)
=> 9011

We can see in the above example that the memsize of the substring is much larger
than in the first example. That is because Ruby had to create a new buffer to
store the substring. So our lesson here is: if you have to slice strings, start
from the left and go all the way to the end.

Here is an interesting thing to think about. At the end of the following
program, what is the memsize of substr? How much memory is this program
actually consuming? Is the str object still alive, and how can we find out?

require 'objspace'

str = "x" * 9000
substr = str[30, str.length - 30]
str = nil
GC.start

# What is the memsize of substr?
# How much memory is this program actually consuming?
# Is `str` still alive even though we did a GC?
# Hint: use `ObjectSpace.dump_all`
# (if you try this out, I recommend running the program with `--disable-gems`)
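
If you want to check your answers, here is one way to dig in (a sketch; addresses and exact sizes will differ on your machine):

require 'objspace'

str = "x" * 9000
substr = str[30, str.length - 30]
str = nil
GC.start

# Dump one object as JSON; shared strings should carry a "shared" flag
# and a reference to the root object that owns the buffer
puts ObjectSpace.dump(substr)

# Dump the whole heap, then search heap.json for a String entry with a
# ~9000 byte buffer to see whether the original buffer is still alive
File.open('heap.json', 'w') { |f| ObjectSpace.dump_all(output: f) }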

The optimization I explained above works exactly the same way for strings in C
as it does in Ruby. We will use this optimization to reduce memory usage and
speed up require in Ruby.

Reducing Memory Usage and Speeding Up require

I’ve already described the technique we’re going to use to speed up require,
so let’s take a look at the problem. After that, we’ll apply the shared string
optimization to improve performance of require.

Every time a program requires a file, Ruby has to check to see if that file has
already been required. The global variable $LOADED_FEATURES is a list of all
the files that have been required so far. Of course, searching through a list
for a file would be quite slow and get slower as the list grows, so Ruby keeps a
hash to look up entries in the $LOADED_FEATURES list. This hash is called the
loaded_features_index, and it’s stored on the virtual machine structure
here.
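
You can watch both halves of this from a REPL (the path below is illustrative; yours will differ):

>> require 'set'
=> true
>> $LOADED_FEATURES.grep(/set\.rb/).last
=> "/usr/local/lib/ruby/2.5.0/set.rb"
>> require 'set'
=> false

The second require returns false because the loaded_features_index lookup found the file, so nothing was loaded twice.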

The keys of this hash are strings that could be passed to require to require a
particular file, and the value is the index in the $LOADED_FEATURES array of
the file that actually got required. So, for example if you have a file on your
system: /a/b/c.rb, the keys to the hash will be:

  • “/a/b/c.rb”
  • “a/b/c.rb”
  • “b/c.rb”
  • “c.rb”
  • “/a/b/c”
  • “a/b/c”
  • “b/c”
  • “c”

Given a well crafted load path, any of the strings above could be used to load
the /a/b/c.rb file, so the index needs to keep all of them. For example, you
could do ruby -I / -e"require 'a/b/c'", or ruby -I /a -e"require 'b/c'",
etc, and they all point to the same file.
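
To make that list concrete, here is a hypothetical little helper (my own illustration, not Ruby’s actual implementation) that produces the same keys for a given path:

def index_keys(path)
  no_ext = path.sub(/\.[^.\/]+\z/, '') # strip a trailing extension, if any
  [path, no_ext].flat_map { |s|
    slash_starts = (0...s.length).select { |i| s[i] == '/' }.map { |i| i + 1 }
    ([0] + slash_starts).map { |i| s[i..-1] }
  }.uniq
end

index_keys "/a/b/c.rb"
# => ["/a/b/c.rb", "a/b/c.rb", "b/c.rb", "c.rb", "/a/b/c", "a/b/c", "b/c", "c"]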

The loaded_features_index hash is built in the features_index_add function.
Let’s pick apart this function a little.

static void
features_index_add(VALUE feature, VALUE offset)
{
    VALUE short_feature;
    const char *feature_str, *feature_end, *ext, *p;

    feature_str = StringValuePtr(feature);
    feature_end = feature_str + RSTRING_LEN(feature);

    for (ext = feature_end; ext > feature_str; ext--)
        if (*ext == '.' || *ext == '/')
            break;
    if (*ext != '.')
        ext = NULL;
    /* Now `ext` points to the only string matching %r{^\.[^./]*$} that is
       at the end of `feature`, or is NULL if there is no such string. */

This function takes a feature and an offset as parameters. The feature is
the full name of the file that was required, extension and everything. offset
is the index in the loaded features list where this string is. The first part
of this function starts at the end of the string and scans backwards looking for
a period or a forward slash. If it finds a period, we know the file has an
extension (it is possible to require a Ruby file without an extension!); if it
finds a forward slash first, it gives up and assumes there is no extension.

    p = ext ? ext : feature_end; /* start just left of the extension (or the end) */
    while (1) {
        long beg;

        p--;
        while (p >= feature_str && *p != '/')
            p--;
        if (p < feature_str)
            break;
        /* Now *p == '/'.  We reach this point for every '/' in `feature`. */
        beg = p + 1 - feature_str;
        short_feature = rb_str_subseq(feature, beg, feature_end - p - 1);
        features_index_add_single(short_feature, offset);
        if (ext) {
            short_feature = rb_str_subseq(feature, beg, ext - p - 1);
            features_index_add_single(short_feature, offset);
        }
    }

Next we scan backwards in the string looking for forward slashes. Every time
we find one, we use rb_str_subseq to get a substring and then call
features_index_add_single to register that substring. rb_str_subseq gets
substrings in the same way we were doing above in Ruby, and applies the same
optimizations.

The if (ext) conditional deals with files that have an extension, and this is
really where our problems begin. This conditional gets a substring of
feature, but it doesn’t go all the way to the end of the string. It must
exclude the file extension. This means it will copy the underlying string.
So these two calls to rb_str_subseq do 3 allocations total: 2 Ruby objects
(the function returns a Ruby object) and one malloc to copy the string for the
“no extension substring” case.

This function calls features_index_add_single to add the substring to the
index. I want to call out one excerpt from the features_index_add_single
function:

    features_index = get_loaded_features_index_raw();
    st_lookup(features_index, (st_data_t)short_feature_cstr, (st_data_t *)&this_feature_index);

    if (NIL_P(this_feature_index)) {
        st_insert(features_index, (st_data_t)ruby_strdup(short_feature_cstr), (st_data_t)offset);
    }

This code looks up the string in the index, and if the string isn’t in the
index, it adds it to the index. The caller allocated a new Ruby
string, and that string could get garbage collected, so this function calls
ruby_strdup to copy the string for the hash key. It’s important to note that the
keys to this hash aren’t Ruby objects, but char * pointers that came from
Ruby objects (the char *ptr field that we were looking at earlier).

Let’s count the allocations. So far we have 2 Ruby objects (one with a file
extension and one without), 1 malloc for the non-sharable substring, then 2 more
mallocs to copy the strings into the hash. So each iteration of the while loop
in features_index_add does 5 allocations: 2 Ruby objects and 3 mallocs.

In cases like this, a picture might explain better. Below is a diagram of the
allocated memory and how the allocations relate to each other.

Allocations on Trunk

This diagram shows what the memory layout looks like when adding the path
/a/b/c.rb to the index, resulting in 8 hash entries.

Blue nodes are allocations that were alive before the call to add the path to
the index. Red nodes are intermediate allocations done while populating the
index, and will be freed at some point. Black nodes are allocations made while
adding the path to the index that remain live after we’ve finished. Solid
arrows represent actual references, while dotted lines indicate a relationship
but not an actual reference (like one string was ruby_strdup‘d from another).

The graph has lots of nodes and is very complicated, but we will clean it up!

Applying the Shared String Optimization

I’ve translated the C code to Ruby code so that we can more easily see the
optimization at work:

$features_index = {}

def features_index_add(feature, index)
  ext = feature.index('.')
  p = ext ? ext : feature.length

  loop do
    p -= 1
    while p >= 0 && feature[p] != '/'
      p -= 1
    end
    break if p < 0

    short_feature = feature[p + 1, feature.length - p - 1] # New Ruby Object
    features_index_add_single(short_feature, index)

    if ext # slice out the file extension if there is one
      short_feature = feature[p + 1, ext - p - 1] # New Ruby Object + malloc
      features_index_add_single(short_feature, index)
    end
  end

  # Like the C version, also register the full path, with and without
  # its extension
  features_index_add_single(feature, index)
  features_index_add_single(feature[0, ext], index) if ext # New Ruby Object + malloc
end

def features_index_add_single(str, index)
  return if $features_index.key?(str)

  $features_index[str.dup] = index # malloc
end

features_index_add "/a/b/c.rb", 1

As we already learned, the shared string optimization only works when the
substrings include the end of the shared string. That is, we can only take
substrings from the left side of the string.

The first change we can make is to split the string into two cases: one with
an extension, and one without. Since the “no extension” if statement does not
scan to the end of the string, it always allocates a new string. If we make a
new string that doesn’t contain the extension, then we can eliminate one of the
malloc cases:

$features_index = {}

def features_index_add(feature, index)
  no_ext_feature = nil
  p              = feature.length
  ext            = feature.index('.')

  if ext
    p = ext
    no_ext_feature = feature[0, ext] # New Ruby Object + malloc
  end

  loop do
    p -= 1
    while p >= 0 && feature[p] != '/'
      p -= 1
    end
    break if p < 0

    short_feature = feature[p + 1, feature.length - p - 1] # New Ruby Object
    features_index_add_single(short_feature, index)

    if ext
      len = no_ext_feature.length
      short_feature = no_ext_feature[p + 1, len - p - 1] # New Ruby Object
      features_index_add_single(short_feature, index)
    end
  end

  # Register the full strings too; both slices above now run to the end
  # of their buffers, so they can be shared
  features_index_add_single(feature, index)
  features_index_add_single(no_ext_feature, index) if ext
end

def features_index_add_single(str, index)
  return if $features_index.key?(str)

  $features_index[str.dup] = index # malloc
end

features_index_add "/a/b/c.rb", 1

This changes the function to allocate one new string, but always scan to the
end of both strings. Now that we have two strings we can “scan from the left”,
we’re able to avoid new substring mallocs in the loop. You can see this change,
where I allocate a new string without an extension, here.

Below is a graph of what the memory layout and relationships look like after
pulling up one slice, then sharing the string:

Allocations after shared slice

You can see from this graph that we were able to eliminate string buffers by
allocating the “extensionless” substring first, then taking slices from it.

There are two more optimizations I applied in this patch. Unfortunately they
are specific to the C language and not easy to explain using Ruby.

Eliminating Ruby Object Allocations

The existing code uses Ruby to slice strings. This allocates a new Ruby object.
Now that we have two strings, we can always take substrings from the left, and
that means we can use pointers in C to “create” substrings. Rather than asking
Ruby APIs to slice the string for us, we simply use a pointer in C to point at
where we want the substring to start. The hash table that maintains the index
uses C strings as keys, so instead of passing Ruby objects around, we’ll just
pass a pointer into the string:

-       short_feature = rb_str_subseq(feature, beg, feature_end - p - 1);
-       features_index_add_single(short_feature, offset);
+       features_index_add_single(feature_str + beg, offset);
        if (ext) {
-           short_feature = rb_str_subseq(feature, beg, ext - p - 1);
-           features_index_add_single(short_feature, offset);
+           features_index_add_single(feature_no_ext_str + beg, offset);
        }
     }
-    features_index_add_single(feature, offset);
+    features_index_add_single(feature_str, offset);
     if (ext) {
-       short_feature = rb_str_subseq(feature, 0, ext - feature_str);
-       features_index_add_single(short_feature, offset);
+       features_index_add_single(feature_no_ext_str, offset);

In this case, using a pointer into the string simplifies our code.
feature_str is a pointer to the head of the string that has a file
extension, and feature_no_ext_str is a pointer to the head of the string that
doesn’t have a file extension. beg is the number of characters from the
head of the string where we want to slice. All we have to do now is just add
beg to the head of each pointer and pass that to features_index_add_single.

In this graph you can see we no longer need the intermediate Ruby objects
because the “add single” function directly accesses the underlying char *
pointer:

Allocations after pointer substrings

Eliminating malloc Calls

Finally, let’s eliminate the ruby_strdup calls. As we covered earlier, new
Ruby strings could get allocated. These Ruby strings would get free’d by the
garbage collector, so we had to call ruby_strdup to keep a copy around inside
the index hash. The feature string passed in is also stored in the
$LOADED_FEATURES global array, so there is no need to copy that string as the
array will prevent the GC from collecting it. However, we created a new string
that does not have an extension, and that object could get collected. If we can
prevent the GC from collecting those strings, then we don’t need to copy
anything.

To keep these new strings alive, I added an array to the virtual machine (the
virtual machine lives for the life of the process):

     vm->loaded_features = rb_ary_new();
     vm->loaded_features_snapshot = rb_ary_tmp_new(0);
     vm->loaded_features_index = st_init_strtable();
+    vm->loaded_features_index_pool = rb_ary_new();

Then I add the new string to the array via rb_ary_push right after allocation:

+       short_feature_no_ext = rb_fstring(rb_str_freeze(rb_str_subseq(feature, 0, ext - feature_str)));
+       feature_no_ext_str = StringValuePtr(short_feature_no_ext);
+       rb_ary_push(get_loaded_features_index_pool_raw(), short_feature_no_ext);

Now all strings in the index hash are shared and kept alive. This means we can
safely remove the ruby_strdup without any strings getting free’d by the GC:

     if (NIL_P(this_feature_index)) {
-       st_insert(features_index, (st_data_t)ruby_strdup(short_feature_cstr), (st_data_t)offset);
+       st_insert(features_index, (st_data_t)short_feature_cstr, (st_data_t)offset);
     }

After this change, we don’t need to copy any memory because the hash keys can
point directly into the underlying character array inside the Ruby string
object:

Use string indexes for keys

This new algorithm does 2 allocations: one to create a “no extension” copy of
the original string, and one RString object to wrap it. The “loaded features
index pool” array keeps the newly created string from being garbage collected,
and now we can point directly into the underlying character arrays without needing to copy
the strings.

For any file added to the “loaded features” array, we went from O(N)
allocations (where N is the number of slashes in the path) to exactly two
allocations, regardless of the number of slashes.

END

By using shared strings I was able to eliminate over 76000 system calls during
the Rails boot process on a basic app, reduce the memory footprint by 4%, and
speed up require by 35%. Next week I will try to get some statistics from a
large application and see how well it performs there!

Thanks for reading!

Read more at the source

Launching Today: Velocity

Data-driven insights to boost your engineering capacity

Today we’re sharing something big: Velocity by Code Climate, our first new product since 2011, is launching in open beta.

Velocity helps organizations increase their engineering capacity by identifying bottlenecks, improving day-to-day developer experience, and coaching teams with data-driven insights, not just anecdotes.

Velocity helps you answer questions like:

  • Which pull requests are high risk and why? (Find out right away, not days later.)
  • How do my team’s KPIs compare to industry averages? Where’s our biggest opportunity to improve?
  • Are our engineering process changes making a difference? (Looking at both quantity and quality of output.)
  • Where do our developers get held up? Do they spend more time waiting on code review or CI results?

Learn more about Velocity

Why launch a new product?

Velocity goes hand-in-hand with our code quality product to help us deliver on our ultimate mission: Superpowers for Engineering Teams. One of our early users noted:

“With Velocity, I’m able to take engineering conversations that previously hinged on gut feel and enrich them with concrete and quantifiable evidence. Now, when decisions are made, we can track their impact on the team based on agreed upon metrics.” – Andrew Fader, VP Engineering, Publicis

Get started today

We’d love to help you level up your engineering organization. Request a free trial and we’ll be in touch right away. As a special thank you for our early supporters, anyone who begins a free, 14-day trial before Friday, February 16th will get 20% off their first year.

Read more at the source

Reducing Memory Usage in Ruby

I’ve been working on building a compacting garbage collector in Ruby for a
while now, and one of the biggest hurdles for implementing a compacting GC is
updating references. For example, if Object A points to Object B, but the
compacting GC moves Object B, how do we make sure that Object A points to the
new location?

Solving this problem has been fairly straightforward for most objects. Ruby’s
garbage collector knows about the internals of most Ruby Objects, so after the
compactor runs, it just walks through all objects and updates their internals
to point at new locations for any moved objects. If the GC doesn’t know
about the internals of some object (for example an Object implemented in a C
extension), it doesn’t allow things referred to by that object to move. For
example, Object A points to Object B. If the GC doesn’t know how to update the
internals of Object A, it won’t allow Object B to move (I call this “pinning”
an object).

Of course, the more objects we allow to move, the better.

Earlier I wrote that updating references for most objects is fairly
straightforward. Unfortunately, there has been one thorn in my side for a
while, and that has been Instruction Sequences.

Instruction Sequences

When your Ruby code is compiled, it is turned into instruction sequence
objects, and those objects are Ruby objects. Typically you don’t interact with
these Ruby objects, but they are there. These objects store byte code for your
Ruby program, any literals in your code, and some other miscellaneous
information about the code that was compiled (source location, coverage info,
etc).

Internally, these instruction sequence objects are referred to as “IMEMO”
objects. There are multiple sub-types of IMEMO objects, and the instruction
sequence sub-type is “iseq”. If you are using Ruby 2.5 and you dump the heap
using ObjectSpace, you’ll see the dump now contains these IMEMO subtypes.
Let’s look at an example.

I’ve been using the following code to dump the heap in a Rails application:

require 'objspace'
require 'config/environment'

File.open('output.txt', 'w') do |f|
  ObjectSpace.dump_all(output: f)
end

The above code outputs all objects in memory to a file called “output.txt” in JSON lines format.
Here are a couple IMEMO records from a Rails heap dump:

{
  "address": "0x7fc89d00c400",
  "type": "IMEMO",
  "class": "0x7fc89e95c130",
  "imemo_type": "ment",
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}
{
  "address": "0x7fc89d00c2e8",
  "type": "IMEMO",
  "imemo_type": "iseq",
  "references": [
    "0x7fc89d00c270",
    "0x7fc89e989a68",
    "0x7fc89e989a68",
    "0x7fc89d00ef48"
  ],
  "memsize": 40,
  "flags": {
    "wb_protected": true,
    "old": true,
    "uncollectible": true,
    "marked": true
  }
}

This example came from Ruby 2.5, so both records contain an imemo_type field.
The first example is a “ment” or “method entry”, and the second example is an
“iseq” or an “instruction sequence”. Today we’ll look at instruction
sequences.
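
Since the dump is JSON lines, it is easy to script against. For example, this short sketch (assuming the output.txt written above) tallies the IMEMO subtypes in the heap:

require 'json'

counts = Hash.new(0)
File.foreach('output.txt') do |line|
  obj = JSON.parse(line)
  counts[obj['imemo_type']] += 1 if obj['type'] == 'IMEMO'
end

# Most common subtypes first
counts.sort_by { |_, n| -n }.each do |type, n|
  puts "#{type}: #{n}"
end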

Format of Instruction Sequence

Instruction sequences are the result of compiling our Ruby code; they are a
binary representation of that code. The instructions are stored on the
instruction sequence object, specifically in the iseq_encoded field
(iseq_size is the length of the iseq_encoded field).

If you were to examine iseq_encoded, you’d find it’s just a list of numbers.
Those numbers are virtual machine instructions along with the parameters
(operands) for those instructions.
If we examine the iseq_encoded list, it might look something like this:

 #   Address              Description
 0   0x00000001001cddad   Instruction (0 operands)
 1   0x00000001001cdeee   Instruction (2 operands)
 2   0x00000001001cdf1e   Operand
 3   0x000000010184c400   Operand
 4   0x00000001001cdeee   Instruction (2 operands)
 5   0x00000001001c8040   Operand
 6   0x0000000100609e40   Operand
 7   0x0000000100743d10   Instruction (1 operand)
 8   0x00000001001c8040   Operand
 9   0x0000000100609e50   Instruction (1 operand)
10   0x0000000100743d38   Operand

Each element of the list corresponds to either an instruction, or the operands
for an instruction. All of the operands for an instruction follow that
instruction in the list. The operands are anything required for executing the
corresponding instruction, including Ruby objects. In other words, some of
these addresses could be addresses for Ruby objects.

Since some of these addresses could be Ruby objects, it means that instruction
sequences reference Ruby objects. But, if instruction sequences reference Ruby
objects, how do the instruction sequences prevent those Ruby objects from
getting garbage collected?

Liveness and Code Compilation

As I said, instruction sequences are the result of compiling your Ruby code.
During compilation, some parts of your code are converted to Ruby objects and
then the addresses for those objects are embedded in the byte code. Let’s take
a look at an example of when a Ruby object will be embedded in instruction
sequences, then look at how those objects are kept alive.

Our sample code is just going to be puts "hello world". We can use RubyVM::InstructionSequence to compile the code, then disassemble it. Disassembly decodes iseq_encoded and prints out something more readable.

>> insns = RubyVM::InstructionSequence.compile 'puts "hello world"'
=> <RubyVM::InstructionSequence:<compiled>@<compiled>>
>> puts insns.disasm
== disasm: #<ISeq:<compiled>@<compiled>>================================
0000 trace            1                                               (   1)
0002 putself          
0003 putstring        "hello world"
0005 opt_send_without_block <callinfo!mid:puts, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0008 leave            
=> nil
>>

Instruction 0003 is the putstring instruction. Let’s look at the definition
of the putstring instruction, which can be found in insns.def:

/* put string val. string will be copied. */
DEFINE_INSN
putstring
(VALUE str)
()
(VALUE val)
{
    val = rb_str_resurrect(str);
}

When the virtual machine executes, it will jump to the location of the
putstring instruction, decode operands, and provide those operands to the
instruction. In this case, the putstring instruction has one operand called
str which is of type VALUE, and one return value called val which is also
of type VALUE. The instruction body itself simply calls rb_str_resurrect,
passing in str, and assigning the return value to val. rb_str_resurrect
just duplicates a Ruby string.
So this instruction takes a Ruby object (a string which has been stored in the
instruction sequences), duplicates that string, and then the virtual machine
pushes the duplicated string on to the stack. For a fun exercise, try going
through this process with puts "hello world".freeze and take a look at the
difference.
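
If you try the frozen variant of that exercise, the difference shows up right in the disassembly (a sketch; instruction names vary a bit across Ruby versions):

insns = RubyVM::InstructionSequence.compile 'puts "hello world".freeze'
puts insns.disasm
# Instead of `putstring "hello world"`, you should see the freeze call
# folded into a single instruction (`opt_str_freeze` on Ruby 2.5) that
# pushes the frozen literal without calling rb_str_resurrect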

Now, how does the string “hello world” stay alive until this instruction is
executed? Something must mark the string object so the garbage collector knows
that a reference is being held.

The way the instruction sequences keep these objects alive is through the use
of what’s called a “mark array”. As the compiler converts your code into
instruction sequences, it will allocate a string for “hello world”, then push
that string on to an array. Here is an excerpt from compile.c that does this:

case TS_VALUE:    /* VALUE */
{
    VALUE v = operands[j];
    generated_iseq[code_index + 1 + j] = v;
    /* to mark ruby object */
    iseq_add_mark_object(iseq, v);
    break;
}

All iseq_add_mark_object does is push the VALUE on to an array which is
stored on the instruction sequence object. iseq is the instruction sequence
object, and v is the VALUE we want to keep alive (in this case the string
“hello world”). If you look in vm_core.h, you can find the location of that
mark array, with a comment that says:

VALUE mark_ary;     /* Array: includes operands which should be GC marked */
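
We can even watch this from Ruby, with caveats (a sketch; the mark array has no class, so hidden internals surface as ObjectSpace::InternalObject instances, and you may need to follow a hop or two of them to reach the “hello world” literal):

require 'objspace'

insns = RubyVM::InstructionSequence.compile 'puts "hello world"'

# Everything the GC considers directly reachable from the compiled code;
# hidden internals (the iseq itself, its mark array) appear as
# ObjectSpace::InternalObject rather than plain Ruby objects
ObjectSpace.reachable_objects_from(insns).each { |obj| p obj }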

Instruction Sequence References and Compaction

So, instruction sequences contain two references to a string literal: one in
the instructions in iseq_encoded, and one via the mark array. If the string
literal moves, then both locations will need to be updated. Updating array
internals is fairly trivial: it’s just a list. Updating instruction sequences
on the other hand is not so easy.

To update references in the instruction sequences, we have to disassemble the
instructions, locate any VALUE operands, and update those locations. There
wasn’t any code to walk these instructions, so I introduced a function that
would disassemble instructions and call a function pointer with those objects.
This allows us to find new locations of Ruby objects and update the
instructions. But what if we could use this function for something more?

Reducing Memory

Now we’re finally on to the part about saving memory. The point of the mark
arrays stored on the instruction sequence objects is to keep any objects
referred to by instruction sequences alive:

ISeq and Array marking paths

We can reuse the “update reference” function to mark references contained
directly in instruction sequences. This means we can reduce the size of the
mark array:

Mark Literals via disasm

Completely eliminating the mark array is a different story as there are things
stored in the mark array that aren’t just literals. However, if we directly
mark objects from the instruction sequences, then we rarely have to grow the
array. The amount of memory we save is the size of the array plus any unused
extra capacity in the array.

I’ve made a patch that implements this strategy, and you can find it on the
GitHub fork of Ruby.

I found that this saves approximately 3% memory on a basic Rails application
set to production mode. Of course, the more code you load, the more memory you
save. I expected the patch to impact GC performance because disassembling
instructions and iterating through them should be harder than just iterating an
array. However, since instruction sequences get old, and we have a
generational garbage collector, the impact on real apps is very small.

I’m working to upstream this patch to Ruby, and you can follow along and read
more information about the analysis here.

Anyway, I hope you found this blurgh post informative, and please have a good
day!

<3 <3 <3

I want to give a huge thanks to Allison McMillan.
Every week she’s been helping me figure out what is going on with this complex code.
I definitely recommend that you follow her on Twitter.

Read more at the source

New: Golang is here!

We’re thrilled to announce that Go is the 7th officially supported language on Code Climate!

Just like our other supported languages (TypeScript, JavaScript, PHP, Python, Java, and Ruby), we now provide our out-of-the-box 10-point technical debt assessment and full support for tracking test coverage of Go applications. Additionally, we’ve promoted 3 Go plugins – gofmt, golint, and govet – out of beta, and our team will support and keep them updated going forward.
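
If the plugins aren’t already on in your repo, enabling them takes a few lines in .codeclimate.yml (a sketch; see each plugin’s docs for additional options):

version: "2"
plugins:
  gofmt:
    enabled: true
  golint:
    enabled: true
  govet:
    enabled: true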

Please add a Go repo on your Code Climate account and give it a try! We would love to hear what you think.

Read more at the source

How Codecademy achieves rapid growth and maintainable code

We sat down with Jake Hiller, Head of Engineering at Codecademy, to find out how they use Code Climate to maintain their quality standards while rapidly growing their engineering team.

Industry: Education
Employees: 50+
Developers: 20+
Location: Manhattan, NY
Languages: Ruby, JavaScript, SCSS
Customer Since: May 2013

Code Climate keeps our process for creating PRs really low-effort so we can quickly test ideas and ship sooner.

Why Code Climate

Like many rapidly developing teams, Codecademy was running into growing pains for both engineering onboarding and code review. They had tried using local analysis tools but found them cumbersome to integrate as development environments varied across the team.

With an engineering workflow centered around pull request reviews, and a desire to reduce friction in committing and testing code, they needed a solution that would optimize their pull request review process and enable new team members to quickly become productive.

Codecademy had been using Code Climate for their Ruby stack since 2013. When Head of Engineering Jake Hiller joined in early 2015, he saw an opportunity to alleviate their code review and onboarding issues by rolling it out to the whole team.

“We wanted to avoid anything that blocks engineers from committing and testing code. Other solutions that use pre-commit hooks are invasive to both experimentation and the creative process. Code Climate’s flexibility helps us maintain rules that are tailored to our team and codebase, while offering standard maintainability measurements. Plus it enables us to defer checks until code is ready to be reviewed, so we can quickly test ideas and ship sooner.”

“Code Climate helps us transfer knowledge to new engineers – like our coding standards, why we’ve made decisions over time, and why we’ve chosen certain structures and patterns.”

Increased speed and quality

Since rolling out to the whole team, Hiller says Codecademy has seen an improvement in the quality of their code reviews and the ease with which new team members get up to speed.

“Code Climate helps us transfer knowledge to new engineers – like our coding standards, why we’ve made decisions over time, and why we’ve chosen certain structures and patterns. New engineers can look through the Code Climate issues in their PR, ask questions, and propose changes and suggestions to the team.

“It’s also increased the speed and quality of our pull request reviews. We’ve been able to spend more time discussing the important functional aspects of our code, and less time debating smaller issues. There are a lot of issues that can’t be fixed with an auto formatter, which is where Code Climate will always be really helpful for our team.”

About Codecademy

Codecademy was founded in 2011 as an immersive online platform for learning to code in a fun, interactive, and accessible way. They’ve helped 45 million people learn how to code, covering a wide variety of programming languages, frameworks, and larger topics like Data Analysis and Web Development. Their recently released Pro and Pro Intensive products provide users with more hands-on support and practice material to help them learn the skills they need to find jobs.

Read more at the source