Keane Nguyen

Aspiring hammerhead shark | GitHub

June 19, 2023

Using Version 5 UUIDs for Idempotency


I recently discovered version 5 UUIDs and have been using them all over the place whenever I find that I want deterministically-generated ids. Version 5 allows me to hash a string and get a UUID out, which I can provide to Postgres as the primary key of some row that I’m inserting. This means I can make code, like scripts which generate and insert data into the database, idempotent by simply relying on the primary key’s unique constraint.

For example, let’s say I have a CSV of email addresses and I’m writing a script to insert new rows into a users table from that CSV. The users table has an email column, but it doesn’t have a unique constraint on it, for some unrelated reason stemming from the business requirements of the rest of the application. In that script, I’ll pass ${scriptName}-${csvRow.email} as the name argument to the version 5 UUID function, and then use the resulting UUID as the primary key of each row that I insert. Now the script is automatically idempotent; if I get a new CSV, or my existing one gets updated, and I run the script again, duplicate user rows won’t be created.
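
Here’s roughly what that looks like in a Node script using the uuid package. Everything other than the v5 call itself (the namespace UUID, the rows variable, and the db client) is a placeholder for illustration:

import { v5 as uuidv5 } from "uuid";

// Any fixed UUID works as the namespace; it just has to stay constant so
// the same name always hashes to the same id.
const NAMESPACE = "9f2f4d0a-4f8e-4b7a-9c3e-2a2f6c1d9b11";
const scriptName = "import-users-from-csv";

// rows: parsed CSV rows; db: a Postgres client (both placeholders).
for (const csvRow of rows) {
  const id = uuidv5(`${scriptName}-${csvRow.email}`, NAMESPACE);
  // Because the id is deterministic, re-running the script produces the same
  // primary keys, and the unique constraint (here via ON CONFLICT) makes the
  // insert a no-op for rows that already exist.
  await db.query(
    "INSERT INTO users (id, email) VALUES ($1, $2) ON CONFLICT (id) DO NOTHING",
    [id, csvRow.email]
  );
}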

Another use-case I’ve found is for setting up test data. I prefer writing tests that actually make queries to a local database rather than mocking the database layer. When inserting rows into the database to set up for tests, I can easily assign foreign keys by using the same hash input instead of copy-pasting hard-coded UUIDs or having to look them up. And as an extra bonus, my test code becomes more assertive because I get what is effectively a virtual unique constraint.

In one of the projects I’m working on right now, every test works with completely different rows than all the other tests; each test inserts its own rows, and the primary keys for those rows are version 5 UUIDs where the test name itself is part of the hash input. Because tests don’t share data, I don’t do any teardown or resetting of the database state between tests. And after tests finish running, I can open a SQL client to inspect the database state resulting from a particular test. Of course, this won’t work for every project, depending on the kinds of queries being used.

I only recently started using version 5 UUIDs this way, so there may be flaws in these approaches that I simply haven’t encountered yet. I’ll keep at it for the time being, and if the day comes that I find good reasons to stop, then you can expect a new entry here.

June 7, 2023

Bad JSON Parser


I realized earlier today that writing even a bad, noncompliant JSON parser is more difficult than I thought. I spent a couple hours and came up with this, which mostly works for valid input:

function* tokenizer(input) {
  let isString = false;
  let stringOrKey = undefined; // this is always a string or object key; it can never be any other JSON type
  let value = "";
  for (const c of input) {
    const primitiveToken = {
      type: "PRIMITIVE",
      value:
        value === "null"
          ? null
          : value === "false"
          ? false
          : value === "true"
          ? true
          : parseFloat(value),
    };
    if (isString) {
      if (c === '"') {
        stringOrKey = value;
        isString = false;
        value = "";
      } else {
        value += c;
      }
    } else {
      if (c === '"') {
        isString = true;
      } else if (c === "{") {
        yield { type: "OBJECT_START" };
      } else if (c === "}") {
        if (stringOrKey !== undefined) {
          yield { type: "PRIMITIVE", value: stringOrKey };
          stringOrKey = undefined;
        }
        if (value.length) {
          yield primitiveToken;
          value = "";
        }
        yield { type: "OBJECT_END" };
      } else if (c === "[") {
        yield { type: "ARRAY_START" };
      } else if (c === "]") {
        if (stringOrKey !== undefined) {
          yield { type: "PRIMITIVE", value: stringOrKey };
          stringOrKey = undefined;
        }
        if (value.length) {
          yield primitiveToken;
          value = "";
        }
        yield { type: "ARRAY_END" };
      } else if (c === ",") {
        if (stringOrKey !== undefined) {
          yield { type: "PRIMITIVE", value: stringOrKey };
          stringOrKey = undefined;
        }
        if (value.length) {
          yield primitiveToken;
          value = "";
        }
      } else if (c === ":") {
        if (stringOrKey !== undefined) {
          yield { type: "KEY", value: stringOrKey };
          stringOrKey = undefined;
        }
      } else if (c.match(/\s/)) {
        if (value.length) {
          yield primitiveToken;
          value = "";
        } else {
          // Do nothing, ignore this whitespace
        }
      } else {
        value += c;
      }
    }
  }
  if (stringOrKey !== undefined) {
    yield { type: "PRIMITIVE", value: stringOrKey };
  }
  if (value.length) {
    yield {
      type: "PRIMITIVE",
      value:
        value === "null"
          ? null
          : value === "false"
          ? false
          : value === "true"
          ? true
          : parseFloat(value),
    };
  }
}

function parse(input) {
  const iter = tokenizer(input);
  const ARRAY_END = Symbol("ARRAY_END");
  function parseToken(iter) {
    const { done, value: token } = iter.next();
    if (done) {
      return undefined;
    }
    if (token.type === "PRIMITIVE") {
      return token.value;
    } else if (token.type === "OBJECT_START") {
      const ret = {};
      let next = iter.next();
      while (next.value.type === "KEY") {
        ret[next.value.value] = parseToken(iter);
        next = iter.next();
      }
      console.assert(next.value.type === "OBJECT_END");
      return ret;
    } else if (token.type === "ARRAY_START") {
      const ret = [];
      let token = parseToken(iter);
      while (token !== ARRAY_END) {
        ret.push(token);
        token = parseToken(iter);
      }
      return ret;
    }
    return ARRAY_END;
  }
  return parseToken(iter);
}
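
As a quick sanity check, calling parse on a small document gives back the expected value:

console.log(parse('{"nums": [1, true, null], "name": "ok"}'));
// { nums: [ 1, true, null ], name: 'ok' }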

The most interesting aspect is that the tokenizer is iterative, whereas the parser is recursive. As far as I am aware, its main limitations are its poor error handling and lack of support for escape codes within strings.

June 6, 2023

Generating a Schema for a 5 GB JSON File in 45s


Earlier this year, as part of a database migration from Firebase RTDB to Postgres, I was faced with figuring out the schema of a 5 GB JSON file. The file was large due to the JSON’s width, rather than depth; it contained some objects which had more than 100,000 keys, and several large arrays. Those large objects had dynamically-generated keys (such as uuidv4s) which would be ids mapping to other entities. Since the RTDB instance had been in use for over two years as business requirements and the schema evolved, many of the fields within certain entities were long forgotten.

Here’s an example:

{
    "users": {
        "[some uuid]": {
            "createdAt": "really old",
            "fieldWeDontUseAnymore": "some value",
            ...
        },
        "[some uuid]": {
            "createdAt": "less old",
            "fieldWeActuallyUse": "some value",
            ...
        },
        ...
    }
}

We wanted to know what possible values and types could be at a given path within the JSON. However, all of the tools I found for generating JSON schemas from JSON files assumed smaller file sizes, and thus exhaustively included all keys in all objects, and all elements of all arrays, in the resulting schema. I was not interested in receiving a schema where every uuid in the users object was listed out.

Additionally, most of these tools and libraries worked by first parsing the entire JSON file in memory, followed by generating the schema afterward. Due to the extra overhead of data structures in Python and JavaScript compared to the raw JSON string, attempting to parse the file in those runtimes ballooned memory usage beyond what was available on my laptop, leading to stalling or crashing.

I thus decided to write my own schema generator that would iterate over the input JSON file in a streaming manner, generating a schema on the fly. Upon landing on the nth key of an object or the nth element of an array (that is, once the object or array meets a size threshold), it would collapse that object or array’s schema, combining the schemas of the children recursively.

Say, for example, that our input JSON file is the earlier example with the users object containing uuid keys. My program would first see the open brace, and determine that, so far, the schema is:

{}

It would then encounter the key "users", and would thus update the schema to be:

{
    users: unknown;
}

It would then see another open brace. After processing several user objects, the schema would match exactly a subset of the full users object, and look something like this (written as a TypeScript type):

{
    users: {
        uuid0: {
            createdAt: "really old";
            fieldWeDontUseAnymore: "some value";
            nestedField: {
                key: "value";
            }
        },
        uuid1: {
            createdAt: "less old";
            fieldWeActuallyUse: "some value";
            nestedField: {
                key: "different value";
                anotherKey: "another value";
            }
        }
    }
}

See how the TypeScript type looks exactly the same as the JSON? That’s because it’s as if you used as const for that object in TypeScript; the schema/type matches the object exactly—for now.

Since the users object is large, we haven’t yet seen the closing brace for the users object, and so we carry on adding more and more to the schema. Upon reaching the 51st new key under the users object, the program would realize that the users object likely contains dynamically-generated keys, rather than a fixed set of human-readable keys, such as those within an individual user object inside the giant users object. The program would subsequently collapse the subschemas within the large users object schema by merging all of the schemas of each individual user object that has been seen. As it merges the schemas of the object under the uuid0 key and the object under the uuid1 key, it would also have to recurse and merge the schemas under the nestedField keys, since they map to non-primitive values. The result would look like this:

{
    users: {
        [key: string]: {
            createdAt: "really old" | "less old";
            fieldWeDontUseAnymore?: "some value";
            fieldWeActuallyUse?: "some value";
            nestedField: {
                key: "value" | "different value";
                anotherKey?: "another value";
            }
        }
    }
}

Now, as it continues processing the rest of the user objects, it would continuously merge each user object’s schema into the wildcard schema. As part of recursively merging subschemas, if too many possibilities of literal values for a certain field are reached, that field’s type would go from a set of literal values to a generic type, such as string. So eventually, after processing every user object inside "users", the schema might look something like this:

{
    users: {
        [key: string]: {
            createdAt: string;
            fieldWeDontUseAnymore?: string;
            fieldWeActuallyUse?: string;
            nestedField: {
                key: string;
                anotherKey?: string | number;
                yetAnotherKey?: "enum choice 1" | "enum choice 2";
            }
        }
    }
}

Collapsing the schema keeps it readable and also ensures that memory usage only scales linearly with the depth of the JSON structure, rather than its width. This is because the widest object or array schema held in memory would have at most 50 keys or elements.
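
To make the merging step concrete, here is a rough JavaScript sketch of the recursive merge (the actual tool was written in Rust; the schema representation and the literal-value threshold below are simplifications I’m assuming for illustration):

// Schemas here are either { kind: "object", fields } or { kind: "literal", values }.
const MAX_LITERALS = 10; // assumed threshold before widening to a generic type

function mergeSchemas(a, b) {
  // A field present on only one side becomes optional in the merged schema.
  if (a === undefined) return { ...b, optional: true };
  if (b === undefined) return { ...a, optional: true };
  if (a.kind === "object" && b.kind === "object") {
    const keys = new Set([...Object.keys(a.fields), ...Object.keys(b.fields)]);
    const fields = {};
    for (const key of keys) {
      fields[key] = mergeSchemas(a.fields[key], b.fields[key]);
    }
    return { kind: "object", fields };
  }
  if (a.kind === "literal" && b.kind === "literal") {
    const values = new Set([...a.values, ...b.values]);
    if (values.size > MAX_LITERALS) {
      // Too many distinct literal values: widen to a generic type like "string".
      return { kind: "type", name: typeof values.values().next().value };
    }
    return { kind: "literal", values };
  }
  // Mismatched kinds (e.g. a literal on one side, an object on the other):
  // fall back to a union of the two.
  return { kind: "union", options: [a, b] };
}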

I wrote my first implementation using the bfj npm package. Unfortunately, although the bfj README mentions that it uses Bluebird promises to avoid a memory leak with the standard library’s Promises, I still saw a memory leak when using the library. On my laptop, my script was consuming upwards of 20 GB of RAM after running for 20 minutes. A look at a profiler showed that most of the memory in use came from closures created by the Bluebird promises.

I decided to take control over the memory usage of the program by translating my algorithm into Rust. cargo run took 30 minutes before spitting out a schema. I was already celebrating that it worked at all before I realized that I had built with the dev profile. Using the --release flag, it ran in 45 seconds and never used more than 50 MB of memory. For the first time, my rudimentary knowledge of Rust had paid off!

Side note: I tried and failed to get ChatGPT 3.5 to write the program for me. It kept trying to parse the entire file at once. I concede that my inexperience with prompting may have been at fault.

The output revealed that our user objects had more than 90 possible fields, of which only 40 were known before running my schema generator. And every single one of those 90 fields was optional, meaning that none of them were present in every user object. Granted, there were many thousands of user objects created over many months, but I would have expected at least one common field like createdAt or something.

Our live code definitely expected some of those user object fields to always exist, so if certain users with ancient user objects tried to use the site, they would have encountered runtime errors from null pointers and other problems. No wonder people care about data integrity!

June 5, 2023

Mocking Constructors With Jest


Mocks are probably the most difficult feature of Jest to use and understand. It doesn’t help that the Jest documentation on mocking constructors makes use of a relatively obscure JavaScript feature.

See this example from the official Jest docs:

import SoundPlayer from './sound-player';
const mockPlaySoundFile = jest.fn();
jest.mock('./sound-player', () => {
  return jest.fn().mockImplementation(() => {
    return {playSoundFile: mockPlaySoundFile};
  });
});

Which mocks this class:

export default class SoundPlayer {
  constructor() {
    this.foo = 'bar';
  }

  playSoundFile(fileName) {
    console.log('Playing sound file ' + fileName);
  }
}

Note that the mock implementation is mocking a constructor function. The function passed to mockImplementation is an arrow function, but arrow functions can’t be used as constructors. How come the mock works when used as a constructor? The answer is that the arrow function passed to mockImplementation doesn’t become the entirety of the resulting mocked function. The final mocked function is a combination of the functionality that comes with jest.fn() as well as the custom code in the arrow function passed to mockImplementation. And the final mocked function returned from mockImplementation (and made available to the code under test) is a function function, which can be used with the new keyword.

But the example is not just a little confusing; it comes with a major pitfall that has tripped up several poor Jest users, including me (#2982, #8431, #10965, #11316). If you try to read the instances property of the mocked function, it won’t work as expected!

test("get mocked instance", () => {
  const myInstance = new SoundPlayer();
  expect(myInstance).toBe(SoundPlayer.mock.instances[0]);
});

That test fails:

    expect(received).toBe(expected) // Object.is equality

    - Expected  - 1
    + Received  + 3

    - mockConstructor {}
    + Object {
    +   "playSoundFile": [Function mockConstructor],
    + }

To fully understand why that is, we need to know a little about the new keyword in JavaScript. MDN explains what happens when you invoke a function with new:

When a function is called with the new keyword, the function will be used as a constructor. new will do the following things:

  1. Creates a blank, plain JavaScript object. For convenience, let’s call it newInstance.
  2. Points newInstance’s [[Prototype]] to the constructor function’s prototype property, if the prototype is an Object. Otherwise, newInstance stays as a plain object with Object.prototype as its [[Prototype]]. Note: Properties/objects added to the constructor function’s prototype property are therefore accessible to all instances created from the constructor function.
  3. Executes the constructor function with the given arguments, binding newInstance as the this context (i.e. all references to this in the constructor function now refer to newInstance).
  4. If the constructor function returns a non-primitive, this return value becomes the result of the whole new expression. Otherwise, if the constructor function doesn’t return anything or returns a primitive, newInstance is returned instead. (Normally constructors don’t return a value, but they can choose to do so to override the normal object creation process.)

Notice how the mockImplementation argument returns an object: return {playSoundFile: mockPlaySoundFile}. That object will, of course, be returned by the mocked function. And that mocked function is being called with new. That means the this value created in step 1, newInstance, is not what’s being returned. The tricky part is that Jest stores newInstance into instances, not whatever you return from your constructor! You can think of the internals of jest.fn() as looking like this:

/**
 * Create a mock function. A partial simulation of `jest.fn`.
 */
function fn() {
  // This will be the `mock` property of the returned function
  const mock = {
    // ... other fields such as .calls, .results, .contexts, etc.
    instances: [],
  };

  // This is the mocked function that we will return
  function mockConstructor() {
    if (new.target) {
      // Note how the customImplementation return value is not
      // used for the `instances` array!
      mock.instances.push(this);
    }
    // ... other logic to append to the other fields in `mock`
    if (mockConstructor._customImplementation) {
      return mockConstructor._customImplementation.call(this);
    }
  }
  mockConstructor.mock = mock;
  mockConstructor.mockImplementation = (implementation) => {
    mockConstructor._customImplementation = implementation;

    // Return the same function that `fn()` returns to allow
    // for chaining
    return mockConstructor;
  };
  return mockConstructor;
}

Here are some examples to demonstrate the consequences of using an arrow function and returning an object instead of modifying this:

// No custom implementation; default mock function.
const mocked0 = fn();
console.log(new mocked0(), mocked0.mock.instances);
// mockConstructor {} [ mockConstructor {} ]
// The two are the same.

const mocked1 = fn().mockImplementation(
  // Custom implementation: arrow function returning
  // non-primitive value
  () => {
    this.c = "d"; // here, `this` refers to the NodeJS module we’re in
    return {
      a: "b",
    };
  }
);
console.log(new mocked1(), mocked1.mock.instances);
// { a: 'b' } [ mockConstructor {} ]
// The two are not the same. The instance in the `instances`
// array remains unmodified by the line `this.c = "d"` because
// the arrow function is unable to access the correct `this` value.

const mocked2 = fn().mockImplementation(
  // Custom implementation: arrow function returning undefined
  () => {
    this.c = "d"; // here, `this` refers to the NodeJS module we’re in
  }
);
console.log(new mocked2(), mocked2.mock.instances);
// mockConstructor {} [ mockConstructor {} ]
// The two are the same, but unmodified by the line `this.c = "d"`
// because the arrow function is unable to access the correct
// `this` value.

const mocked3 = fn().mockImplementation(
  // Custom implementation: `function` function returning
  // non-primitive value
  function () {
    this.c = "d";
    return {
      a: "b",
    };
  }
);
console.log(new mocked3(), mocked3.mock.instances);
// { a: 'b' } [ mockConstructor { c: 'd' } ]
// The two are not the same. The instance in the `instances`
// array was successfully modified by the line `this.c = "d"`.

const mocked4 = fn().mockImplementation(
  // Custom implementation: `function` function returning undefined
  function () {
    this.c = "d";
  }
);
console.log(new mocked4(), mocked4.mock.instances);
// mockConstructor { c: 'd' } [ mockConstructor { c: 'd' } ]
// The two are the same, and successfully modified by the
// line `this.c = "d"`.

console.log(this);
// { c: 'd' }
// When the arrow functions modified `this`, they were
// actually changing the entire module's `this` rather
// than the mock SoundPlayer instance being constructed.

For all of these examples, replacing fn() with jest.fn() will result in the same output.

My take is that, to avoid tripping up users, the example in the Jest documentation should probably look like this:

import SoundPlayer from './sound-player';
const mockPlaySoundFile = jest.fn();
jest.mock('./sound-player', () => {
  return jest.fn().mockImplementation(function () {
    this.playSoundFile = mockPlaySoundFile;
  });
});

And that would suffice to pass the earlier "get mocked instance" test.

February 24, 2023

Determine Whether a Linked List Contains a Cycle


The common solution is to use two pointers, one fast and one slow. I recently remembered with amusement that in October of 2018, I came up with and submitted to LeetCode a different algorithm:

  1. Hold on to the head of the linked list.
  2. Traverse the linked list, reversing it as you go.
  3. Eventually the current node will have no next node. Compare the current node with the head node. If they are one and the same, then the linked list contains a cycle.
  4. If the linked list needs to be restored to its original state, reverse the linked list again.
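
Here’s a quick JavaScript sketch of those steps, assuming nodes shaped like { val, next } (step 4, restoring the list, would just be a second reversal):

function hasCycle(head) {
  // An empty list or a single node with no next pointer can't have a cycle.
  if (head === null || head.next === null) return false;
  let prev = null;
  let curr = head;
  // Step 2: reverse the list as we traverse it.
  while (curr.next !== null) {
    const next = curr.next;
    curr.next = prev;
    prev = curr;
    curr = next;
  }
  // Step 3: if the walk ended back at the head node, the list contains a cycle.
  return curr === head;
}
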
January 30, 2023

CORS: What and Why


Even before JavaScript existed, browsers had enabled web developers to create interactive, stateful websites thanks to browser cookies. When a webpage stored a cookie in the browser, the cookie would be automatically attached to subsequent requests to the cookie’s origin. Those requests were the result of page navigations by the user or HTML form submissions.

Cookies are often used for authentication; in the process of logging in, a request is made from the browser to the server, and the server sets a cookie when it responds. When the browser makes additional requests to the server later on, which all have the cookie attached, the server uses the presence of the cookie to determine that the user is currently logged in.

The browser’s behavior of automatically attaching cookies based on the destination of the request can be abused by a kind of attack called Cross-Site Request Forgery (CSRF). CSRF attacks are under the broader category of confused deputy attacks. With CSRF, one site makes requests to another site on behalf of the user and without their consent, analogous to impersonating someone at a bank and withdrawing their money.

For example, suppose there is a malicious webpage served from shadywebsitedomain.xyz. The page has an HTML form on it, with the action set to Facebook’s domain. The page is styled in such a way as to confuse visitors into clicking the form’s submit button. If the user is already logged into Facebook at the time that they click submit, then the browser will have a cookie from facebook.com, and it will attach it to the form’s submission request because the request’s destination is facebook.com. Since the request has the cookie in it, the Facebook server receiving the request would take an authenticated action on behalf of the user.

Today, CSRF attacks are prevented by countermeasures such as web servers requiring not only the usual auth cookie, but an additional CSRF token to be included in any important form submission requests. Although CSRF attacks through HTML form elements have been around for a long time, not all web frameworks include CSRF mitigations by default. For example, the Remix framework boasts its ability to build web pages that work even without JavaScript through liberal use of HTML forms. But if you’re using cookies for authentication, you should note that Remix currently does not come with CSRF protections.

Eventually, JavaScript was added to web browsers, and along with it came the ability to initiate HTTP requests from within JavaScript code. JavaScript made it so that browser users can unwittingly run arbitrary code on their machine by merely visiting a web page. You are probably aware that today, many websites on the internet contain JavaScript code that instructs your browser to do various things you may not want it to be doing, such as tracking you or mining cryptocurrencies—or perhaps, even attempting a CSRF attack on your bank’s website.

Imagine if your browser automatically attached auth cookies to any and all requests made from JavaScript. That would allow anyone to set up a website, say www.nefarious.bad, which contains JavaScript that sends a request to api.yourbank.com to request your account number. Immediately upon visiting that URL, your bank account number would be stolen. That hypothetical CSRF attack would have been trivial to pull off, so the people making web browsers introduced Same-Origin Policy (SOP) to prevent it.

Same-Origin Policy is a set of rules enforced by the browser that, generally speaking, disallows a page at a given origin A from reading any data loaded from a different origin B. The SOP rules affect various web APIs including DOM access, but I will focus on how they restrict JavaScript-initiated cross-origin requests. A cross-origin request is one in which the webpage making the request is served from an origin that does not match the request’s destination origin.

As we saw earlier, CSRF attacks were already possible before the introduction of JavaScript; those attacks would be done through GET requests, or POST requests from HTML forms. Browser vendors were unwilling to start blocking cross-origin HTML form posts, since that would break backward compatibility. So even if, hypothetically, SOP prevented JavaScript from making cross-origin POST requests that looked like HTML form requests (Content-Type: application/x-www-form-urlencoded), bad actors could still use the HTML form CSRF attack I described earlier to achieve the exact same result.

Therefore, it was decided that the new SOP rules would not affect the particular kinds of requests that were already possible before JavaScript, and thus already vulnerable to CSRF. Instead, they would restrict the new capabilities brought in by JavaScript in order to avoid introducing any new CSRF attack vectors. Those requests that were possible before JavaScript would be called “simple” requests, and they included most GET requests, most HEAD requests, and the subset of POST requests that can be sent from an HTML form. You can find a more detailed description of “simple” requests on the MDN page for CORS.

So how would SOP restrict JavaScript requests? As an example, with JavaScript, websites could now make DELETE requests. SOP made it so that browsers would refuse to send the DELETE request in the first place if the request was cross-origin. This was the case for all non-“simple” (sometimes called “complex”) cross-origin requests. For “simple” requests, the request would still be sent to the server, but the page would not be allowed to read the response.

Stated succinctly, SOP rules dictated that

  1. “Simple” cross-origin requests would be sent to the server, but the page would not be allowed to read the response.
  2. All other cross-origin requests would be blocked from being sent in the first place.
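
Here’s what those two rules look like from the page’s JavaScript, sketched with fetch and made-up origins (fetch came along later than this bit of history, but the behavior illustrates the same rules):

// Running on a page served from https://www.mywebsite.com, with no CORS
// headers coming back from the other origin.

// Rule 1: a "simple" GET is actually sent to the server, but the page
// can't read the response; the promise rejects instead.
fetch("https://api.othersite.com/data")
  .then((res) => res.json())
  .catch((err) => console.error("response not readable:", err));

// Rule 2: a DELETE is not a "simple" request, so it never reaches the
// server at all (modern browsers first ask permission with a preflight,
// described below).
fetch("https://api.othersite.com/data/123", { method: "DELETE" })
  .catch((err) => console.error("request blocked:", err));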

These restrictions meant that developers building pages which needed to make requests from JavaScript to other origins, such as a single-page application at www.mywebsite.com making an XMLHttpRequest to api.mywebsite.com, would have to use inconvenient workarounds like proxying the latter origin through the former.

Years later, as single-page applications became more and more popular, browsers added Cross-Origin Resource Sharing (CORS) to address some of those inconveniences. As the name suggests, CORS allows web developers building server endpoints to relax the SOP restrictions on a per-request basis. With CORS, servers could tell web browsers what the JavaScript running on the page would be allowed to do via headers in the response. For the “simple” cross-origin requests, if the server responds using certain headers, then the browser would permit the page’s JavaScript to read the response. For all other cross-origin requests, rather than immediately blocking the request, the browser first sends an extra “preflight” request to ask the server if it will allow the original request. If the server responds and gives permission via the headers, the browser will then send the original request and allow the JavaScript to read the response.
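
On the server side, opting in is just a matter of responding with the right headers. Here’s a minimal sketch using Node’s built-in http module; the allowed origin, methods, and port are made-up values:

const http = require("http");

const ALLOWED_ORIGIN = "https://www.mywebsite.com";

http
  .createServer((req, res) => {
    if (req.method === "OPTIONS") {
      // The preflight: tell the browser which cross-origin requests we allow.
      res.writeHead(204, {
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
        "Access-Control-Allow-Methods": "GET, POST, DELETE",
        "Access-Control-Allow-Headers": "Content-Type",
      });
      res.end();
      return;
    }
    // For the actual request (and for "simple" requests, which skip the
    // preflight), this header is what lets the page's JavaScript read the response.
    res.writeHead(200, {
      "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
      "Content-Type": "application/json",
    });
    res.end(JSON.stringify({ ok: true }));
  })
  .listen(3000);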

Finally, after CORS, the SameSite cookie attribute was added to browsers to tackle some of the same security issues that SOP originally addressed, but by preventing cookies from being attached to requests in the first place, rather than blocking requests entirely. For two great explanations of how the attribute works, see Rowan Merewood’s article SameSite Cookies Explained and Julien Cretel’s blog post The great SameSite confusion.

I had to piece it together from many different sources, but reading about this history behind CORS has given me a narrative that helps me remember what it actually is and why it exists. If you see anything that needs correcting, please submit an issue on GitHub.

November 21, 2022

How to Use the EdgeDB Rust Client


For my latest small project with Rust, which I’m working on mainly to get better acquainted with the language, I chose to use EdgeDB, a new database that I’ve had my eye on for several months. Unfortunately, I’ve found that the EdgeDB Rust client is not particularly well documented. It seems that much more effort was put into documenting the TypeScript client with guides for common use cases, and the Rust client could certainly use some love in that regard.

I will briefly describe how to write simple queries using the Rust client so you can avoid the trial and error that I went through.

Suppose your schema looks like this:

  type User {
    required property username -> str {
      constraint exclusive;
      constraint min_len_value(1);
    }

    property password -> str;
  }

Here’s what an insert could look like:

let result = conn
        .query_required_single_json(
            "insert User {
        username := <str>$0,
        password := <str>$1
    };",
            &("someone@example.com", "hunter2"),
        )
        .await;

The EdgeQL cast <str> is necessary in front of each parameter in the query to avoid getting a runtime error. I didn’t see this documented anywhere. The arguments in the query must have a cast that matches the Rust type being passed in. I don’t believe the documentation contains the mapping between Rust types and EdgeQL types either; I mostly used guessing and checking to get my queries to work.

If you have an enum in your schema, you can insert it by passing a &str with the enum’s exact name.

Suppose your schema looks like this:

scalar type MyEnum extending enum<ChoiceA, ChoiceB>;

type ContainsEnum {
  required property e -> MyEnum;
} 

Here’s what an insert could look like:

let result = conn
    .query_required_single_json(
        "insert ContainsEnum {
            e := <MyEnum><str>$0
        };",
        &(
            "ChoiceB",
        ),
    )
    .await;

On the fourth line, the <MyEnum> cast in <MyEnum><str>$0 is not required due to implicit casting. It’s only there for clarity.

Being a complete beginner, I wasn’t able to figure out a way to use the data returned by the non-json functions, like query_required_single. Instead, I used the json version of each query function and used serde_json::from_str to get the values I needed out of the result.

November 9, 2022

Reflection


I noticed this week that it’s been around a year since I last wrote here. The first thing that came to mind was, “How have I progressed as a software engineer since then?” One of my main concerns these days is the fear that I spend most of my time doing work that doesn’t challenge or push me. Whether that fear is justified or not, I’ll use this opportunity to step back and reflect upon the ways I’ve specifically expanded my technical abilities as an engineer.

Since my last post, I knocked out several long-time tasks on my todo list. In the spring, I learned the basics of Elixir, the Phoenix framework, and Ecto, the latter of which I found elegant but not mind-blowing. Over the summer, I put Phoenix Channels to the test with benchmarks on a basic chat service. I also finally went through Leslie Lamport’s TLA+ video course, but have yet to make practical use of it in any scenario. And just over the last two weeks, I began writing a small amount of Rust. The Rust learning curve has been more like a wall than a curve, but the shocking amount of reading required to write a simple HTTP endpoint using Actix Web actually excited more than discouraged me. Rust strikes me as a sophisticated language that has much to teach me, and I look forward to struggling with it some more.

Regarding languages I already had experience with: I started writing a little bit of Python again for work, but nowadays find that I don’t like anything about it anymore. Thankfully, the majority of my time with computer languages was instead spent reading and writing JavaScript and TypeScript. Despite that, I don’t think I learned anything new whatsoever about JavaScript. That suggests that I know more than enough about the language to do the kind of straightforward, everyday web development that is needed from me at my day job. However, I did learn a number of ways one can set up a TypeScript monorepo. Configuring a TypeScript project sort of reminds me of setting up Webpack back in the day; a lot of it is reading random guides on the internet, many of which slightly disagree with one another, and pattern-matching them onto your particular situation.

Outside of programming languages, I introduced myself to AWS CDK, which has served me very well as a way to quickly get a small number of AWS resources provisioned, and then destroy them all once I’m done. Admittedly, I’ve only worked with tiny CDK stacks featuring no more than 10 resources, and can’t speak to its usefulness for larger projects.

And finally, I learned a decent amount about Redis. The Redis API is quite simple and was very easy for me to grasp. But I discovered that this means accomplishing more complex tasks using that API can get messy, fast. Overall, I love it as a tool, even though it made me think hard to make sure I was using it correctly. There was actually another database I spent much more time with over the past year, but I’ll save my thoughts on it for later.

November 12, 2021

Elixir: Falsey Values and Empty Environment Variables


I keep forgetting that the complete list of falsey values in Elixir is :nil and :false. Being accustomed to JavaScript, I find myself continuing to make the mistake of expecting if 0, if "", and the like not to execute.

One thing that surprised me is that System.get_env and System.fetch_env treat an environment variable with an empty string value as “set”.

I often want to use a default value if an environment variable is an empty string or not set.

In JavaScript, I only have to do

if (process.env.MY_ENV_VAR) {
  /* ... */
} else {
  /* Use a default */
}

But in Elixir, I have to do

if System.get_env("MY_ENV_VAR") && String.length(System.get_env("MY_ENV_VAR")) > 0 do
  # ...
else
  # Use a default
end

to achieve the same thing.

November 11, 2021

Erlang httpc: Figuring Out Strings, Charlists, and Binaries


I started learning Elixir earlier this year by writing a simple script to download an HTML webpage and print it out to the console. Rather than reaching for a more popular, Elixir-native library like httpoison, I wanted to try using the standard library, or more specifically, Erlang’s httpc request function.

Everything was swell until I tried my script with a page that had non-ASCII characters in it. For some reason, ë in the original HTML file was being printed out as Ã« in my console.

I was setting the body_format option to string (the default). That causes the response body to be provided as a list of integers. The docs say that it is a “list of ASCII characters”, which is odd, seeing as neither Ã nor « are actually part of US-ASCII. It would rather appear that the body is a list of ISO 8859-1 (Latin-1) code points. That was the format of Erlang strings before Erlang/OTP R13.

The problem is that the body will be in that form even if the response has the header content-type: text/html; charset=UTF-8. So in the UTF-8 encoded HTML file that I requested, ë was encoded with two bytes: 0xC3 0xAB. But request put each of those bytes as its own entry in the list of code points I received as the response body. I had assumed that the list was an Erlang string in Unicode (known as a char list in the Elixir docs), and accordingly, I converted it into an Elixir string (a UTF-8 binary) via List.to_string. That function interpreted every item in the list as a separate Unicode code point. And thus, 0xC3 0xAB became Ã«.

To avoid this, I should have done one of two things:

  1. set the body_format option to binary when using request
  2. kept body_format set to string, converting the list to a binary by interpreting every item in the list as a byte, not a code point: list_to_binary accomplishes this

The lingering questions I have following this investigation are:

  1. What does file = File.open!("top50.json", [:write, :utf8]) do, as opposed to file = File.open!("top50.json", [:write])?
  2. What is the difference between IO.write and IO.binwrite? What does the former do if the file is opened with :utf8, versus without? Note, IO.write of a string containing ë when File.open is without :utf8 causes an Erlang error: :no_translation.

Below are the notes I accumulated while trying to figure this out:

  • httpc with the default :body_format, :string was giving me a list of integers, not a binary.
  • to_string was reading the list of integers as a list of code points, with each element of the list representing a different character.
  • The list of integers was supposed to be read as a list of byte values, with each element of the list representing a byte.
  • The byte values as represented by the list were a proper UTF-8 encoding of the source text. In UTF-8, multiple bytes might represent one character.
  • This is (probably) why one character from the page (ë) became two characters in my strings (Ã«).
  • Latin 1 (ISO/IEC 8859-1) encoding uses a single byte (8 bits) for each character
  • UTF-8 encodes the first 128 Unicode code points with a single byte, and beyond that with 2 or more bytes
  • Latin 1 encodes the first 256 Unicode code points with a single byte
  • List.to_string has the important docs: “Note that this function expects a list of integers representing Unicode code points. If you have a list of bytes, you must instead use the :binary module.”
  • :unicode.characters_to_binary(body, :utf8, :latin1) only converted the list of byte values into a binary, because every single-byte “character” it was seeing had a single-byte representation in Latin 1. It didn’t do any transcoding.
  • https://erlang.org/doc/man/binary.html#list_to_bin-1 or http://erlang.org/doc/man/erlang.html#iolist_to_binary-1 was probably the right thing to use to deal with the list I was getting from httpc
  • Erlang strings are lists of integer code points http://erlang.org/doc/apps/stdlib/unicode_usage.html#standard-unicode-representation
  • Before Erlang/OTP R13, all Erlang strings were lists of Latin 1 code points, and so were lists of bytes
  • httpc gave me a LEGACY Erlang string. It didn’t decode the bytes into Unicode code points— instead, it just gave me the bytes as a list of integers, which is indistinguishable from a Unicode Erlang string!
  • httpc docs http://erlang.org/doc/man/httpc.html#data-types define string() = list of ASCII characters, and interestingly https://erlang.org/doc/man/string.html uses string() to mean legacy (Latin 1) strings

October 8, 2020

Default MacOS Resolution Scaling Setting Changed in Catalina


Since I can find no reference to this anywhere on the internet, I would like to note that the default MacOS resolution scaling setting appears to have changed between Mojave and Catalina. The former’s default is “Looks like 1280 x 800”, whereas the latter’s default is “Looks like 1440 x 900”. I discovered this while investigating why the MacOS UI appeared larger on my friend’s laptop, which is a 13 inch MacBook Pro like mine.

October 20, 2019

Rigor Checker


GitHub repository

I built a joke project to determine how rigorous an input mathematical proof is based on the presence of keywords. The website was built with the Google Cloud Vision API, Flask, React, Webpack, and Ant Design.

From the beginning, I knew I wanted to support input via image, PDF, URL, and plain text. I planned on using Google Cloud for the optical character recognition, but did not know how I would support PDFs or URLs.

The first thing I did was build a simple mechanism for turning input text into a numerical “rigor” value. I assigned scores to certain proof-related phrases like “without loss of generality” and “by induction”. I then wrote a state machine that used each word from the text as an input to compute the next state. Each state would perform a mathematical operation on the text’s current score, which started at 100. Click here to see the code for this initial implementation.
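
As a rough illustration of the scoring idea (in JavaScript rather than the project’s Python, and as a simple phrase counter rather than the word-by-word state machine, with made-up phrases and weights):

const PHRASE_SCORES = {
  "without loss of generality": 15,
  "by induction": 10,
  "clearly": -5,
};

function rigorScore(text) {
  let score = 100; // the text starts at a baseline score of 100
  const lowered = text.toLowerCase();
  for (const [phrase, points] of Object.entries(PHRASE_SCORES)) {
    // Count non-overlapping occurrences of the phrase and apply its weight.
    const occurrences = lowered.split(phrase).length - 1;
    score += occurrences * points;
  }
  return score;
}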

The next order of business was converting the calculator function I currently had into a REST API and adding a simple UI for inputting text. For the former, I used Flask. For the latter, I chose to try out Ant Design.

Making the endpoint took no time at all, and I began experimenting with Ant. After adding a text box, I wanted to explore what Ant had for file uploads. I wanted mobile users to be able to click and get a prompt to take a photo, so I added accept='image/*' to Ant’s Upload.Dragger. After adding the inputs for PDFs and URLs, I put in some buttons to switch between the input types and gave them a media query to hide their text on smaller devices.

Then, I had an idea: What if I outputted the score with a cool-looking gauge that animated up to the actual value? Google led me to gauge.js, but there was an open issue asking about how to integrate it with React. I set out to build a React wrapper component for the library. This ended up being fairly annoying for several reasons:

  1. gauge.js was not written in a React-friendly way. Its usage involves instantiating one of two classes based on what kind of gauge you want. Also, its text animation logic that counts up to the final value is outside of React’s control.
  2. gauge.js does not have comprehensive documentation of its API.
  3. I ran into this issue where npm installing my component locally for testing would cause a mistaken “hooks can only be called inside the body of a function component” error to be thrown by React.

To deal with #1, I used a ref to hold the gauge instance for the component. I also discovered MutationObserver and abused it to give React control over the value text. Unfortunately, I could not help the fact that gauge.js is written with jQuery. The wrapper component I created is on GitHub and npm, but because of gauge.js’ enormous bundle size, I would not recommend using it if a better alternative to gauge.js comes around someday.

I then gave the UI a little progressive fade-in effect when the page first loads using some CSS animations. It’s pretty satisfying, if I do say so myself.

After that ordeal, the UI was essentially complete, and I moved back to building the backend. I added a different endpoint for each type of input. This was my first time accepting file uploads from a REST API, but thankfully, it was very straightforward. I gave Flask a MAX_CONTENT_LENGTH configuration to prevent users from uploading oversized files. Using Google Cloud Vision was also very simple with their client library, and downloading files from user-provided URLs only took a Google search to figure out.

At this point, I had to figure out PDFs. Another search led me to the Ghostscript project. Instead of bothering with trying to use it as a library, I downloaded the Ghostscript command line tool and used it from Python via subprocess. The way I used it, Ghostscript took in a file and outputted a new one. I had to be careful to give Ghostscript’s output file a randomly generated filename; it’s possible for multiple instances of Ghostscript to run at the same time if multiple requests come in simultaneously because of Flask’s multithreaded nature.

However, there was a problem. There are two kinds of PDFs: pure text PDFs and scanned PDFs. If a user provided a scanned PDF, which is basically just a huge picture for every page, Google Cloud Vision should be used on it—not Ghostscript, which just parses text. I did a lot of searching and contemplating. Some people online suggested looking for fonts in the PDF. Others suggested calculating words or images per page. In the end, because I could not find a completely reliable way to determine which kind of PDF a given PDF file was, I chose to disallow scanned PDFs entirely and always used Ghostscript.

By this time, I had noticed a flaw with my state machine methodology of scanning the text (see GitHub issue). I took this as a rare opportunity to learn and integrate an algorithm into a project. It took me a very long time, but with the help of my friends, I figured out the intuition behind the fairly complicated Aho-Corasick algorithm and implemented it.

Deploying was made painful by the fact that I needed my backend API to be available over HTTPS. To get around this, I ran the Rigor Checker backend on the same VPS that Lost-n-Phoned (lostnphoned.com) is running on, since it already had a certificate set up. I had to configure the web server that was already reverse proxying Lost-n-Phoned to send certain routes to Rigor Checker, but that was all.

The most satisfying part of this project was being able to visit the website on my phone, click the image upload button, and enter the camera view for taking a picture of a handwritten proof. As soon as I did that and saw the gauge move up to its number, I thought to myself, “this was all worth it.”

August 6, 2019

What Happens in a Year


The following things happened in the past year:

The code I wrote for all of these things is closed-source for various reasons, and it never occurred to me that I could write public blog posts about closed-source projects—hence the yearlong hiatus in my writing.

Looking back, there was much I could have said here about the countless technical challenges I faced in these past months. Many of those challenges were general enough that I could have described them without disclosing any private information about the projects, which, by the way, are still not ready for full release to the public.

Anyways, I’m back. I intend to write here more frequently again.

Regarding the way the website looks: I noticed that I kept wanting to change the theme of this website, as my visual tastes shift frequently and I tire quickly of whatever the current theming is.

My solution, which you see today, was to create my own theme. I have wanted my own theme for a while now. This time, with full control in my hands, I made the styling of the site so barebones and simple that it should be quite difficult to get tired of it. I made it look almost as if there was no CSS, like a webpage from the days of Netscape.

I like the way it looks, really. It suits my current tastes, even if it may fail to stand the test of time. The plain white background, the default serif font. The hilarious placement of all posts on the same page, which conveniently allows for Ctrl+F searching. Like young women wearing 90s-style mom jeans, I’m shamelessly bringing an old look into the modern world. But if you find it too atrocious to bear, I won’t mind if you tell me so.

September 16, 2018

Changing ls Colors


My WSL bash had always suffered from very ugly ls outputs until yesterday, when I realized that the folders being listed with green backgrounds were a certain type of folder: other-writable. I discovered that in the answer to this stackexchange question.

I had always thought that there was something wrong with my configuration because I had manually set di=1;34 in my LS_COLORS environment variable, yet my folders still had a green background. I now know that I have to set ow=1;34 and tw=1;34 to change other-writable and sticky + other-writable directories as well.

However, I also now know that it is a lot easier to configure ls colors using the dircolors program. To configure my colors, I did dircolors --print-database > ~/.dircolors. I then opened that .dircolors file and changed the STICKY_OTHER_WRITABLE, OTHER_WRITABLE, and STICKY values so that all my directories are displayed in the same way; I don’t care about distinguishing between directories with different permissions.

I then wrote two things in my ~/.bashrc:

  1. eval "$(dircolors -b ~/.dircolors)". This modifies the LS_COLORS environment variable.
  2. alias ls='ls --color=auto'. This makes sure ls always uses the LS_COLORS variable to format its output.

The advantage of using the .dircolors file is that it’s a lot easier to edit and understand as opposed to manually modifying the LS_COLORS value.

September 13, 2018

Setting Up Git in WSL


On my laptop, which runs Windows 10 Home, I use Windows Subsystem for Linux (WSL) to run an Ubuntu command line environment within Windows. With WSL, I can use Bash instead of Powershell for all my command line work. I have even configured my VS Code integrated terminal to use WSL’s bash.

If you are a developer on Windows, I highly recommend setting up WSL. It’s very easy, and it already comes with Windows 10.

However, there are two things you need to do to make your WSL Git work well with your files on Windows.

First, in your WSL environment, run git config --global core.autocrlf true so that Git checks files out with CRLF line endings (and converts them back to LF when committing). If you don’t set this, Git will show every file as modified.

Relevant GitHub issue

Second, if you want to avoid having to constantly type your GitHub username and password, you must tell Git to use the correct credential helper. In a Windows shell (Powershell or CMD), do git config --list to check what your credential.helper value is. If you are using credential.helper=manager, you need to set your WSL git to use the same program.

In your Windows Git configuration, the line credential.helper=manager means that Git will run the program git-credential-manager.exe. However, your WSL Git will not know where to find that program. What you need to do is set the full filepath of that program in your WSL Git configuration.

In Windows, run git --exec-path to find the folder where git-credential-manager.exe should be. Confirm that it is in the folder outputted by the command.

Next, in WSL, do git config --global credential.helper "/mnt/c/Program\\ Files/Git/mingw64/libexec/git-core/git-credential-manager.exe" to set the path.

Instead of the path that I used, use the path you got from the exec-path command, with the /git-credential-manager.exe part appended. Remember to escape spaces and backslashes. Also, make sure the path you input starts with a / (e.g. /mnt instead of mnt). If the value of the credential.helper setting in your Git config does not start with a /, Git will try to append the value to “git-credential-” and execute that.

Relevant StackOverflow question

Have fun with WSL!

September 10, 2018

Piped Commands Run Concurrently


As I was learning about Unix piping in preparation to write a shell for my operating systems class, I discovered something new on Stack Overflow that I had never before realized during my limited experience with pipes: Piped commands run concurrently, and could be started in any order.

Let’s use an example: ps | less. Intuitively, you might think that ps runs to completion, and then less runs. In reality, you cannot know for sure which program will be started first, and both programs will be running at the same time.

There are two possibilities with the above example:

  1. ps (process status) is started first, and does whatever it does to determine the currently running processes before less is started. Then less runs and starts receiving text through the pipe from ps. The list of processes that less displays does not have less in it.
  2. less is started first, and waits for input from the pipe. ps is then started, and determines the currently running processes. This time, less is already running, so less is included in the list. Now less receives the list through the pipe and displays it.

The reason you cannot know for sure which program will run first is that the shell uses an operating system call, fork(), to run the two programs. The order in which the programs are started is determined by the operating system’s scheduler. When the shell uses fork() to spawn the child processes ps and less, the processes go into a list of ‘ready’ processes. The scheduler picks processes to start from that list based on an algorithm, and it is not straightforward to predict which process will be started first. The child processes, once they are both active, are run concurrently. This means that the OS switches between the two processes, which take turns running with each other, as well as with all the other processes currently active on the machine. The program on the right of the pipe character, if it requires input, will wait whenever necessary for the left program to send data through the pipe.

Why does any of this matter? Sometimes, start order of piped programs matters. Let’s say you want to read from a file and then overwrite it based on what you read. Something like this: grep "some search pattern" some_file | tee some_file

You want grep to read from some_file, search for “some search pattern”, and then have tee write the result of the search (the lines the pattern was found in) back to some_file.

You have a big problem if tee starts first, though. If you provide tee a file to write to, the first thing tee does when it starts up is clear the file if it already exists. This is so that tee overwrites any preexisting data in the file.

By the time grep reads the file, it will have been emptied by tee! As explained earlier, this problem will not occur every time. Sometimes you’ll get lucky and grep will be able to do its search before tee gets a chance to clear the file. Other times, you won’t be so fortunate.

Oh, and by the way, grep "some search pattern" some_file > some_file won’t work in Bash either, because Bash deals with file redirections first. Bash will clear some_file before executing grep.

Alright, that’s all I have for you today. Be careful out there!

September 4, 2018

Defining Functions in Header Files


I discovered C++ inline functions a few weeks ago as I worked on the Blip programming language, which was the end-of-semester project in EE312 Software Design and Implementation I. I didn’t write here about it and I can’t reveal the source code because it’s a school assignment.

However, I do want to write about a specific incident where I tried to define a method outside of the class definition, but inside the header file. Like this:

// header_file.h
#ifndef header_file
#define header_file 1
class MyClass {
    void my_method();
};

void MyClass::my_method() {
    // code
}
#endif

When I compiled, there was a linker error that said the method had multiple definitions. My first thought was that the #ifndef should have protected the contents of the header file from being repeated. It wasn’t long before I realized that the #ifndef only protects the header file from being repeated in a certain source file (compilation unit).

Say I #include the header and several other files in my main.cpp file. The #ifndef prevents the various #includes from copy pasting the header file into main.cpp multiple times. However, each .cpp file (compilation unit) has its own distinct #define values because the preprocessor runs separately for each .cpp file. So if I have another file called other.cpp, the header’s contents could still be pasted into there by a #include at most once. And in fact, this is the situation that I had. Thus, when the linker tried to combine my .cpp files, it discovered that my method had multiple definitions: one in each .cpp file that included the header.

To summarize, #ifndef prevents the header from being included multiple times in a single .cpp file, but it doesn’t prevent the header from being included into two or more different .cpp files.

I learned by Googling that the solution is to make the method inline, like this:

inline void MyClass::my_method() {

From what I understand, inline functions are functions whose bodies the compiler can copy-paste wherever they are called, so the program counter doesn’t need to jump to the function’s location at runtime the way it would for a normal call. The part that matters for my problem is that in C++ (C’s inline rules are different and more subtle), marking a function inline relaxes the one-definition rule: the function is allowed to be defined in multiple compilation units, as long as every definition is identical, and the linker keeps just one copy.

As a final interesting note, I learned that functions defined inside a class definition are inline functions! This is probably why you can have methods defined inside class definitions in header files that are included in multiple .cpp files without any linker error at all. That may have been confusing, so here is what I mean:

// header_file.h
#ifndef header_file
#define header_file 1
class MyClass {
    void my_method() {
        // code
    }
};
#endif

No problems, because methods defined inside class definitions are implicitly inline.

August 9, 2018

TJCTF


Yesterday and today, I have been working on problems from TJCTF. To my surprise, I found them highly addictive. I think it’s mainly because the problems are at a level where I have a fair chance of success; the last CTF I participated in was far too difficult for me. However, TJCTF is targeted toward high schoolers. The most interesting (and painful) problems I worked on had to do with analyzing binaries and reading assembly code.

I am considering writing some write-ups for the problems and putting them in a GitHub repository. I think I would find the write-ups useful in the future if I do more CTF competitions. This event has reignited in me an interest in security engineering, and I’m going to start reading Ross Anderson’s book on the subject.

August 1, 2018

After Lost-n-Phoned


Lost-n-Phoned is almost ready for general public use. I’m currently going through Google’s OAuth Developer Verification to get rid of the “This app isn’t verified” warning that shows up when anyone tries to register. It’s a slow process involving back and forth emailing.

So what’s next after Lost-n-Phoned? This month, I will focus on finishing strong in my EE 312 class, designing/building my portfolio website, and practicing interview problems. I might also be starting a new student organization at UT, but I’m not 100% on that quite yet.

I want to relax during this last month before the Fall semester starts again, so I probably won’t be coding with the same intensity as before (which means fewer posts here). I’ll definitely make up for it, considering that I will be taking both EE461S (Operating Systems) and EE422C (Software Design and Implementation II).

July 29, 2018

An Issue with Caddyfiles


I discovered a problem with my Caddyfile today, and considerable Googling and reading of Caddyfile documentation did not help me.

In my Caddyfile, I am proxying the endpoints /twilio, /authorize, and /oauth2callback to my WSGI server. However, one of my static pages was located at /authorize-success (it has since been moved to /success as a workaround). I was surprised to discover that the folder was captured by the /authorize proxy rule! The Caddy docs suggest using /authorize/ if I do not want to include /authorize-success. When I tried that, Caddy failed to proxy the endpoint properly and I received 404 errors.

I also tried

proxy /authorize {
    except /authorize-success
}

and

proxy /authorize {
    except /authorize-success/
}

but both still resulted in 404 errors when I tried to visit /authorize-success.

July 29, 2018

Lost-n-Phoned Unique URLs


Before today, it was possible for people to register a phone number on Lost-n-Phoned even if they did not have access to the phone with that number. They could do this by visiting https://lostnphoned.com/authorize?phone= and appending any phone number.

Why did I set it up this way? The issue was linking HTTP requests from Twilio with HTTP requests from the registering user’s web browser.

Let’s go through an example to make this clearer. A user texts the service “Register” to register. The text is received by Twilio and is passed on to my server by an HTTP request coming from Twilio. My server now knows the phone number of the person who wants to register. My server tells Twilio to text the person back with a link to allow Lost-n-Phoned access to their contacts. The user clicks the link, but when my server receives the HTTP request from their web browser, how is it supposed to know which phone number is associated with this HTTP session? If multiple people are registering at once, which is entirely possible, Flask can’t know which HTTP request from Twilio to associate with which HTTP request from which browser.

This association is important because it is in that second session with the user’s web browser that the server receives Google authentication tokens. Those authentication tokens obviously need to be associated with user phone numbers.

My initial solution was to put the user’s phone number in a query string. This led to URLs of the form https://lostnphoned.com/authorize?phone=0123456789, as I mentioned earlier. When users clicked the link sent to them, the server used the query string to get the user’s phone number. When the user finished authenticating with Google, their number and the tokens from Google were stored together in the database.

Notice, however, that I couldn’t verify that the visitor to https://lostnphoned.com/authorize?phone=whatever actually came from the phone with that number! After all, anybody can visit the URL with any phone number in the query string, so anybody can register any phone number.

I tackled this problem by generating a unique URL query string for each registering user. I use the Python uuid module to generate universally unique identifiers (UUIDs). The UUIDs are then converted to base58 strings, which are put in a query string. The result is URLs that look much like the verification links you get in your email. When the user texts “Register”, the URL, with the generated query string appended, is sent to the user through Twilio. The query string is also stored in a database along with the user’s phone number. Since only the user texting Lost-n-Phoned gets this unique URL, only that user can visit it!

As before, when the user clicks the link, their browser makes a new request to my server. This time, the server uses the UUID in the query string to look up the user’s phone number in the database. Now my server can store the number and Google tokens with the guarantee that the user actually texted Lost-n-Phoned from that phone. Voila!
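
For the curious, the flow boils down to something like the following sketch. It is simplified from memory rather than the exact Lost-n-Phoned code: the query-parameter name, the in-memory pending dict, the uuid version, and the import are all illustrative stand-ins for the real database and package.

import uuid

import base58
from flask import Flask, request

app = Flask(__name__)
pending = {}  # placeholder for the real database table of unused tokens

@app.route("/twilio", methods=["POST"])
def register_text():
    phone = request.form["From"]  # Twilio includes the sender's number
    token = base58.b58encode(uuid.uuid4().bytes).decode()
    pending[token] = phone
    link = "https://lostnphoned.com/authorize?token=" + token
    # ...reply through Twilio with `link` so only this phone receives it...
    return ""

@app.route("/authorize")
def authorize():
    phone = pending.get(request.args.get("token", ""))
    if phone is None:
        return "Unknown or expired link", 404
    # ...run the Google OAuth flow and store the resulting tokens with `phone`...
    return "Redirecting to Google..."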

July 29, 2018

Lost-n-Phoned Security


I have finally, finally, finally added a security implementation to Lost-n-Phoned. For now, I’m forcing users to create one-time-use passwords. Every query to the service has to have a password with it, and the password can’t be used again. Kunal is strongly opposed to the idea of one-time-use passwords, so I’ll discuss the topic with the original CodeRED team and see what they say.

The implementation is not very complicated. I allow users to add passwords on demand by texting “Add theirpasswordhere”. If they have an account, I take the password and store it, salted and hashed, in the database. Unfortunately, as of now, all of each user’s passwords are salted with the same salt. In other words, salts are per user instead of per password. Users have a salt generated and stored when they register. I wanted users to be able to supply any of their passwords when they make a query, but if each password had its own salt, I would have had to try every password’s salt against the user’s input to find a match. For users with more than a few passwords, that would take too long. A compromise had to be made.
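
In code, the scheme is roughly the following sketch (simplified; the KDF, iteration count, and function names are my illustration here, not necessarily the exact Lost-n-Phoned code):

import hashlib
import os

def new_salt():
    # Generated once per user at registration and stored alongside the account.
    return os.urandom(16)

def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def check_and_consume(attempt, salt, stored_hashes):
    # One shared salt means a single hash of the input can be compared
    # against every stored password for that user.
    attempt_hash = hash_password(attempt, salt)
    if attempt_hash in stored_hashes:
        stored_hashes.remove(attempt_hash)  # one-time use: burn it after a match
        return True
    return False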

I’m very pleased that I got this task out of the way. It had been intimidating me for weeks, staring back at me every time I visited the repository’s Issues page. Now that it’s gone, I can move on to finish one, last, task…

July 26, 2018

Lost-n-Phoned Phone Number Validation


I’ve finally added phone number validation to Lost-n-Phoned! I made use of the phonenumbers package on pip. Unfortunately, the feature required me to add multiple complex if branches, but I don’t think it’s too bad.

You can now format your phone number in any reasonable way when querying for your contacts. Also, you now have to register from the phone whose number you want to register, which keeps people from registering phone numbers that aren’t theirs.
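
The validation itself leans on the phonenumbers package; the core of it looks something like this sketch (the default region and error handling are simplified):

import phonenumbers

def normalize(raw_number, default_region="US"):
    try:
        parsed = phonenumbers.parse(raw_number, default_region)
    except phonenumbers.NumberParseException:
        return None
    if not phonenumbers.is_valid_number(parsed):
        return None
    # Store and look up numbers in one canonical format, e.g. "+15551234567".
    return phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.E164)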

The final hurdle is here: the security scheme, which I haven’t even decided upon, let alone implemented.

July 24, 2018

Deploying Lost-n-Phoned


Uh oh, I haven’t updated this in a long time! That doesn’t mean I haven’t been pumping out code, though. I’ve been working hard on preparing Lost-n-Phoned for public use.

A little backstory on Lost-n-Phoned: It’s a Flask app that my friends Bala, Leon, and Max made with me at CodeRED Exploration. Essentially, it makes your Google contacts available on any phone through text messaging with the Twilio API.

This past weekend, I packaged Lost-n-Phoned into a pip package for easier deployment. I set up a Digital Ocean droplet and installed the package there. My initial plan was just to use the WSGI server Waitress to run the code. I wanted to run Waitress without privileges, so I set an iptables rule to forward port 80 traffic to the port Waitress was running on. Unfortunately, I quickly discovered that OAuth2 requires a connection over HTTPS, and that Waitress does not support HTTPS at all.
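
For reference, running a Flask app under Waitress is only a couple of lines (a sketch; the import path and port here are illustrative):

from waitress import serve

from lostnphoned import app  # illustrative; the real package layout may differ

# Listen on an unprivileged port; the iptables rule forwards port 80 traffic here.
serve(app, host="0.0.0.0", port=8080)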

I decided to go with a standard way of deploying WSGI apps, which is running a WSGI server behind a reverse proxy. Users would connect to the reverse proxy server with HTTPS, and the server would pass the requests along to Waitress through plain HTTP. I wanted to use a certificate from Let’s Encrypt for HTTPS, but I didn’t want to bother with setting up an ACME client. For that reason, I decided to use Caddy rather than Apache or Nginx for the proxy.

I spent several hours scratching my head at an error Caddy was throwing as it tried to get a certificate. I eventually gave up and installed Nginx. Only when Nginx failed to serve its example site did I finally remember the iptables rule from before! After I removed the rule, Caddy was able to properly use port 80 and ran smoothly.

The next problem I had was with the Flask app thinking that its URL was localhost instead of lostnphoned.com. A bunch of Googling led me to the solution, which was to configure Caddy to pass certain headers in the requests from clients to Waitress. Finally, the Flask app was fully operational.
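
I fixed it on the Caddy side, but for completeness, the same symptom can also be addressed inside the app with Werkzeug’s ProxyFix middleware, which makes Flask trust the forwarded headers. A sketch of that alternative (not what I actually deployed):

from werkzeug.middleware.proxy_fix import ProxyFix

from lostnphoned import app  # illustrative import

# Trust one proxy hop for the X-Forwarded-For/-Proto/-Host headers the proxy adds.
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)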

I spent a few hours today making the lostnphoned.com website with plain HTML and CSS. I went with a simple design because most visitors to the site should be on mobile. I’m pretty happy with how my CSS skills are progressing. The CSS for this site was pretty well-organized and I felt like I understood all the styles I was using.

All that’s left now is some more Python coding in the Lost-n-Phoned internals. Two critical features, phone number validation and a security mechanism, are still missing. I’ll be back with a progress update soon!

July 16, 2018

AngelHack Austin 2018


On Saturday and Sunday, Bala, Kunal, and I attended the AngelHack hackathon in Austin. We spent our weekend working on Bala’s idea, which is a platform to make Ethereum donations to people raising funds for charity. The platform, called Trust Me, ensures that the donated Ethereum can only be sent to the charity’s account.

While Bala and Kunal worked on writing the Solidity code, I worked exclusively on the project’s website. The website is the place where donors can browse fundraisers and donate Ethereum.

It was the first time in my life that I made a decent-looking site without using any templates. I did rely significantly on Picnic CSS to make a grid of fundraiser cards, but that was the extent of outside code I used.

I used lots of images from Unsplash as backdrops to informational text. Surprisingly, Arial didn’t look bad at all so I didn’t end up using any Google Fonts.

At Bala’s request, I used JavaScript to add a fade transition that changes the website hero image periodically. The result impressed even me!

I decided to add the Trust Me site and the HTML5 2048 game (which is now complete) to my portfolio site at KeaneNguyen.com. However, I had a lot of trouble making images of the projects look good in the layout that I’m currently using for that site.

With the boost in confidence from the outcome of the Trust Me site, I have decided to write my own portfolio page from the ground up; I don’t want to use any CSS libraries or templates. If I’m going to be a front end developer after all, then I need to be able to design my own page!

As of now, I plan to design the website by making my very own Hugo template.

July 13, 2018

2048 - Mobile Scaling


Over the past few days, I’ve managed to make the game scale for mobile devices. On smaller screens, the game’s iPhone simulation goes away and the game does its best to go full screen. I even added a bit of JavaScript scaling inside the canvas element itself, in addition to CSS. This makes the game look better on different screens.

Thanks to Kunal, the game also now accepts touch input. He’s working on giving the game sounds at this very moment.

July 11, 2018

2048 - Mouseover Highlight


Today, I added code to make the game’s restart button light up when the player’s mouse is over it. For a while, I was baffled because the mousemove event wasn’t being triggered when I tested with the developer tools window open. Eventually, I realized that my code works fine when the developer tools panel is closed. I think it might be because the mouse is used for simulating touch events when developer tools are active.

July 7, 2018

Animating 2048 - The Execution


I spent the entire day working on the game. Literally sunup till sundown. I’m happy to say that the animations are complete. I managed to add them without rewriting update.js, although I still might do that in the future. I can now confirm from personal experience that animating manually, without a framework to help, is very, very painful.

Click here to play the game.

July 7, 2018

React Progress Update


I just finished my first (super, super simple) React web page! It uses some JSX templating to dynamically update an ordered list tag with user input. The page doesn’t have any CSS right now, so it’s ugly as sin. If I give it a makeover someday, I’ll share it. Don’t hold your breath though.

July 6, 2018

Animating 2048 - Data Structures


I’ve now set up the necessary data structures and the operations on them required for the render() function to know how to animate the correct blocks. Tomorrow I will work on the animating itself. Right now, the update.js file is a huge headache to deal with. It is literally 100% spaghetti code (I’m partially to blame) and it needs to be rewritten from scratch as soon as possible. I should have changed Bala’s madness before starting the animation code, but it might be too late now.

July 5, 2018

Animating 2048 - The Plan


I didn’t do much coding today; I was cleaning the apartment and preparing for my upcoming EE 312 exam. However, I have been mulling over how to animate the blocks in 2048.

Here’s the strategy I’ve come up with:

  1. In update(), make a list of objects being animated with beginning/destination values that are grid coordinate pairs.

  2. Pass a start time to render(), and have render() calculate positions for each image using the time elapsed and the distance in pixels between the beginning/destination.

  3. Before rendering anything, determine if any animation objects have a destination that is a block. If they do, make sure to draw the destination block as half its value (its old value) so that it can promote after the incoming block slides in.

  4. Related to the previous point: If a moving block is heading toward another block, fade the incoming block out as it approaches its destination (to “slide” the incoming block underneath).

  5. With every new update() call, clear and regenerate the list of animation objects to skip the previous animation if it’s still in progress. This allows the user to animation cancel with rapid, successive key presses.

July 4, 2018

2048 in Plain Javascript


I forgot to update TechLog for the past several days because I have been working non-stop on a project that I started with Bala and Kunal last weekend. We decided to have our own little hackathon, where we would create a project in a non-competitive environment and learn along the way. We chose to make an HTML5 game to learn about client-side web dev.

Initially, I wanted to use the popular Phaser library to create a more complex game. However, due to resistance from Bala, we opted to use plain Javascript to make something simpler: the once-popular mobile game 2048. We got started by looking up tutorials on making HTML5 games from scratch. We learned how Javascript can render images into an HTML document using the canvas tag, and through trial and error, learned how different Javascript files are linked together by the browser.

By working from Saturday evening to early Sunday morning, we managed to get the game working with simple images, but no animations. The game was inside a simple box on a plain page. Bala wrote most of the game logic, Kunal wrote the rendering function, and I wrote the user input, HTML, and CSS.

Since Sunday morning, I have been working mostly alone to dramatically change the appearance of the game. The game now looks like it is an app running inside an iPhone X. I added an end game screen, and now I am working on adding animations. Hopefully Bala can help me with this daunting task.

I have been taking a desktop-first approach to coding the game. So far, testing has mostly been done on large screens. Eventually I am hoping to make the game mobile friendly, but it will take a lot of work to make sure the canvas looks right on devices of various sizes.

June 29, 2018

Flask Tutorial


I’ve been following the Flask tutorial and am currently a little more than halfway through it. Most of the things I’ve learned are Flask features that I didn’t know existed, like blueprints, views, and templates (yeah, I didn’t know much about Flask). It’s going to take me longer than I thought to get a solid grasp of what’s going on in the tutorial project.

June 26, 2018

EE 312 Project 2


The only coding I did today was for EE 312 Software Design and Implementation I. I have been tasked with writing functions to perform matrix multiplication and transposition using pointers and malloc() calls. I haven’t had much difficulty so far, but I need to be extra thorough with my test cases for this project because I missed several corner cases on my previous assignment.

June 25, 2018

EE 313, Upcoming Hackathon


Unfortunately, this week I will be very busy with my EE 313 Linear Systems and Signals class. I have homework due Wednesday and an exam on Friday. The material has been very difficult for me to understand so far, especially because I don’t have a good grasp on the mathematics the course is built upon. This means I will likely not have time for anything other than schoolwork until next weekend.

In other, more cheerful news, I am planning to attend the upcoming AngelHack Austin hackathon with some of my friends. I might make another update on that later.

June 24, 2018

Lost-n-Phoned Status Update


As I went through the Flask documentation on how to deploy to production today, I realized that I am in way over my head. I don’t understand how to use a web server with Flask, nor do I know what a WSGI server is. Instead of continuing to work on Lost-n-Phoned, I am going to follow the tutorial in the documentation which walks through making a blogging web app with Flask.

I’ve skimmed through the tutorial, and I noticed some interesting things. First, it goes over making the whole application a package for quick deployment, which I didn’t know was possible. Second, it goes over making automatic tests, which is something I’ve never done in Python. And finally, it goes over using SQLite3. Upon Googling SQLite, I discovered that it is dramatically different from databases like MySQL! There is no server and no client; just a file on the disk. Perhaps I should have been using it for my Keane Cogs instead of JSON all along…
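
The “just a file on the disk” part really is the whole story. A minimal sketch (the file and table names are made up):

import sqlite3

# The entire database lives in this one file; there is no server to run.
conn = sqlite3.connect("cogs.db")
conn.execute("CREATE TABLE IF NOT EXISTS scores (user TEXT, amount INTEGER)")
conn.execute("INSERT INTO scores VALUES (?, ?)", ("keane", 100))
conn.commit()
print(conn.execute("SELECT * FROM scores").fetchall())
conn.close()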

June 23, 2018

Adding Pagination to this Site


I woke up this morning and browsed to this site on my phone to take a look at how last night’s post looked. For the first time, I realized that it was completely impossible to access posts other than the latest three shown on the homepage.

To fix this, I looked through Hugo’s docs on pagination and the code from a different theme, Hestia Pure. Using what I learned, I updated my fork of the Base16 theme, which this site uses, and added page navigation links to the bottom of the homepage. I then submitted a pull request to the original theme, since there’s already an open issue there requesting pagination.

Enjoy reading through my older posts!

June 23, 2018

Resurrecting Lost-n-Phoned


The last time I worked on Lost-n-Phoned was exactly 7 months ago. Now that my portfolio website is on the internet, I need to fill it up with projects. Lost-n-Phoned is definitely something I want on my website—the only problem is that it’s not quite portfolio-worthy yet. The website is basically nonexistent and the service still doesn’t have an authentication scheme, which is required if it’s ever going to go live for the public to use. Even if the code was complete, there’d still be the fact that I shut down the Digital Ocean droplet running it half a year ago.

This weekend I hope to get the project up and running again. I’m not sure how long it will take, but as of now, I’m halfway done with setting up the server. What I suspect will take longest is implementing some form of security for returning people’s phone numbers.

My main concern is that I will be splitting my time between Lost-n-Phoned, learning React, and catching up in my EE 313 class. I know for sure I won’t be able to meet my goals for all three; I just don’t know which ones, if any, will be accomplished. Here’s to a productive weekend!

June 22, 2018

pip show Follow-up, React, and WholesomeBoard


The first of the three things in the title I’m going to write about today is the pip show pull request I worked on earlier this week. As it turns out, Kunal and I had misunderstood how the pip maintainers wanted to fix the issue we were working on. The change they wanted was only a documentation update, and not a new flag. Unfortunately, this meant that all the code we had written was for naught, and I had to close the original PR we opened.

On another note, I have finally started learning React JS. My primary motivation is my desire to build web apps with the framework, but another significant incentive is the high demand for it from employers. It’s important for me to acquire some marketable skills right now, as I will be applying for summer 2019 internships within a few months.

Lastly, I just want to say that I have acquired the domain name WholesomeBoard.com. As soon as I am half competent with React, I want to start working on reviving the Wholesome Board project I made with my team at Wholesome Hacks. The latest version of the code is currently on my branch in the repository, if you want to look at it. The version of the code that we demoed with is here.

June 21, 2018

From Hugo to Hubris: Lessons from a Jekyll Site


On Tuesday (two days ago), Kunal asked me for help with setting up a personal website for himself. I recommended Jekyll to him for three reasons:

  1. He wanted to host his page on GitHub Pages, and GitHub continuously deploys Jekyll sites automatically.
  2. Jekyll’s documentation is well organized and well written.
  3. Jekyll has tons and tons of themes.

Unfortunately, Kunal approached me for assistance as I was in the middle of trying to get gdb-dashboard to work. Hoping to retain as many rail cars in my train of thought as possible, I quickly referred him to the Jekyll quick start guide before switching back to reading the gdb-dashboard source code. Kunal promptly sat down next to me and, to my dismay, started watching a YouTube video on how to set up a Jekyll site.

Is he illiterate!? I thought to myself. The quick start barely has 5 steps!

I realized that I had promised to help him the day before, so I reluctantly closed the code I was looking at and turned to help him.

“Here, close out of that, I’ll help you,” I told him. “The first thing we need to do is have Jekyll generate the basic site files.”

I hurriedly walked him through creating an empty site and uploading it to a new GitHub repository. I then helped him install the theme he wanted and showed him his new site locally by running the Jekyll server.

“Let’s add a post,” I said.

I created a new markdown file in the _posts folder, named it something random, then wrote some words in the file. I had Kunal run the bundle exec jekyll serve command to test the change, but the build failed. I had never used Jekyll before (honestly, I just assumed everything to be the same as in Hugo) and was unaware that post file names had to follow a certain dated format. Not only that, but I had also completely botched the front matter formatting in my haste. I renamed and edited the file according to the docs and Kunal ran the local server again. This time, the new post showed up right where we expected it; from the home page, we clicked on the link to the posts listing page, and there it was!

I instructed him to add, commit, and push the new post to GitHub. I sighed and waited for the automatic build to complete.

My work here is done, I thought. Back to fixing that damned .gdbinit file!

Little did I know my impatience would waste far more of my time than it would save!

We quickly realized that, although the build completed successfully, the post was not being listed in the posts page. For the next thirty minutes, we tried all sorts of things. From changing the front matter to changing the file name to copying example posts exactly, nothing would work. We disabled caching and refreshed over and over to no avail. For some inexplicable reason, the page always showed up when served locally, but never on GitHub Pages!

I browsed to his site on my computer, went to the Posts page, and grimly confirmed that the post was still missing. I even tried changing the end of the URL to get to the page URL directly. That didn’t work either.

Finally, I admitted defeat. It was getting late, and I still needed to wash the dishes.

“I’m sorry my friend,” I said, as I walked to the kitchen sink. “I’ll help you with it tomorrow.”

I turned on the water and barely picked up a plate when Kunal said, “Oh! The link to the posts page goes to the demo website, not mine!”

I dashed to him.

“My god, you’re right! We never set the site URL in the config file!”

I went to Kunal’s home page, clicked on the Posts link, and watched as the domain switched, confirming what I suspected. Because we copied the config file from the theme’s repository, the site URL was still set to the demo site. This meant that every resource and link on the site was routed to the demo site, not Kunal’s GitHub Pages URL. Since Kunal hadn’t changed anything about the website yet, the two sites were visually indistinguishable. Still, it’s hard to believe that even as I modified the end of the URL to try to find the post directly, I never noticed that the domain was completely different from github.io.

I shook my head in disbelief, facepalming internally. He would have been better off just following that YouTube video! I fumed at myself.

Thankfully, editing the config file did the trick and his site was finally fully functional.

I’ve learned my lesson. I need to be more patient, and I need to remember: always do the configuration first!

June 20, 2018

gdb-dashboard


After staring at the gdb-dashboard code for a long time, I finally wrapped my head around the str.format() fields being abused* in the list() method of the Source module. I fixed some of the field names, and the Source module outputs code correctly now, but I still want to rewrite the module so that I can be sure it will still work even if I change the style settings.

All this trouble, just because the UT ECE Linux machines don’t have Python 3!

*Apologies to the creator of the project, but I really don’t think anyone should be nesting three format fields in a string. Maybe I don’t know anything though.
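
For anyone wondering what nested format fields look like, here is a contrived example (mine, not taken from gdb-dashboard): the fill character, alignment, and width of the outer field are themselves replacement fields.

template = "{value:{fill}{align}{width}}"
print(template.format(value="ok", fill="*", align="^", width=12))  # *****ok*****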

June 19, 2018

Living in a Shell


I’ve successfully completed my first project for EE 312 (Software Design and Implementation I) using only a remote terminal. I’ve been familiarizing myself with vim and gdb along the way and it’s honestly been quite fun.

Today, I found a game-changing project on GitHub called gdb-dashboard that allows me to see the registers, stack, disassembly, and other things all with one command. The problem is, the UT ECE Linux machines that I ssh into only have Python 2.6, and the .gdbinit file is written for Python 3. So far I’ve made a fork and changed the {} fields (Python 2.6’s str.format() doesn’t support the empty, auto-numbered {} fields that later versions do) so that the dashboard actually runs. In so doing, however, I introduced a bug causing the Source module to output empty lines instead of code.

I need to spend a few hours sometime this week to fix the code. If I can get the dashboard fully operational, I’ll have eliminated my biggest fear related to doing pure command-line development: slow debugging.

I plan on completing the rest of the semester’s assignments in a shell as well, just for the heck of it. I want to get a taste of what programming might have been like many years ago before GUIs were around.

June 18, 2018

pip show --json


My pull request to pip just passed all of Travis’ tests for the first time. I’ve spent many of the past 48 hours working with my roommate to make our first contribution to Python pip. Our change is really small and simple, but it was very difficult for us to get even a basic understanding of the pip source code. Right now we don’t even know how to test our changes! Hopefully we can fix any mistakes (with the help of pip maintainers) and eventually get the pull request merged.

June 18, 2018

NetlifyCMS


NetlifyCMS is one of the coolest projects I’ve ever seen. When I first read its setup docs, the steps were so simple that I got confused. It took me a long time to realize that one index.html page is all it needs because the index page pulls everything else from a CDN! I now have NetlifyCMS on both Thoughts and Techlog, and I’m using it to write and publish this post.

June 17, 2018

Initial Post


I’m creating this site as a new format for journaling my incremental work on tech projects over time. I intend to write updates here that act like longer git commits for each day.

Today I’ve been working on setting this site up. I spent a long time looking through the HTML and CSS files in the Base16 Hugo theme that this site is running on. In the end, I only changed the font and the color gradient image at the top of the site. I would have liked to add an About page, but when I tried to, Hugo built the page as a list-type page instead of a single-type page. I need to find out how to make the About page the correct type so that I can write in it, but I don’t know when I’ll get around to it.

My main priority for the next day or two is getting Netlify CMS up and running on this site and on Thoughts. I’m hoping there won’t be many issues because I want to start working with React JS as soon as possible.