Codecs

A BinaryCodec[T] defines both the serializer and deserializer for a given type:

trait BinaryCodec[T] extends BinarySerializer[T] with BinaryDeserializer[T]

Primitive types

The io.github.vigoo.desert package defines implicit binary codecs for many common types.

The following code examples demonstrate this and also show what the binary representation looks like.

import io.github.vigoo.desert._
import io.github.vigoo.desert.shapeless._
import io.github.vigoo.desert.zioprelude._

import java.time._
import java.time.temporal.ChronoUnit
import scala.math._
val byte = serializeToArray(100.toByte)
100
val short = serializeToArray(100.toShort)
0 100
val int = serializeToArray(100)
0 0 0 100
val long = serializeToArray(100L)
0 0 0 0 0 0 0 100
val float = serializeToArray(3.14.toFloat)
64 72 -11 -61
val double = serializeToArray(3.14)
64 9 30 -72 81 -21 -123 31
val bool = serializeToArray(true)
1
val unit = serializeToArray(())
val ch = serializeToArray('!')
0 33
val str = serializeToArray("Hello")
10 72 101 108 108 111
val uuid = serializeToArray(java.util.UUID.randomUUID())
-49 -100 106 -68 95 89 66 21 -121 34 68 125 -110 97 -105 36
val bd = serializeToArray(BigDecimal(1234567890.1234567890))
36 49 50 51 52 53 54 55 56 57 48 46 49 50 51 52 53 54 55
val bi = serializeToArray(BigInt(1234567890))
4 73 -106 2 -46
val dow = serializeToArray(DayOfWeek.SATURDAY)
6
val month = serializeToArray(Month.FEBRUARY)
2
val year = serializeToArray(Year.of(2022))
-26 15
val monthDay = serializeToArray(MonthDay.of(12, 1))
12 1
val yearMonth = serializeToArray(YearMonth.of(2022, 12))
-26 15 12
val period = serializeToArray(Period.ofWeeks(3))
0 0 21
val zoneOffset = serializeToArray(ZoneOffset.UTC)
0
val duration = serializeToArray(Duration.of(123, ChronoUnit.SECONDS))
0 0 0 0 0 0 0 123 0 0 0 0
val instant = serializeToArray(Instant.parse("2022-12-01T11:11:00Z"))
0 0 0 0 99 -120 -117 -60 0 0 0 0
val localDate = serializeToArray(LocalDate.of(2022, 12, 1))
-26 15 12 1
val localTime = serializeToArray(LocalTime.of(11, 11))
11 11 0 0
val localDateTime = serializeToArray(LocalDateTime.of(2022, 12, 1, 11, 11, 0))
-26 15 12 1 11 11 0 0
val offsetDateTime = serializeToArray(OffsetDateTime.of(2022, 12, 1, 11, 11, 0, 0, ZoneOffset.UTC))
-26 15 12 1 11 11 0 0 0
val zonedDateTime = serializeToArray(ZonedDateTime.of(2022, 12, 1, 11, 11, 0, 0, ZoneOffset.UTC))
-26 15 12 1 11 11 0 0 0 0 0
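
The byte patterns shown for the primitive types are consistent with two integer encodings: a fixed-width big-endian layout (the Int 100 becomes the four bytes 0, 0, 0, 100) and a 7-bit variable-length layout (Year.of(2022) becomes the two bytes -26 and 15). The following is a minimal stand-alone sketch of these two encodings, inferred only from the output; IntEncodingSketch, fixedInt and varInt are hypothetical names and not part of desert's API.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

// Hypothetical helper object for illustration; not part of desert.
object IntEncodingSketch {
  // Fixed-width 32-bit encoding. ByteBuffer is big-endian by default.
  def fixedInt(value: Int): List[Byte] =
    ByteBuffer.allocate(4).putInt(value).array().toList

  // Variable-length encoding: 7 payload bits per byte, least significant
  // group first, high bit set on every byte except the last one.
  def varInt(value: Int): List[Byte] = {
    val out = new ByteArrayOutputStream()
    var v = value
    while ((v & ~0x7f) != 0) {
      out.write((v & 0x7f) | 0x80)
      v >>>= 7
    }
    out.write(v)
    out.toByteArray.toList
  }
}
```

With this sketch, small values such as a month or a day of week fit in a single byte, while the fixed-width form always takes four.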

Option, Either, Try, Validation

Common types such as Option and Either are also supported out of the box. For Try, there is also a codec for arbitrary Throwable instances, although deserializing one does not recreate the original throwable; it produces a PersistedThrowable instance instead. In practice this is a much safer approach than trying to recreate the same exception via reflection.

import scala.collection.immutable.SortedSet
import scala.util._
import zio.NonEmptyChunk
import zio.prelude.Validation
val none = serializeToArray[Option[Int]](None)
0
val some = serializeToArray[Option[Int]](Some(100))
1 0 0 0 100
val left = serializeToArray[Either[Boolean, Int]](Left(true))
0 1
val right = serializeToArray[Either[Boolean, Int]](Right(100))
1 0 0 0 100
val valid = serializeToArray[Validation[String, Int]](Validation.succeed(100))
1 0 0 0 100
val invalid = serializeToArray[Validation[String, Int]](Validation.failNonEmptyChunk(NonEmptyChunk("error")))
0 2 10 101 114 114 111 114
val fail = serializeToArray[Try[Int]](Failure(new RuntimeException("Test exception")))
val failDeser = fail.flatMap(data => deserializeFromArray[Try[Int]](data))
// failDeser: Either[DesertFailure, Try[Int]] = Right(
//   value = Failure(
//     exception = PersistedThrowable(
//       className = "java.lang.RuntimeException",
//       message = "Test exception",
//       stackTrace = Array(
//         repl.MdocSession$MdocApp.<init>(codecs.md:242),
//         repl.MdocSession$.app(codecs.md:3),
//         mdoc.internal.document.DocumentBuilder$$doc$.$anonfun$build$2(DocumentBuilder.scala:89),
//         scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18),
//         scala.util.DynamicVariable.withValue(DynamicVariable.scala:59),
//         scala.Console$.withErr(Console.scala:193),
//         mdoc.internal.document.DocumentBuilder$$doc$.$anonfun$build$1(DocumentBuilder.scala:89),
//         scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18),
//         scala.util.DynamicVariable.withValue(DynamicVariable.scala:59),
//         scala.Console$.withOut(Console.scala:164),
//         mdoc.internal.document.DocumentBuilder$$doc$.build(DocumentBuilder.scala:88),
//         mdoc.internal.markdown.MarkdownBuilder$.$anonfun$buildDocument$2(MarkdownBuilder.scala:47),
//         mdoc.internal.markdown.MarkdownBuilder$$anon$1.run(MarkdownBuilder.scala:104)
//       ),
//       cause = None
//     )
//   )
// )
val success = serializeToArray[Try[Int]](Success(100))
1 0 0 0 100
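
The idea of persisting a throwable as plain data, rather than recreating its class, can be sketched without any library. The field names below mirror the ones visible in the PersistedThrowable output (className, message, stackTrace, cause); ThrowableSnapshot and capture are hypothetical names, and this is not desert's actual class.

```scala
// Hypothetical stand-in for the idea behind PersistedThrowable; not desert's class.
object ThrowableSnapshotSketch {
  final case class ThrowableSnapshot(
      className: String,
      message: String, // may be null when the throwable has no message
      stackTrace: Vector[StackTraceElement],
      cause: Option[ThrowableSnapshot]
  )

  // Copies the observable data out of a throwable. Nothing here needs the
  // original exception class to exist on the reading side.
  def capture(t: Throwable): ThrowableSnapshot =
    ThrowableSnapshot(
      className = t.getClass.getName,
      message = t.getMessage,
      stackTrace = t.getStackTrace.toVector,
      cause = Option(t.getCause).map(capture)
    )
}
```

Because the snapshot only holds strings and stack trace elements, reading it back never has to load or instantiate an arbitrary exception class.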

Collections

There is a generic iterableCodec that can be used to define implicit collection codecs based on the Scala 2.13 collection API. For example, this is how the vectorCodec is defined:

implicit def vectorCodec[A: BinaryCodec]: BinaryCodec[Vector[A]] = iterableCodec[A, Vector[A]]

All these collection codecs use one of two possible representations. If the size is known in advance, it is the number of elements followed by all the items in iteration order; otherwise it is a flat list of all the elements wrapped in Option[T], with a final None marking the end. Vector and List are good examples of the two:

val vec = serializeToArray(Vector(1, 2, 3, 4))
8 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4
val lst = serializeToArray(List(1, 2, 3, 4))
1 0 0 0 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 0
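
The two layouts can be reproduced with a small stand-alone sketch. It assumes, based only on the Vector output (four elements are preceded by the byte 8), that the element count is written as a zigzag variable-length int; CollectionLayoutSketch, sizePrefixed and optionTerminated are hypothetical names, not desert functions.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Hypothetical stand-alone sketch; not desert's implementation.
object CollectionLayoutSketch {
  // Assumption read off the output: the count is a zigzag variable-length
  // int, so a count of 4 becomes the single byte 8.
  private def writeZigZagVarInt(out: DataOutputStream, value: Int): Unit = {
    var v = (value << 1) ^ (value >> 31)
    while ((v & ~0x7f) != 0) {
      out.writeByte((v & 0x7f) | 0x80)
      v >>>= 7
    }
    out.writeByte(v)
  }

  // Known size: the count first, then every element in iteration order.
  def sizePrefixed(items: Iterable[Int]): List[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    writeZigZagVarInt(out, items.size)
    items.foreach(i => out.writeInt(i))
    out.flush()
    bytes.toByteArray.toList
  }

  // Unknown size: each element is preceded by 1 (like Some) and a final 0
  // (like None) terminates the sequence.
  def optionTerminated(items: Iterator[Int]): List[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    items.foreach { i =>
      out.writeByte(1)
      out.writeInt(i)
    }
    out.writeByte(0)
    out.flush()
    bytes.toByteArray.toList
  }
}
```

The second form costs one extra byte per element but lets the writer stream elements without traversing the collection twice to learn its size.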

Other supported collection types in the codecs package:

import zio.NonEmptyChunk
import zio.prelude.NonEmptyList
import zio.prelude.ZSet
val arr = serializeToArray(Array(1, 2, 3, 4))
8 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4
val set = serializeToArray(Set(1, 2, 3, 4))
8 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4
val sortedSet = serializeToArray(SortedSet(1, 2, 3, 4))
1 0 0 0 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 0
val nec = serializeToArray(NonEmptyChunk(1, 2, 3, 4))
8 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4
val nel = serializeToArray(NonEmptyList(1, 2, 3, 4))
1 0 0 0 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 0
val nes = serializeToArray(ZSet(1, 2, 3, 4))
8 0 0 0 0 1 0 0 0 1 0 0 0 0 2 0 0 0 1 0 0 0 0 3 0 0 0 1 0 0 0 0 4 0 0 0 1

String deduplication

For strings the library has a simple deduplication system that costs no extra bytes when strings are not duplicated. In general, a string is encoded as a variable-length int holding its length in bytes, followed by its UTF-8 encoding. When deduplication is enabled, each serialized string gets an ID, and if the same string is serialized again in the same stream, a negative number in place of the length identifies it.

val twoStrings1 = serializeToArray(List("Hello", "Hello"))
1 10 72 101 108 108 111 1 10 72 101 108 108 111 0
val twoStrings2 = serializeToArray(List(DeduplicatedString("Hello"), DeduplicatedString("Hello")))
1 10 72 101 108 108 111 1 1 0

It is not turned on by default because it breaks backward compatibility when evolving data structures. If a new string field is added, old versions of the application will skip it and therefore will not assign the same ID to a string that is first seen in that field.

It is enabled internally in desert for some cases, and can be used in custom serializers freely.
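
The deduplication scheme described in this section can be sketched as a small stand-alone writer. It assumes that lengths and IDs share one zigzag variable-length int slot and that IDs start at 1; both assumptions are read off the example output (length 5 appears as 10, the repeated string as 1) rather than taken from desert's source. DedupStringWriterSketch is a hypothetical name.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import scala.collection.mutable

// Hypothetical stand-alone sketch; not desert's implementation.
final class DedupStringWriterSketch {
  private val out = new ByteArrayOutputStream()
  private val ids = mutable.Map.empty[String, Int]

  // Zigzag keeps small negative and small positive numbers in one byte.
  private def writeZigZagVarInt(value: Int): Unit = {
    var v = (value << 1) ^ (value >> 31)
    while ((v & ~0x7f) != 0) {
      out.write((v & 0x7f) | 0x80)
      v >>>= 7
    }
    out.write(v)
  }

  def write(s: String): Unit =
    ids.get(s) match {
      case Some(id) =>
        // Seen before in this stream: a negative ID and no payload.
        writeZigZagVarInt(-id)
      case None =>
        // First occurrence: non-negative length followed by the UTF-8 bytes.
        val utf8 = s.getBytes(StandardCharsets.UTF_8)
        writeZigZagVarInt(utf8.length)
        out.write(utf8, 0, utf8.length)
        ids.update(s, ids.size + 1) // assumption: IDs start at 1
    }

  def result: List[Byte] = out.toByteArray.toList
}
```

A reader following the same rules has to assign IDs in exactly the same order as the writer, which is why skipping an unknown string field on the reading side desynchronizes the IDs.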

Tuples

The elements of tuples are serialized flat and the whole tuple gets prefixed by 0, which makes them compatible with simple case classes:

val tup = serializeToArray((1, 2, 3))
0 0 0 0 1 0 0 0 2 0 0 0 3

Maps

The codecs for Map, SortedMap and NonEmptyMap are further applications of iterableCodec, built on top of the tuple support to serialize an iteration of key-value pairs:

import scala.collection.immutable.SortedMap
val map = serializeToArray(Map(1 -> "x", 2 -> "y"))
4 0 0 0 0 1 2 120 0 0 0 0 2 2 121
val sortedmap = serializeToArray(SortedMap(1 -> "x", 2 -> "y"))
4 0 0 0 0 1 2 120 0 0 0 0 2 2 121

Generic codecs for ADTs

There is a generic derivable codec for algebraic data types, with support for evolving the type during the lifecycle of the application.

For case classes the representation is the same as for tuples:

case class Point(x: Int, y: Int, z: Int)
object Point {
  implicit val codec: BinaryCodec[Point] = DerivedBinaryCodec.derive
}
val pt = serializeToArray(Point(1, 2, 3))
0 0 0 0 1 0 0 0 2 0 0 0 3

Note that there is no @evolutionSteps annotation used for the type. In this case the only additional storage cost is a single 0 byte at the beginning, just like with tuples. The evolution steps are explained in a separate section.

For sum types, the Shapeless-based derivation does not automatically derive codecs for all the constructors. This is mostly for historical reasons: previous versions required passing the evolution steps as parameters to the derive method. The new ZIO Schema based derivation does not have this limitation.

Other than that it works the same way, with derive:

sealed trait Drink
case class Beer(typ: String) extends Drink
case object Water extends Drink

object Drink {
  implicit val beerCodec: BinaryCodec[Beer] = DerivedBinaryCodec.derive
  implicit val waterCodec: BinaryCodec[Water.type] = DerivedBinaryCodec.derive
  implicit val codec: BinaryCodec[Drink] = DerivedBinaryCodec.derive
}
val a = serializeToArray[Drink](Beer("X"))
0 0 0 2 88
val b = serializeToArray[Drink](Water)
0 1 0

Transient fields in generic codecs

It is possible to mark some fields of a case class as transient:

case class Point2(x: Int, y: Int, z: Int, @transientField(None) cachedDistance: Option[Double])
object Point2 {
  implicit val codec: BinaryCodec[Point2] = DerivedBinaryCodec.derive
}
val serializedPt2 = serializeToArray(Point2(1, 2, 3, Some(3.7416)))
0 0 0 0 1 0 0 0 2 0 0 0 3
val pt2 = for {
  data <- serializedPt2
  result <- deserializeFromArray[Point2](data)
} yield result
// pt2: Either[DesertFailure, Point2] = Right(
//   value = Point2(x = 1, y = 2, z = 3, cachedDistance = None)
// )

Transient fields are not serialized; during deserialization they get the default value given in the annotation. Note that the default value is not type checked at compile time: if it does not match the field type, it causes a runtime error.
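
To make the behaviour concrete, here is a hand-written stand-alone equivalent of what the example does for Point2: the transient field is never written, and reading fills it with a fixed default. This is only a sketch using java.io streams; TransientFieldSketch, encode and decode are hypothetical names, and the code is not what DerivedBinaryCodec generates.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical hand-written equivalent; not what DerivedBinaryCodec generates.
object TransientFieldSketch {
  final case class Point2(x: Int, y: Int, z: Int, cachedDistance: Option[Double])

  def encode(p: Point2): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeByte(0) // leading 0 byte, as in the serialized Point2 output
    out.writeInt(p.x)
    out.writeInt(p.y)
    out.writeInt(p.z)
    // cachedDistance is deliberately not written
    out.flush()
    bytes.toByteArray
  }

  def decode(data: Array[Byte]): Point2 = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    in.readByte() // skip the leading 0 byte
    val x = in.readInt()
    val y = in.readInt()
    val z = in.readInt()
    // The transient field gets its default instead of a stored value.
    Point2(x, y, z, cachedDistance = None)
  }
}
```

In this hand-written form the default is checked by the compiler; with the annotation it is only checked when deserialization runs.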

Transient constructors in generic codecs

It is possible to mark whole constructors as transient:

sealed trait Cases
@transientConstructor case class Case1() extends Cases
case class Case2() extends Cases

object Cases {
  implicit val case2Codec: BinaryCodec[Case2] = DerivedBinaryCodec.derive
  implicit val codec: BinaryCodec[Cases] = DerivedBinaryCodec.derive
}
val cs1 = serializeToArray[Cases](Case1())
Left(SerializingTransientConstructor(Case1))
val cs2 = serializeToArray[Cases](Case2())
0 0 0

Transient constructors cannot be serialized. A common use case is remotely accessible actors where some actor messages are known to be local only. By marking them as transient they can hold non-serializable data without breaking the serialization of the other, remote messages.

Generic codecs for value type wrappers

It is good practice to use zero-cost value type wrappers around primitive types to express intent in the type system. desert can derive binary codecs for these too:

case class DocumentId(id: Long) // extends AnyVal
object DocumentId {
  implicit val codec: BinaryCodec[DocumentId] = DerivedBinaryCodec.deriveForWrapper
}
val id = serializeToArray(DocumentId(100))
0 0 0 0 0 0 0 100

Custom codecs

Serialization is a simple Scala function using an implicit serialization context:

def serialize(value: T)(implicit context: SerializationContext): Unit

while the deserialization is

def deserialize()(implicit ctx: DeserializationContext): T

The io.github.vigoo.desert.custom package contains a set of serialization and deserialization functions, all requiring the implicit contexts, that can be used to implement custom codecs.

By implementing the BinaryCodec trait it is possible to define a fully custom codec. In the following example we define a data type capable of representing cyclic graphs via a mutable next field, and a custom codec for serializing and deserializing it. It also shows the built-in support for tracking object references, which is not used by the generic codecs but can be used in scenarios like this.

import cats.instances.either._
import io.github.vigoo.desert.custom._

  final class Node(val label: String,
                   var next: Option[Node]) {
    override def toString: String = 
      next match {
       case Some(n) => s"<$label -> ${n.label}>"
       case None => s"<$label>"
      }
  }
  object Node {
    implicit lazy val codec: BinaryCodec[Node] =
      new BinaryCodec[Node] {
        override def serialize(value: Node)(implicit context: SerializationContext): Unit = {
          write(value.label) // write the label using the built-in string codec
          value.next match {
            case Some(next) =>
              write(true)  // next is defined (built-in boolean codec)
              storeRefOrObject(next) // store ref-id or serialize next
            case None       =>
              write(false) // next is undefined (built-in boolean codec)
          }
        }
        
        override def deserialize()(implicit ctx: DeserializationContext): Node = {
          val label   = read[String]()        // read the label using the built-in string codec
          val result  = new Node(label, None) // create the new node
          storeReadRef(result)                // store the node in the reference map
          val hasNext = read[Boolean]()       // read if 'next' is defined
          if (hasNext) {
            // Read next with reference-id support and mutate the result
            val next = readRefOrValue[Node](storeReadReference = false)
            result.next = Some(next)
          }
          result
        }
    }     
  }

  case class Root(node: Node)
  object Root {
    implicit val codec: BinaryCodec[Root] = new BinaryCodec[Root] {
      override def deserialize()(implicit ctx: DeserializationContext): Root =
        Root(readRefOrValue[Node](storeReadReference = false))

      override def serialize(value: Root)(implicit context: SerializationContext): Unit =
        storeRefOrObject(value.node)
    }
  }

val nodeA = new Node("a", None)
// nodeA: Node = <a -> a>
val nodeB = new Node("a", None)
// nodeB: Node = <a -> a>
val nodeC = new Node("a", None)
// nodeC: Node = <a -> a>
nodeA.next = Some(nodeB)
nodeB.next = Some(nodeC)
nodeC.next = Some(nodeA)
val result = serializeToArray(Root(nodeA))
0 2 97 1 0 2 97 1 0 2 97 1 1
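
The reference-tracking idea behind this example can be sketched without the library: every object gets an ID the first time it is written, and later occurrences are written as that ID only, which is what lets a cycle terminate. The byte layout below (0 for a new object, the ID for a known one, a doubled length before the label) is an assumption read off the output of the example, and RefTrackingSketch is a hypothetical name, not desert's implementation.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.util.IdentityHashMap

// Hypothetical stand-alone sketch; not desert's implementation.
object RefTrackingSketch {
  final class Node(val label: String, var next: Option[Node])

  def encode(root: Node): List[Byte] = {
    val out = new ByteArrayOutputStream()
    // Identity-based map: distinct nodes with equal labels get distinct IDs.
    val ids = new IdentityHashMap[Node, Integer]()

    def writeNode(node: Node): Unit =
      Option(ids.get(node)) match {
        case Some(id) =>
          // Already written: emit only its ID (sketch assumes fewer than 128 nodes).
          out.write(id.intValue)
        case None =>
          out.write(0) // 0 marks a node seen for the first time
          // Register before recursing so a cycle back to this node finds it.
          ids.put(node, Integer.valueOf(ids.size() + 1))
          val utf8 = node.label.getBytes(StandardCharsets.UTF_8)
          out.write(utf8.length * 2) // assumption: zigzag length, label under 64 bytes
          out.write(utf8, 0, utf8.length)
          node.next match {
            case Some(next) =>
              out.write(1) // next is defined
              writeNode(next)
            case None =>
              out.write(0) // next is undefined
          }
      }

    writeNode(root)
    out.toByteArray.toList
  }
}
```

For the three-node cycle from the example, the last byte written is the ID of the first node, so the encoder stops instead of looping forever.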

Monadic custom codecs

Previous versions of desert exposed a monadic serializer/deserializer API based on ZPure with the following types:

type Ser[T] = ZPure[Nothing, SerializerState, SerializerState, SerializationEnv, DesertFailure, T]
type Deser[T] = ZPure[Nothing, SerializerState, SerializerState, DeserializationEnv, DesertFailure, T]

For compatibility, the library still defines the monadic version of the serialization functions in the io.github.vigoo.desert.custom.pure package.

A monadic serializer or deserializer can be converted to a BinarySerializer or BinaryDeserializer using the fromPure method.

To achieve higher performance, it is recommended to implement custom codecs using the low level serialization API.